Data Engineering

Oracle to PostgreSQL Migration

Architected a high-performance migration pipeline moving ~1 billion records per load from an Oracle Data Warehouse to Aurora PostgreSQL using parallel DB-Link async sessions, partitioned tables, and S3-based hybrid-cloud ingestion.

Oracle · Aurora PostgreSQL · Amazon S3 · AWS Storage Gateway · dblink · aws_s3 · Table Partitioning · PL/pgSQL · Python
Architecture

Migration Architecture

Oracle Warehouse → NAS → Storage Gateway → S3 → Aurora PostgreSQL (Partitioned, Parallel DB-Link Loading)

Data Flow

How It Works

End-to-end Oracle to PostgreSQL migration with parallel DB-Link loading

1. Oracle Data Export

The Oracle Data Warehouse exports data files to a NAS (Network Attached Storage) location. This NAS acts as the bridge between the on-premises environment and the cloud.

Oracle Warehouse · NAS Storage
2. Cloud Transfer via Storage Gateway

AWS Storage Gateway connects to the NAS location and automatically syncs files to Amazon S3. This provides a seamless hybrid-cloud bridge without changes to existing export processes.

AWS Storage Gateway · Amazon S3
3. S3 Import into Staging Partitions

Using the PostgreSQL aws_s3 extension, data is imported directly from S3 into the staging area. Records are split into two partition types: Delete Partitions (P1–P4) containing records to remove from main tables, and Insert Partitions (P1–P4) containing records to add. Each partition aligns with the corresponding main table partition.

aws_s3 Extension · Delete Partitions · Insert Partitions (P1–P4)
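The import step described above maps directly onto the aws_s3 extension's `table_import_from_s3` function. A minimal sketch, assuming a hypothetical staging table `stg_orders_ins_p1`, bucket `migration-bucket`, and CSV export files (all illustrative names, not the project's actual ones):

```sql
-- Import one exported file from S3 straight into a staging partition.
SELECT aws_s3.table_import_from_s3(
    'stg_orders_ins_p1',             -- target staging table (assumed name)
    '',                              -- column list ('' = all columns)
    '(FORMAT csv, HEADER true)',     -- COPY options for the file format
    aws_commons.create_s3_uri(
        'migration-bucket',          -- S3 bucket (assumed)
        'exports/orders_ins_p1.csv', -- object key (assumed)
        'us-east-1'                  -- bucket region
    )
);
```

One call per staging partition keeps the import itself parallelizable: each file lands only in the staging table that mirrors its main-table partition.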
4. Partition-by-Partition DB-Link Routing

Each DB-Link session handles one partition. Delete Partition P1 routes through dblink_send_query() to delete matching records from Main Table Partition P1. Insert Partition P1 similarly inserts new records into Main Table P1. This partition-by-partition routing enables true parallelism — multiple DB-Link sessions run concurrently, each operating on its own partition without contention.

dblink_send_query() · Partition-by-Partition · DELETE then INSERT · ×N Async Sessions
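The dispatch loop can be sketched in PL/pgSQL. Connection names (`part_1`…`part_4`), table names, and the join key `order_id` are all assumptions for illustration; `dblink_send_query()` returns immediately, so the four sessions run their DELETE-then-INSERT pairs concurrently:

```sql
DO $$
DECLARE
    p int;
BEGIN
    FOR p IN 1..4 LOOP
        -- One named dblink connection per partition (names are illustrative)
        PERFORM dblink_connect('part_' || p, 'dbname=warehouse');

        -- dblink_send_query() does not wait for the result: each session
        -- deletes stale rows, then inserts new ones, in parallel
        PERFORM dblink_send_query('part_' || p, format(
            'DELETE FROM orders_p%1$s o
               USING stg_orders_del_p%1$s d
               WHERE o.order_id = d.order_id;
             INSERT INTO orders_p%1$s
               SELECT * FROM stg_orders_ins_p%1$s;', p));
    END LOOP;
END $$;
```

Because every session touches exactly one main-table partition, the deletes and inserts never contend for the same rows or partition locks.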
5. Collect Results & Wait

After all async queries are dispatched, the orchestrator waits for each DB-Link to complete via dblink_get_result(). This ensures all parallel sessions finish before proceeding, providing a synchronization barrier.

dblink_get_result() · Barrier Sync · Error Handling
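The barrier can be sketched as a drain loop over the same (assumed) connection names. `dblink_get_result()` blocks until the statements sent earlier on that connection finish, and must be called until it returns an empty set before the connection is free again:

```sql
DO $$
DECLARE
    p int;
    status text;
BEGIN
    FOR p IN 1..4 LOOP
        LOOP
            -- Blocks until the async work on this connection completes,
            -- then returns one result set per statement sent
            SELECT result INTO status
              FROM dblink_get_result('part_' || p) AS t(result text);
            EXIT WHEN status IS NULL;  -- empty set: connection fully drained
            RAISE NOTICE 'partition %: %', p, status;
        END LOOP;
        PERFORM dblink_disconnect('part_' || p);
    END LOOP;
END $$;
```

Only after this loop finishes for every connection does the orchestrator move on to logging, which is what makes it a true synchronization barrier.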
6. Logging & Audit Trail

Once all async operations complete, the results (row counts, execution time, status, errors) are collected and inserted into the migration_logs table. This provides full auditability and performance tracking for every load cycle.

migration_logs · Row Counts · Execution Metrics · Error Tracking
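An illustrative shape for the migration_logs table and a per-partition entry (the column names are assumptions, not the project's actual schema):

```sql
CREATE TABLE IF NOT EXISTS migration_logs (
    log_id        bigserial PRIMARY KEY,
    load_id       bigint NOT NULL,     -- identifies the load cycle
    partition_no  int    NOT NULL,
    operation     text   NOT NULL,     -- 'DELETE' or 'INSERT'
    row_count     bigint,
    started_at    timestamptz,
    finished_at   timestamptz,
    status        text,                -- e.g. 'SUCCESS' / 'FAILED'
    error_detail  text
);

-- One row per partition per operation, written after the barrier completes
INSERT INTO migration_logs
       (load_id, partition_no, operation, row_count,
        started_at, finished_at, status)
VALUES (42, 1, 'INSERT', 250000000,
        now() - interval '18 minutes', now(), 'SUCCESS');
```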
Features

Key Features

Partition-by-Partition Routing

Each DB-Link session handles exactly one partition — Delete P1 routes to Main P1, Insert P1 routes to Main P1 — enabling lock-free parallel processing.

Delete & Insert Partitions

Staging data is split into Delete and Insert partitions (P1–P4). Deletes remove stale records and inserts add new ones, all routed partition-by-partition through DB-Link.

Hybrid Cloud Bridge

AWS Storage Gateway connects on-premises NAS seamlessly to S3, requiring zero changes to existing Oracle export workflows.

Direct S3 Import

The aws_s3 PostgreSQL extension imports data directly from S3 into staging tables, bypassing intermediate ETL tools.

Full Audit Trail

Every load cycle records row counts, execution times, and error details in the migration_logs table for complete traceability.

Billion-Row Scale

Designed to handle ~1 billion record changes per load cycle across multiple tables with sustained throughput.
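The partition layout these features rely on, with Delete/Insert staging tables mirroring Main P1–P4, could look like the following minimal sketch. Hash partitioning on an assumed `order_id` key is used here for illustration; the real tables may partition by range or list instead:

```sql
CREATE TABLE orders (
    order_id bigint NOT NULL,
    payload  jsonb
) PARTITION BY HASH (order_id);

CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p3 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p4 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Staging tables mirror the same column layout, so Delete P1 / Insert P1
-- line up one-to-one with Main P1
CREATE TABLE stg_orders_del_p1 (LIKE orders);
CREATE TABLE stg_orders_ins_p1 (LIKE orders);
```

Keeping staging and main partitions aligned on the same key is what lets each DB-Link session operate on its own partition without cross-partition contention.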

1B+ Records / Load · ×N Parallel Sessions · 3+ Partitioned Tables · 100% Audit Coverage

Interested in Database Migration?

Let's discuss how I can help architect your Oracle to PostgreSQL migration with high-performance parallel loading.

Get in Touch