Data Engineering

Oracle to PostgreSQL Migration

Architected a high-performance migration pipeline moving ~1 billion records per load from an Oracle Data Warehouse to Aurora PostgreSQL using parallel DB-Link async sessions, partitioned tables, and S3-based hybrid-cloud ingestion.

Oracle · Aurora PostgreSQL · Amazon S3 · AWS Storage Gateway · dblink · aws_s3 · Table Partitioning · PL/pgSQL · Python
Architecture

Migration Architecture

Oracle Warehouse → NAS → Storage Gateway → S3 → Aurora PostgreSQL (Partitioned, Parallel DB-Link Loading)

Data Flow

How It Works

End-to-end Oracle to PostgreSQL migration with parallel DB-Link loading

1. Oracle Data Export

The Oracle Data Warehouse exports data files to a NAS (Network Attached Storage) location. This NAS acts as the bridge between the on-premises environment and the cloud.

Oracle Warehouse · NAS Storage
2. Cloud Transfer via Storage Gateway

AWS Storage Gateway connects to the NAS location and automatically syncs files to Amazon S3. This provides a seamless hybrid-cloud bridge without changes to existing export processes.

AWS Storage Gateway · Amazon S3
3. S3 Import into Staging Partitions

Using the PostgreSQL aws_s3 extension, data is imported directly from S3 into the staging area. Records are split into two partition types: Delete Partitions (P1–P4) containing records to remove from main tables, and Insert Partitions (P1–P4) containing records to add. Each partition aligns with the corresponding main table partition.

aws_s3 Extension · Delete Partitions · Insert Partitions (P1–P4)
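The import step described above maps directly onto the aws_s3 extension's `table_import_from_s3` function. A minimal sketch, assuming a hypothetical staging table `stg_orders_ins_p1`, bucket `migration-bucket`, and CSV export files (all illustrative names, not the project's actual ones):

```sql
-- Import one exported file from S3 straight into a staging partition.
SELECT aws_s3.table_import_from_s3(
    'stg_orders_ins_p1',             -- target staging table (assumed name)
    '',                              -- column list ('' = all columns)
    '(FORMAT csv, HEADER true)',     -- COPY options for the file format
    aws_commons.create_s3_uri(
        'migration-bucket',          -- S3 bucket (assumed)
        'exports/orders_ins_p1.csv', -- object key (assumed)
        'us-east-1'                  -- bucket region
    )
);
```

One call per staging partition keeps the import itself parallelizable: each file lands only in the staging table that mirrors its main-table partition.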
4. Partition-by-Partition DB-Link Routing

Each DB-Link session handles one partition. Delete Partition P1 routes through dblink_send_query() to delete matching records from Main Table Partition P1. Insert Partition P1 similarly inserts new records into Main Table P1. This partition-by-partition routing enables true parallelism — multiple DB-Link sessions run concurrently, each operating on its own partition without contention.

dblink_send_query() · Partition-by-Partition · DELETE then INSERT · ×N Async Sessions
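The dispatch loop can be sketched in PL/pgSQL. Connection names (`part_1`…`part_4`), table names, and the join key `order_id` are all assumptions for illustration; `dblink_send_query()` returns immediately, so the four sessions run their DELETE-then-INSERT pairs concurrently:

```sql
DO $$
DECLARE
    p int;
BEGIN
    FOR p IN 1..4 LOOP
        -- One named dblink connection per partition (names are illustrative)
        PERFORM dblink_connect('part_' || p, 'dbname=warehouse');

        -- dblink_send_query() does not wait for the result: each session
        -- deletes stale rows, then inserts new ones, in parallel
        PERFORM dblink_send_query('part_' || p, format(
            'DELETE FROM orders_p%1$s o
               USING stg_orders_del_p%1$s d
               WHERE o.order_id = d.order_id;
             INSERT INTO orders_p%1$s
               SELECT * FROM stg_orders_ins_p%1$s;', p));
    END LOOP;
END $$;
```

Because every session touches exactly one main-table partition, the deletes and inserts never contend for the same rows or partition locks.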
5. Collect Results & Wait

After all async queries are dispatched, the orchestrator waits for each DB-Link to complete via dblink_get_result(). This ensures all parallel sessions finish before proceeding, providing a synchronization barrier.

dblink_get_result() · Barrier Sync · Error Handling
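The barrier can be sketched as a drain loop over the same (assumed) connection names. `dblink_get_result()` blocks until the statements sent earlier on that connection finish, and must be called until it returns an empty set before the connection is free again:

```sql
DO $$
DECLARE
    p int;
    status text;
BEGIN
    FOR p IN 1..4 LOOP
        LOOP
            -- Blocks until the async work on this connection completes,
            -- then returns one result set per statement sent
            SELECT result INTO status
              FROM dblink_get_result('part_' || p) AS t(result text);
            EXIT WHEN status IS NULL;  -- empty set: connection fully drained
            RAISE NOTICE 'partition %: %', p, status;
        END LOOP;
        PERFORM dblink_disconnect('part_' || p);
    END LOOP;
END $$;
```

Only after this loop finishes for every connection does the orchestrator move on to logging, which is what makes it a true synchronization barrier.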
6. Logging & Audit Trail

Once all async operations complete, the results (row counts, execution time, status, errors) are collected and inserted into the migration_logs table. This provides full auditability and performance tracking for every load cycle.

migration_logs · Row Counts · Execution Metrics · Error Tracking
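An illustrative shape for the migration_logs table and a per-partition entry (the column names are assumptions, not the project's actual schema):

```sql
CREATE TABLE IF NOT EXISTS migration_logs (
    log_id        bigserial PRIMARY KEY,
    load_id       bigint NOT NULL,     -- identifies the load cycle
    partition_no  int    NOT NULL,
    operation     text   NOT NULL,     -- 'DELETE' or 'INSERT'
    row_count     bigint,
    started_at    timestamptz,
    finished_at   timestamptz,
    status        text,                -- e.g. 'SUCCESS' / 'FAILED'
    error_detail  text
);

-- One row per partition per operation, written after the barrier completes
INSERT INTO migration_logs
       (load_id, partition_no, operation, row_count,
        started_at, finished_at, status)
VALUES (42, 1, 'INSERT', 250000000,
        now() - interval '18 minutes', now(), 'SUCCESS');
```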
Features

Key Features

Partition-by-Partition Routing

Each DB-Link session handles exactly one partition — Delete P1 routes to Main P1, Insert P1 routes to Main P1 — enabling lock-free parallel processing.

Delete & Insert Partitions

Staging data is split into Delete and Insert partitions (P1–P4). Deletes remove stale records and inserts add new ones, all routed partition-by-partition through DB-Link.

Hybrid Cloud Bridge

AWS Storage Gateway connects on-premises NAS seamlessly to S3, requiring zero changes to existing Oracle export workflows.

Direct S3 Import

The aws_s3 PostgreSQL extension imports data directly from S3 into staging tables, bypassing intermediate ETL tools.

Full Audit Trail

Every load cycle records row counts, execution times, and error details in the migration_logs table for complete traceability.

Billion-Row Scale

Designed to handle ~1 billion record changes per load cycle across multiple tables with sustained throughput.
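The partition layout these features rely on, with Delete/Insert staging tables mirroring Main P1–P4, could look like the following minimal sketch. Hash partitioning on an assumed `order_id` key is used here for illustration; the real tables may partition by range or list instead:

```sql
CREATE TABLE orders (
    order_id bigint NOT NULL,
    payload  jsonb
) PARTITION BY HASH (order_id);

CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p3 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p4 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Staging tables mirror the same column layout, so Delete P1 / Insert P1
-- line up one-to-one with Main P1
CREATE TABLE stg_orders_del_p1 (LIKE orders);
CREATE TABLE stg_orders_ins_p1 (LIKE orders);
```

Keeping staging and main partitions aligned on the same key is what lets each DB-Link session operate on its own partition without cross-partition contention.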

1B+ Records / Load · ×N Parallel Sessions · 3+ Partitioned Tables · 100% Audit Coverage

Interested in Database Migration?

Let's discuss how I can help architect your Oracle to PostgreSQL migration with high-performance parallel loading.

Get in Touch