Kaustav Maity

Senior Data Engineer

Deloitte | Kolkata, India

AWS Certified | Databricks Certified
5+ Years Python | 4 Years Cloud | 30+ Pipelines

Profile

Impact-driven Senior Data Engineer with 5+ years of Python experience and 4 years in cloud data engineering. Specialized in designing scalable batch and real-time data platforms on AWS using Apache Spark, Apache Kafka, Amazon Kinesis, SQL, and Apache Iceberg. Proven track record of leading cloud migrations, improving pipeline reliability, reducing processing latency, and enabling real-time analytics through strong system design and observability practices.

Technical Skills

Programming Languages
Python, SQL, Scala
Big Data Technologies
Apache Spark, PySpark, Spark SQL, Hadoop, Hive, Apache Hudi, Apache Iceberg
Streaming Platforms
Apache Kafka, Amazon Kinesis
AWS Cloud Services
S3, Glue, Lambda, DMS, EMR, Aurora PostgreSQL, DynamoDB, RDS, Step Functions, SQS, SNS, EventBridge, CloudWatch
Azure Cloud Services
Azure Data Factory, Azure Databricks, Azure Synapse, ADLS Gen2
Core Concepts & Tools
ETL/ELT Pipelines, Data Modeling, Data Lakes, System Design, Distributed Systems, Observability, Apache Airflow, Python Flask

Experience

Senior Data Engineer

Deloitte
May 2025 - Present

Kolkata, India

  • Architected and delivered the end-to-end system design for an on-premises to AWS migration, moving critical workloads from Oracle to Amazon Aurora PostgreSQL, improving scalability and long-term maintainability.
  • Executed large-scale database migration using AWS DMS and AWS File Gateway, ensuring zero data loss while handling high-volume historical and incremental data loads.
  • Designed and implemented optimized ETL/ELT pipelines using AWS Glue and Amazon S3, reducing batch load times and improving pipeline reliability.
  • Developed a custom observability dashboard using Python Flask, providing real-time visibility into data load metrics, failures, latency, and SLA adherence.
  • Implemented parallel and asynchronous data loading using PostgreSQL dblink and aws_s3/aws_lambda extensions, increasing ingestion throughput and reducing processing latency.

Data Engineer

Tata Consultancy Services
May 2022 - April 2025

Kolkata, India

  • Owned and operated 30+ batch, streaming, and near real-time data pipelines processing millions of transactional records daily for a UK-based insurance client.
  • Designed and implemented real-time streaming pipelines using Apache Kafka and Amazon Kinesis to ingest data into an Apache Iceberg–based data lake, enabling real-time dashboard updates and faster business insights.
  • Optimized large-scale Spark workloads using caching, multithreading, and broadcast joins, achieving a 20% reduction in processing time for high-volume datasets.
  • Led AWS migration initiatives and mentored three junior data engineers, improving delivery timelines, code quality, and production stability.

Subject Matter Expert

Chegg India / Course Hero / Bartleby
August 2020 - April 2022

Remote (Freelance)

  • Delivered expert-level academic and technical solutions in Python, Java, Statistics, and Data Analytics to a global learner base.

Education

M.Sc. in Data Analytics
Ramakrishna Mission Vivekananda Educational and Research Institute
Howrah, West Bengal | 2019 - 2021

B.Sc. in Statistics
Ramakrishna Mission Residential College, Narendrapur
Kolkata, West Bengal | 2016 - 2019
