Available for Projects

Hi, I'm Kaustav
Data Engineer

Senior Data Engineer with 5+ years of experience building scalable ETL pipelines, real-time data platforms, and cloud-native solutions on AWS. Passionate about turning complex data challenges into elegant solutions.

5+ Years Experience
20+ Projects Delivered
10M+ Records/Day
data_pipeline.py
from pyspark.sql import SparkSession
from awsglue.context import GlueContext

class ETLPipeline:
    def __init__(self):
        self.spark = SparkSession.builder \
            .appName("DataPipeline") \
            .getOrCreate()

    def process(self, source):
        # Transform millions of records
        return self.transform(source)
Expertise

What I Do Best

Specialized in building robust, scalable data infrastructure on cloud platforms

AWS Data Engineering

Building serverless data lakes and ETL pipelines using Glue, Lambda, S3, and EMR.

Glue, Lambda, S3, EMR

Apache Spark

Processing millions of records with PySpark, optimizing jobs for performance.

PySpark, Spark SQL, DataFrames

Python Development

5+ years of Python for data processing, automation, and backend services.

Pandas, Boto3, Django

Real-time Processing

Event-driven architectures with Kafka, Kinesis, and streaming pipelines.

Kafka, Kinesis, SQS
Latest

Recent Blog Posts

Insights and tutorials on data engineering, AWS, and cloud architecture

Let's Build Something Amazing

Have a data engineering challenge? I'd love to help you architect and build scalable solutions that drive real business value.