Featured Projects

Showcasing data engineering solutions and AI-powered analytics

Real-Time Data Pipeline

PySpark, AWS, Apache Iceberg

Built a scalable real-time data pipeline that processes millions of events daily, using Apache Iceberg for efficient data lakehouse operations.

  • Reduced data processing latency by 60%
  • Implemented incremental processing with SCD Type 2
  • Automated data quality checks and reconciliation
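
For readers unfamiliar with SCD Type 2, the idea behind the incremental processing bullet can be sketched in plain Python. This is an illustrative sketch, not the project's PySpark or Iceberg code: `DimRow`, `apply_scd2`, and the sample customer data are hypothetical names chosen for the example. A changed record closes its current version and opens a new one, so history is preserved.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    key: str
    attrs: tuple                      # tracked attribute values
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_scd2(history, incoming, as_of):
    """Merge a batch of {key: attrs} into history, keeping old versions."""
    current = {r.key: r for r in history if r.valid_to is None}
    result = []
    for row in history:
        changed = (
            row.valid_to is None
            and row.key in incoming
            and incoming[row.key] != row.attrs
        )
        # Close the current version when its attributes changed.
        result.append(replace(row, valid_to=as_of) if changed else row)
    for key, attrs in incoming.items():
        prev = current.get(key)
        if prev is None or prev.attrs != attrs:
            # New key or changed attributes: open a new current version.
            result.append(DimRow(key, attrs, valid_from=as_of))
    return result


# Hypothetical sample data: one existing customer moves, one is new.
history = [DimRow("c1", ("Berlin",), date(2024, 1, 1))]
batch = {"c1": ("Munich",), "c2": ("Paris",)}
updated = apply_scd2(history, batch, date(2024, 6, 1))
```

After the merge, `updated` holds three rows: the closed Berlin version of `c1`, its new Munich version, and the first version of `c2`. Unchanged keys produce no new rows, which is what keeps the processing incremental.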

LLM-Powered Analytics Platform

Python, RAG, Agentic AI

Developed an AI-powered analytics platform using a retrieval-augmented generation (RAG) architecture to enable natural language querying of business data.

  • Integrated an LLM with enterprise data sources
  • Implemented semantic search and context retrieval
  • Reduced time-to-insight by 75%
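
The retrieval step of a RAG system can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the platform's code: `embed` uses word counts as a stand-in for a learned embedding model, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    # Stand-in for a real embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Hypothetical document snippets standing in for enterprise data.
docs = [
    "monthly revenue by region",
    "employee onboarding checklist",
    "revenue forecast for next quarter",
]
context = retrieve("revenue by region last month", docs, k=2)

# The retrieved snippets are placed in the prompt sent to the LLM.
prompt = "Answer using this context:\n" + "\n".join(context)
```

Here the two revenue-related snippets are selected and the onboarding checklist is left out. In a production setup the embeddings would typically come from a model and be stored in a vector index, but the rank-then-prompt flow is the same.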

Data Governance Framework

dbt, SQL, Palantir Foundry

Established a comprehensive data governance framework ensuring data quality, lineage tracking, and compliance across the organization.

  • Implemented automated data quality monitoring
  • Created end-to-end data lineage visualization
  • Met a 99.9% data accuracy SLA
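
An automated quality check of the kind listed above might look like this minimal sketch. It is an assumption-laden example, not the framework's dbt or Foundry logic: `run_quality_checks`, the column names, and the idea of measuring the SLA as the share of rows with all required fields present are hypothetical choices made for illustration.

```python
def run_quality_checks(rows, required, unique_key):
    """Count rows with missing required fields and duplicate keys."""
    total = len(rows)
    missing = sum(1 for r in rows if any(r.get(c) is None for c in required))
    keys = [r.get(unique_key) for r in rows]
    duplicates = total - len(set(keys))
    valid_ratio = (total - missing) / total if total else 1.0
    return {
        "rows": total,
        "missing": missing,
        "duplicates": duplicates,
        "valid_ratio": valid_ratio,
    }


# Hypothetical sample batch with one null amount and one repeated id.
sample = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 2, "amount": 5.0},
    {"id": 3, "amount": 7.5},
]
report = run_quality_checks(sample, required=["id", "amount"], unique_key="id")

# Assumed threshold: 99.9% of rows valid and no duplicate keys.
passed = report["valid_ratio"] >= 0.999 and report["duplicates"] == 0
```

For this sample the check fails, since one of four rows has a missing value and one key is duplicated. A scheduler could run such a check after each load and alert when `passed` is false.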

Cloud Migration & Optimization

AWS, Docker, Terraform

Led the migration of legacy on-premises data infrastructure to AWS, optimizing for cost and performance.

  • Reduced infrastructure costs by 40%
  • Improved system reliability to 99.95% uptime
  • Containerized all data processing workloads

More projects coming soon. Check out my GitHub for code samples.