I’m expanding my career from Senior Data Engineer to LLM / Generative AI Specialist. This page documents that journey as weekly milestones; each link goes deeper into my hands-on work, code, and reflections.
Phase 1: Core AI & LLM Foundations (Months 1–2)
- Week 1 → [Data Loading, Tokenization & Embeddings Playground] Build the foundation by preparing text datasets, applying tokenization, and setting up embeddings. Train word embeddings (Word2Vec-style) and visualize how words cluster semantically in vector space.
- Week 2 → [Attention Mechanism from Scratch] Implement scaled dot-product attention (Queries, Keys, Values). Visualize attention heatmaps.
- Week 3 → [Multi-Head Attention & Mini Transformer Block] Extend attention into multiple heads, add feed-forward networks and layer normalization.
- Week 4 → [Mini-GPT: Character-Level Language Model] Build a decoder-only Transformer, train it on a small dataset, and generate Shakespeare-style text.
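To give a flavor of Week 1, here is a minimal sketch of the tokenize → vocab → embedding-lookup pipeline. Everything here is illustrative: the whitespace tokenizer, the toy corpus, and the random (untrained) 8-dimensional embeddings are placeholders for the BPE tokenizers and trained Word2Vec-style vectors the actual project uses.

```python
import numpy as np

def tokenize(text):
    # Minimal whitespace tokenizer; real pipelines use BPE/WordPiece.
    return text.lower().split()

def build_vocab(corpus):
    # Map each unique token to an integer id.
    return {tok: i for i, tok in enumerate(sorted(set(tokenize(corpus))))}

def embed(tokens, vocab, emb_matrix):
    # Look up one embedding row per token id.
    ids = [vocab[t] for t in tokens]
    return emb_matrix[ids]

def cosine(a, b):
    # Cosine similarity: the standard closeness measure in embedding space.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

corpus = "the king rules the realm while the queen rules the court"
vocab = build_vocab(corpus)
rng = np.random.default_rng(0)
emb = rng.normal(size=(len(vocab), 8))  # untrained 8-dim embeddings

vecs = embed(tokenize("king queen"), vocab, emb)
print(cosine(vecs[0], vecs[1]))
```

With random embeddings the similarity is meaningless; after Word2Vec-style training, related words like "king" and "queen" should score noticeably higher than unrelated pairs.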
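The core of Weeks 2–3 is the attention equation Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A compact NumPy sketch (shapes and data here are arbitrary examples, not from the project):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # query/key similarity
    weights = softmax(scores, axis=-1)  # each row is a distribution over keys
    return weights @ V, weights

rng = np.random.default_rng(42)
Q = rng.normal(size=(4, 8))  # 4 query positions, d_k = 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
```

The returned `w` matrix is exactly what the Week 2 heatmaps visualize: one row per query, showing where it attends. Multi-head attention (Week 3) runs this in parallel over several smaller projections and concatenates the results.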
Phase 2: RAG & Applied LLMs (Months 3–4)
- [Company Knowledge Bot with RAG] Connect an LLM to external documents using embeddings + vector search (FAISS/Pinecone).
- [Semantic Search Demo] Build semantic search over custom text collections.
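The retrieval step behind both Phase 2 projects is "embed the query, rank documents by vector similarity." A self-contained sketch using bag-of-words counts as a stand-in for learned embeddings and brute-force cosine similarity as a stand-in for FAISS/Pinecone (the docs and query are made up for illustration):

```python
import numpy as np
from collections import Counter

DOCS = [
    "reset your password from the account settings page",
    "the vacation policy allows twenty days per year",
    "use the vpn client to reach internal services",
]

def bow_vector(text, vocab):
    # Bag-of-words counts; a real system uses dense sentence embeddings.
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def search(query, docs, top_k=1):
    # Brute-force cosine ranking; FAISS/Pinecone replace this at scale.
    vocab = sorted({w for d in docs + [query] for w in d.lower().split()})
    mat = np.stack([bow_vector(d, vocab) for d in docs])
    q = bow_vector(query, vocab)
    sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q) + 1e-9)
    return [docs[i] for i in np.argsort(-sims)[:top_k]]

print(search("how do I reset my password", DOCS))
```

In the RAG bot, the top-ranked documents are then stuffed into the LLM prompt as context, so answers stay grounded in the company knowledge base.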
Phase 3: Agents & Tool Use (Months 5–6)
- [Multi-Agent Research Assistant] Build LLM agents that plan, retrieve, and summarize collaboratively.
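The agent loop boils down to "decide which tool the task needs, call it, feed the result back." A heavily simplified sketch: the tool registry and the keyword-based planner below are hypothetical stand-ins, since in the real project an LLM emits a structured action (e.g. JSON) to pick the tool.

```python
def calculator(expr: str) -> str:
    # Toy only; never eval untrusted input in a real deployment.
    return str(eval(expr, {"__builtins__": {}}))

def lookup(term: str) -> str:
    # Hypothetical mini knowledge base standing in for a retrieval tool.
    kb = {"faiss": "FAISS is a library for vector similarity search."}
    return kb.get(term.lower(), "no entry found")

TOOLS = {"calculate": calculator, "lookup": lookup}

def run_agent(task: str) -> str:
    # Stand-in planner: route on a crude heuristic; an LLM does this step.
    if any(ch.isdigit() for ch in task):
        return TOOLS["calculate"](task)
    return TOOLS["lookup"](task)

print(run_agent("2 + 3 * 4"))  # dispatches to the calculator tool
```

The multi-agent version layers this pattern: a planner agent decomposes the research question, retriever agents call tools like the one above, and a summarizer agent merges their outputs.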
Phase 4: Fine-Tuning & Deployment (Months 7–10)
- [Fine-Tuned Domain LLM] Customize a small open-source model for domain-specific tasks (classification + Q&A).
- [Deployed RAG Chatbot] Package a chatbot with FastAPI + Docker, deployed to the cloud.
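For the packaging step, the Docker side is roughly this kind of image definition. This is a generic sketch: it assumes an `app.py` exposing a FastAPI instance named `app` and a `requirements.txt` pinning fastapi, uvicorn, and the RAG dependencies; adjust names to the actual project layout.

```dockerfile
# Minimal image for a FastAPI chatbot service (illustrative sketch).
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

A slim base image plus `--no-cache-dir` keeps the image small, which matters once the container is pushed to a cloud registry and autoscaled.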
Phase 5: Deployment & Scaling (Months 11–12)
- [Deploying Models on AWS, Azure, GCP + Scaling Inference] Containerize Hugging Face models, deploy via AWS SageMaker / Lambda & Azure ML / Container Apps. Explore autoscaling + cost optimization.
- [TensorRT for Optimized Inference] Convert a Hugging Face model (e.g., DistilBERT or GPT-2 small) into ONNX → TensorRT engine. Run benchmarks comparing PyTorch vs ONNX vs TensorRT inference latency. Deploy a TensorRT-optimized model in Docker on a cloud GPU (AWS EC2 G4/G5, Azure NV-series, or GCP GPU).
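The benchmarking discipline matters as much as the engines themselves. A minimal, framework-free harness for the latency comparison; the two lambda workloads are placeholders, to be replaced by `model.forward`, an ONNX Runtime `session.run`, and a TensorRT execution call on identical inputs.

```python
import time
from statistics import median

def benchmark(fn, warmup=3, iters=20):
    """Median wall-clock latency of one fn() call, in milliseconds."""
    for _ in range(warmup):  # warmup excludes lazy init and cache effects
        fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1e3)
    return median(times)  # median resists outlier iterations

# Stand-in workloads; swap in the PyTorch / ONNX / TensorRT inference calls.
baseline = lambda: sum(i * i for i in range(50_000))
optimized = lambda: sum(i * i for i in range(5_000))

print(f"baseline:  {benchmark(baseline):.2f} ms")
print(f"optimized: {benchmark(optimized):.2f} ms")
```

Reporting the median over many iterations, after warmup, is what makes the PyTorch vs ONNX vs TensorRT numbers comparable; single-shot timings on GPUs are dominated by initialization noise.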
Phase 6: Branding & Consulting (Months 12–15)
- [Portfolio Wrap-Up] Compile projects into case studies, document consulting-ready AI solutions, and publish learnings.