My Generative AI & LLM Learning Journey

I’m expanding my career from Senior Data Engineer to LLM / Generative AI Specialist. This page documents my journey in weekly milestones; each link goes deeper into my hands-on work, code, and reflections.


Phase 1: Core AI & LLM Foundations (Months 1–2)

  • Week 1 → [Data Loading, Tokenization & Embeddings Playground] Build the foundation by preparing text datasets, applying tokenization, and setting up embeddings. Then train word embeddings (Word2Vec-style) and visualize how words cluster semantically in vector space.
  • Week 2 → [Attention Mechanism from Scratch] Implement scaled dot-product attention (Queries, Keys, Values). Visualize attention heatmaps.
  • Week 3 → [Multi-Head Attention & Mini Transformer Block] Extend attention into multiple heads, then add feed-forward networks and layer normalization.
  • Week 4 → [Mini-GPT: Character-Level Language Model] Build a decoder-only Transformer, train it on a small dataset, and generate Shakespeare-style text.
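The heart of Weeks 2–3 is scaled dot-product attention. Here is a minimal NumPy sketch of the idea (not the full project code): the attention-weight matrix it returns is exactly what the Week 2 heatmaps visualize.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable row-wise softmax.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights      # `weights` is what the heatmap shows

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 query positions, d_k = 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)  # (4, 8) (4, 4); each attn row sums to 1
```

Multi-head attention (Week 3) simply runs several of these in parallel on learned projections of the input and concatenates the results.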

Phase 2: RAG & Applied LLMs (Months 3–4)

  • [Company Knowledge Bot with RAG] Connect an LLM to external documents using embeddings + vector search (FAISS/Pinecone).
  • [Semantic Search Demo] Build semantic search over custom text collections.
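Both Phase 2 projects boil down to the same retrieval step: embed documents and a query, then rank by cosine similarity. A minimal brute-force sketch (FAISS or Pinecone replaces the `argsort` step with an approximate index at scale; the toy vectors below stand in for real embedding-model output):

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Return indices and cosine similarities of the k nearest documents."""
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = D @ q                  # cosine similarity for every document
    idx = np.argsort(-sims)[:k]   # brute force; FAISS replaces this step
    return idx, sims[idx]

# Toy 3-dimensional "embeddings" standing in for real model output.
docs = np.array([[0.9, 0.1, 0.0],   # about cats
                 [0.1, 0.9, 0.0],   # about finance
                 [0.8, 0.2, 0.1]])  # also about cats
query = np.array([1.0, 0.0, 0.0])
idx, scores = top_k(query, docs)
print(idx)  # the two cat-like documents rank first
```

In the RAG bot, the retrieved documents are then pasted into the LLM prompt as context for answering.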

Phase 3: Agents & Tool Use (Months 5–6)

  • [Multi-Agent Research Assistant] Build LLM agents that plan, retrieve, and summarize collaboratively.
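The plan → retrieve → summarize flow can be sketched as a plain Python control loop. Everything here is a placeholder of my own naming (`call_llm`, `TOOLS`, `research_assistant` are hypothetical; a real build swaps the stub for an actual LLM API call and real tools):

```python
def call_llm(prompt: str) -> str:
    """Stub standing in for a real LLM API call."""
    if prompt.startswith("PLAN"):
        return "1. search\n2. summarize"
    return "summary of: " + prompt[:40]

# Tool registry: the agent picks from these instead of free-form text.
TOOLS = {"search": lambda q: f"top results for '{q}'"}

def research_assistant(question: str) -> str:
    plan = call_llm("PLAN: " + question)       # planner proposes steps
    evidence = TOOLS["search"](question)       # retriever calls a tool
    # Summarizer turns the retrieved evidence into a final answer.
    return call_llm(f"Summarize {evidence} to answer: {question}")

print(research_assistant("What is RAG?"))
```

The "multi-agent" part is giving each step its own prompt, memory, and tool budget rather than one monolithic prompt.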

Phase 4: Fine-Tuning & Deployment (Months 7–10)

  • [Fine-Tuned Domain LLM] Customize a small open-source model for domain-specific tasks (classification + Q&A).
  • [Deployed RAG Chatbot] Package a chatbot with FastAPI + Docker and deploy it to the cloud.

Phase 5: Deployment & Scaling (Months 11–12)

  • [Deploying Models on AWS, Azure, GCP + Scaling Inference] Containerize Hugging Face models, deploy via AWS SageMaker / Lambda and Azure ML / Container Apps. Explore autoscaling and cost optimization.
  • [TensorRT for Optimized Inference] Convert a Hugging Face model (e.g., DistilBERT or GPT-2 small) to ONNX, then build a TensorRT engine. Run benchmarks comparing PyTorch vs ONNX vs TensorRT inference latency. Deploy the TensorRT-optimized model in Docker on a cloud GPU (AWS EC2 G4/G5, Azure NV-series, or GCP GPU).
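For the PyTorch vs ONNX vs TensorRT comparison, the benchmark harness itself is backend-agnostic: warm up, time many runs, and report percentiles rather than a single average (tail latency is what autoscaling decisions care about). A stdlib-only sketch, with a dummy workload in place of a real inference call:

```python
import statistics
import time

def benchmark(fn, n_warmup=5, n_runs=50):
    """Time fn() repeatedly and report p50/p95 latency in milliseconds."""
    for _ in range(n_warmup):            # warm caches / JIT before timing
        fn()
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1e3)
    times.sort()
    return {"p50_ms": statistics.median(times),
            "p95_ms": times[int(0.95 * len(times)) - 1]}

# Pass each backend's inference callable; a dummy workload shown here.
print(benchmark(lambda: sum(i * i for i in range(10_000))))
```

The same `benchmark` is then called with a PyTorch forward pass, an ONNX Runtime session, and a TensorRT engine to produce comparable numbers.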

Phase 6: Branding & Consulting (Months 12–15)

  • [Portfolio Wrap-Up] Compile projects into case studies, document consulting-ready AI solutions, and publish learnings.

Related Projects & Learning

LLM Tokenization

When you type a sentence into ChatGPT, it doesn't see the sentence as you do. Instead, it breaks the text down into smaller pieces called tokens. Tokens are like the building blocks of language for AI models. The process of breaking down these sentences into tokens is called tokenization.
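To make this concrete, here is a toy greedy longest-match tokenizer over a hand-picked vocabulary (a simplified illustration; real tokenizers like BPE learn their vocabulary from data, and the `vocab` below is my own invented example):

```python
def tokenize(text, vocab):
    """Greedy longest-match subword tokenization over a fixed vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character becomes its own token
            i += 1
    return tokens

vocab = {"token", "ization", "izer", "un", "believ", "able", " "}
print(tokenize("tokenization", vocab))  # → ['token', 'ization']
```

Note that a word the model has never seen whole still becomes a few familiar pieces, which is why tokenization lets a fixed vocabulary cover open-ended text.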

View Project