My Generative AI & LLM Learning Journey

I’m expanding my career from Senior Data Engineer to LLM / Generative AI Specialist. This page documents my journey in weekly milestones; each link goes deeper into my hands-on work, code, and reflections.


Phase 1: Core AI & LLM Foundations (Months 1–2)

  • Week 1 → [Data Loading, Tokenization & Embeddings Playground] Build the foundation by preparing text datasets, applying tokenization, and setting up embeddings. Then train word embeddings (Word2Vec-style) and visualize how words cluster semantically in vector space.
  • Week 2 → [Attention Mechanism from Scratch] Implement scaled dot-product attention (Queries, Keys, Values). Visualize attention heatmaps.
  • Week 3 → [Multi-Head Attention & Mini Transformer Block] Extend attention into multiple heads, then add feed-forward networks and layer normalization.
  • Week 4 → [Mini-GPT: Character-Level Language Model] Build a decoder-only Transformer, train it on a small dataset, and generate Shakespeare-style text.
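The heart of Weeks 2–3 is scaled dot-product attention. Here is a minimal NumPy sketch of the idea (not the full project code): the attention-weight matrix it returns is exactly what the Week 2 heatmaps visualize.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable row-wise softmax.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights      # `weights` is what the heatmap shows

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 query positions, d_k = 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)  # (4, 8) (4, 4); each attn row sums to 1
```

Multi-head attention (Week 3) simply runs several of these in parallel on learned projections of the input and concatenates the results.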

Phase 2: RAG & Applied LLMs (Months 3–4)

  • [Company Knowledge Bot with RAG] Connect an LLM to external documents using embeddings + vector search (FAISS/Pinecone).
  • [Semantic Search Demo] Build semantic search over custom text collections.
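Both Phase 2 projects boil down to the same retrieval step: embed documents and a query, then rank by cosine similarity. A minimal brute-force sketch (FAISS or Pinecone replaces the `argsort` step with an approximate index at scale; the toy vectors below stand in for real embedding-model output):

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Return indices and cosine similarities of the k nearest documents."""
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = D @ q                  # cosine similarity for every document
    idx = np.argsort(-sims)[:k]   # brute force; FAISS replaces this step
    return idx, sims[idx]

# Toy 3-dimensional "embeddings" standing in for real model output.
docs = np.array([[0.9, 0.1, 0.0],   # about cats
                 [0.1, 0.9, 0.0],   # about finance
                 [0.8, 0.2, 0.1]])  # also about cats
query = np.array([1.0, 0.0, 0.0])
idx, scores = top_k(query, docs)
print(idx)  # the two cat-like documents rank first
```

In the RAG bot, the retrieved documents are then pasted into the LLM prompt as context for answering.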

Phase 3: Agents & Tool Use (Months 5–6)

  • [Multi-Agent Research Assistant] Build LLM agents that plan, retrieve, and summarize collaboratively.
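The plan → retrieve → summarize flow can be sketched as a plain Python control loop. Everything here is a placeholder of my own naming (`call_llm`, `TOOLS`, `research_assistant` are hypothetical; a real build swaps the stub for an actual LLM API call and real tools):

```python
def call_llm(prompt: str) -> str:
    """Stub standing in for a real LLM API call."""
    if prompt.startswith("PLAN"):
        return "1. search\n2. summarize"
    return "summary of: " + prompt[:40]

# Tool registry: the agent picks from these instead of free-form text.
TOOLS = {"search": lambda q: f"top results for '{q}'"}

def research_assistant(question: str) -> str:
    plan = call_llm("PLAN: " + question)       # planner proposes steps
    evidence = TOOLS["search"](question)       # retriever calls a tool
    # Summarizer turns the retrieved evidence into a final answer.
    return call_llm(f"Summarize {evidence} to answer: {question}")

print(research_assistant("What is RAG?"))
```

The "multi-agent" part is giving each step its own prompt, memory, and tool budget rather than one monolithic prompt.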

Phase 4: Fine-Tuning & Deployment (Months 7–10)

  • [Fine-Tuned Domain LLM] Customize a small open-source model for domain-specific tasks (classification + Q&A).
  • [Deployed RAG Chatbot] Package a chatbot with FastAPI + Docker and deploy it to the cloud.

Phase 5: Deployment & Scaling (Months 11–12)

  • [Deploying Models on AWS, Azure, GCP + Scaling Inference] Containerize Hugging Face models, deploy via AWS SageMaker / Lambda and Azure ML / Container Apps. Explore autoscaling and cost optimization.
  • [TensorRT for Optimized Inference] Convert a Hugging Face model (e.g., DistilBERT or GPT-2 small) to ONNX, then build a TensorRT engine. Run benchmarks comparing PyTorch vs ONNX vs TensorRT inference latency. Deploy the TensorRT-optimized model in Docker on a cloud GPU (AWS EC2 G4/G5, Azure NV-series, or GCP GPU).
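For the PyTorch vs ONNX vs TensorRT comparison, the benchmark harness itself is backend-agnostic: warm up, time many runs, and report percentiles rather than a single average (tail latency is what autoscaling decisions care about). A stdlib-only sketch, with a dummy workload in place of a real inference call:

```python
import statistics
import time

def benchmark(fn, n_warmup=5, n_runs=50):
    """Time fn() repeatedly and report p50/p95 latency in milliseconds."""
    for _ in range(n_warmup):            # warm caches / JIT before timing
        fn()
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1e3)
    times.sort()
    return {"p50_ms": statistics.median(times),
            "p95_ms": times[int(0.95 * len(times)) - 1]}

# Pass each backend's inference callable; a dummy workload shown here.
print(benchmark(lambda: sum(i * i for i in range(10_000))))
```

The same `benchmark` is then called with a PyTorch forward pass, an ONNX Runtime session, and a TensorRT engine to produce comparable numbers.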

Phase 6: Branding & Consulting (Months 12–15)

  • [Portfolio Wrap-Up] Compile projects into case studies, document consulting-ready AI solutions, and publish learnings.

Related Projects & Learning

LLM Tokenization

When you type a sentence into ChatGPT, it doesn't see the sentence as you do. Instead, it breaks the text down into smaller pieces called tokens. Tokens are like the building blocks of language for AI models. The process of breaking down these sentences into tokens is called tokenization.
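To make this concrete, here is a toy greedy longest-match tokenizer over a hand-picked vocabulary (a simplified illustration; real tokenizers like BPE learn their vocabulary from data, and the `vocab` below is my own invented example):

```python
def tokenize(text, vocab):
    """Greedy longest-match subword tokenization over a fixed vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character becomes its own token
            i += 1
    return tokens

vocab = {"token", "ization", "izer", "un", "believ", "able", " "}
print(tokenize("tokenization", vocab))  # → ['token', 'ization']
```

Note that a word the model has never seen whole still becomes a few familiar pieces, which is why tokenization lets a fixed vocabulary cover open-ended text.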

View Project