This repo contains code from my GenAI learning journey. I'm following the tasks below and updating each one in its own Python file.
Code. Plot. Break. Repeat.
- Build a byte-pair encoder (BPE) and train your own subword vocab
- Write a token visualizer to map words/chunks → token IDs
- Compare one-hot vs learned embeddings
- Plot cosine distances between token vectors
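The heart of the BPE task is a two-step loop: count adjacent symbol pairs, then merge the most frequent pair into a new symbol. A minimal sketch (function names are my own, not from any library):

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count adjacent symbol pairs across the token stream
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair):
    # Replace every occurrence of `pair` with a single merged symbol
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Toy "training": repeat until the vocab budget is reached
tokens = list("low lower lowest")
for _ in range(3):
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
```

The learned merge rules, applied in order, are exactly the subword vocab; the token visualizer then just maps each merged chunk to its index in that vocab.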
- Implement & compare positional encodings:
  - Sinusoidal
  - Learned
  - RoPE
  - ALiBi
- Animate a toy sequence being position-encoded in 3D
- Ablate positions — watch attention collapse
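Of the four encodings above, sinusoidal is the easiest to hand-roll, so it makes a good starting point before the 3D animation. A NumPy sketch, assuming an even `d_model`:

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    # Even dims get sin, odd dims get cos, with geometrically
    # spaced frequencies (the 10000 base from the original paper)
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_pe(32, 16)  # shape (32, 16)
```

Ablating positions is then just zeroing `pe` before adding it to the embeddings.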
- Hand-wire dot-product attention for one token
- Scale to multi-head attention, plot per-head weight heatmaps
- Mask out future tokens, verify causal property
- Stack your Attention + LayerNorm + Residual → single-block transformer
- Generalize to n-block “mini-former” on toy data
- Dissect Q, K, V — swap them, break them, see what explodes
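For the hand-wired attention and the causal-mask check, a single-head NumPy sketch is enough (multi-head just adds a head dimension and a reshape):

```python
import numpy as np

def causal_attention(Q, K, V):
    # Scaled dot-product attention with an upper-triangular future mask
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -np.inf  # future positions get zero weight after softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
out, w = causal_attention(Q, K, V)
```

Verifying the causal property is then one assert: every weight above the diagonal is exactly zero, and each row still sums to one.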
- Build a sampler dashboard — interactively tune temp/k/p
- Plot entropy vs output diversity
- Set temp = 0 (argmax) — watch repetition set in
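A sampler dashboard needs one function that takes logits plus the knobs being tuned. A sketch covering temperature, top-k, and the argmax edge case (top-p follows the same pattern on the sorted cumulative probabilities):

```python
import numpy as np

def sample(logits, temperature=1.0, top_k=None, rng=None):
    rng = rng or np.random.default_rng()
    if temperature == 0:
        return int(np.argmax(logits))  # greedy decoding, repetition-prone
    logits = np.asarray(logits, dtype=float) / temperature
    if top_k is not None:
        # Keep only the k largest logits (ties at the cutoff survive)
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```

The entropy-vs-diversity plot falls out of calling this many times per setting and measuring unique outputs against `-(probs * np.log(probs)).sum()`.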
- Record & reuse KV states; measure speedup vs no-cache
- Build a cache hit/miss visualizer for token streams
- Profile cache memory cost for long vs short sequences
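The KV cache itself is just an append-only store per layer; the speedup comes from never re-running attention over past positions. A minimal sketch (class name is my own):

```python
import numpy as np

class KVCache:
    """Append-only key/value store for one attention layer."""
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        # Each decode step contributes exactly one new K row and V row
        self.keys.append(k)
        self.values.append(v)

    def stacked(self):
        return np.stack(self.keys), np.stack(self.values)

cache = KVCache()
for step in range(5):
    k = v = np.random.randn(16)
    cache.append(k, v)
K, V = cache.stacked()  # past keys/values, reused instead of recomputed
```

Memory profiling is then arithmetic: `seq_len * n_layers * 2 * d_model * bytes_per_element`, which is why long sequences dominate the cost.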
- Implement sliding-window attention; measure loss on long docs
- Benchmark memory-efficient attention (recompute, flash)
- Plot perplexity vs context length, find the context collapse point
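Sliding-window attention only changes the mask: position i sees the last `window` positions instead of all of them. A sketch of that mask:

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    # Position i may attend to positions max(0, i - window + 1) .. i
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

m = sliding_window_mask(6, window=3)
```

Swapping this in for the full causal mask and sweeping `window` gives the loss-vs-window curve on long documents.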
- Code a 2-expert router layer; route tokens dynamically
- Plot expert utilization histograms
- Simulate sparse vs dense routing, measure FLOP savings
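A top-1 router for the 2-expert layer can be a few lines: score each token against each expert, then dispatch by argmax. A sketch with toy experts (names are my own):

```python
import numpy as np

def route_top1(x, W_router, experts):
    # x: (n_tokens, d). Each token goes to its argmax-scoring expert.
    logits = x @ W_router              # (n_tokens, n_experts)
    choice = logits.argmax(axis=-1)
    out = np.empty_like(x)
    for e, expert in enumerate(experts):
        sel = choice == e
        if sel.any():
            out[sel] = expert(x[sel])  # compute only on routed tokens
    return out, choice

# Two toy "experts": doubling and negation
experts = [lambda t: 2 * t, lambda t: -t]
x = np.array([[1.0, 0.0], [0.0, 1.0]])
out, choice = route_top1(x, np.eye(2), experts)
```

`np.bincount(choice)` gives the utilization histogram directly, and the FLOP comparison is sparse (each token through one expert) vs dense (every token through every expert).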
- Convert your mini-former to grouped query layout
- Measure speed vs vanilla multi-head on large batch
- Ablate number of groups, plot latency
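The grouped-query conversion boils down to keeping fewer KV heads than query heads and sharing each KV head across a group. A shape-level sketch of the score computation:

```python
import numpy as np

def gqa_scores(Q, K, n_groups):
    # Q: (n_q_heads, seq, d); K: (n_kv_heads, seq, d).
    # Each group of query heads shares one KV head.
    n_q_heads = Q.shape[0]
    heads_per_group = n_q_heads // n_groups
    K_expanded = np.repeat(K, heads_per_group, axis=0)  # broadcast KV heads
    return Q @ K_expanded.transpose(0, 2, 1) / np.sqrt(Q.shape[-1])

Q = np.zeros((8, 4, 16))   # 8 query heads
K = np.zeros((2, 4, 16))   # only 2 KV heads
scores = gqa_scores(Q, K, n_groups=2)  # (8, 4, 4)
```

`n_groups == n_q_heads` recovers vanilla multi-head; `n_groups == 1` is multi-query. The latency ablation sweeps between those extremes; the KV-cache shrinkage is the main win at large batch.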
- Hand-implement:
  - LayerNorm
  - RMSNorm
  - SwiGLU
  - GELU
- Ablate each — observe train/test loss impact
- Plot activation distributions layer-wise
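The two norms make the cleanest side-by-side, since RMSNorm is LayerNorm minus the mean subtraction and bias. A NumPy sketch:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-6):
    # Center and scale per feature vector, then affine transform
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def rms_norm(x, gamma, eps=1e-6):
    # Scale by root-mean-square only; no centering, no bias
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return gamma * x / rms
```

The ablation swaps one for the other inside the mini-former block and compares train/test loss; activation histograms per layer show how each keeps magnitudes in range.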
- Train:
  - Masked LM
  - Causal LM
  - Prefix LM
- Plot loss curves; compare which objective learns "English" fastest
- Generate samples from each — note quirks
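The three objectives differ mainly in what each position is allowed to see. A sketch of the three visibility masks (the loss target changes too: MLM predicts only masked slots, the others predict next tokens):

```python
import numpy as np

def lm_masks(seq_len, prefix_len=2):
    # Causal LM: strictly lower-triangular visibility
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    # Prefix LM: bidirectional over the prefix, causal afterwards
    prefix = causal.copy()
    prefix[:, :prefix_len] = True
    # Masked LM: fully bidirectional
    masked = np.ones((seq_len, seq_len), dtype=bool)
    return causal, prefix, masked

causal, prefix, masked = lm_masks(4)
```

Training the same mini-former under each mask, on the same data, makes the loss-curve comparison fair.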
- Fine-tune on a small custom dataset
- Instruction-tune by prepending tasks (e.g., “Summarize: …”)
- RLHF:
  - Hack a reward model
  - Run PPO for 10 steps
  - Plot reward curve
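For the hacked-together reward model, the standard fitting objective is a pairwise Bradley-Terry loss on (chosen, rejected) completions; PPO then maximizes the learned reward. A sketch of just the loss:

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    # Bradley-Terry pairwise objective: -log sigmoid(r_chosen - r_rejected).
    # Minimizing it pushes preferred completions' rewards above rejected ones.
    margin = r_chosen - r_rejected
    return np.log1p(np.exp(-margin))
```

At equal rewards the loss is log 2; it shrinks as the margin grows, which is what the 10-step PPO reward curve should show climbing.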
- Train tiny / small / medium models
- Plot loss vs size
- Benchmark time, VRAM, throughput
- Extrapolate scaling curve — how “dumb” can you go?
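Extrapolating the scaling curve amounts to fitting a power law, loss ≈ a · size^b, which is a straight line in log-log space. A sketch with hypothetical numbers (not real measurements):

```python
import numpy as np

# Hypothetical (param count, eval loss) points from the tiny/small/medium runs
sizes = np.array([1e6, 1e7, 1e8])
losses = np.array([4.0, 3.2, 2.6])

# Linear fit in log-log space: slope b is the scaling exponent
b, log_a = np.polyfit(np.log(sizes), np.log(losses), 1)
predicted_loss = np.exp(log_a) * (1e9 ** b)  # extrapolate one decade up
```

A slope near zero means going smaller is nearly free; a steep slope means "dumb" gets expensive fast.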
- Implement PTQ (Post-Training Quantization)
- Implement QAT (Quantization-Aware Training)
- Export to GGUF / AWQ
- Plot accuracy drop vs compression ratio
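The simplest PTQ baseline is symmetric per-tensor int8: pick a scale so the largest weight maps to 127, round, and dequantize at use time. A sketch:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor PTQ: max |w| maps to 127
    scale = np.abs(w).max() / 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=100).astype(np.float32)
q, scale = quantize_int8(w)
error = np.abs(dequantize(q, scale) - w).max()  # bounded by scale / 2
```

QAT wraps this same round-trip into the forward pass (with a straight-through gradient) so the model learns around the rounding error; the accuracy-vs-compression plot sweeps bit widths.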
- Port a model from Hugging Face → DeepSpeed → vLLM → ExLlama
- Profile throughput, VRAM, and latency across all three backends
- Generate toy data, add noise, dedupe, create eval splits
- Visualize model learning curves on real vs synthetic data
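For the dedupe step, an exact-match pass over whitespace/case-normalized text catches the bulk of duplicates before any fancier near-dup hashing. A sketch:

```python
def dedupe(docs):
    # Exact-match dedup on normalized text; first occurrence wins
    seen, kept = set(), []
    for doc in docs:
        key = " ".join(doc.lower().split())
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept
```

Running dedup before making eval splits matters most: duplicates that straddle the train/eval boundary silently inflate the real-data learning curves.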