
Commit 4256c75

mini diffusion WIP

1 parent 47e8166

175 files changed, +32503 −0 lines


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -5,3 +5,4 @@ conv-relu-animation/node_modules/
 conv2d-animation/node_modules/
 node_modules/
 .history/*
+mini-diffusion/target/

blog/content/_index.md

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
---
title: "Welcome"
---

# ML Animations Blog

Interactive explanations of machine learning concepts. Each article pairs with a [visualization](https://danielsobrado.github.io/ml-animations/).

## Recent Posts

Browse by category:

### Transformers
- [Attention Mechanism](/posts/attention-mechanism-part1/)
- [Self-Attention](/posts/self-attention/)
- [Positional Encoding](/posts/positional-encoding/)
- [Transformer Architecture](/posts/transformer-architecture/)
- [BERT](/posts/bert/)

### NLP Fundamentals
- [Word2Vec](/posts/word2vec/)
- [GloVe](/posts/glove/)
- [FastText](/posts/fasttext/)
- [Embeddings](/posts/embeddings/)
- [Tokenization](/posts/tokenization/)
- [Bag of Words](/posts/bag-of-words/)

### Neural Networks
- [ReLU](/posts/relu/)
- [Leaky ReLU](/posts/leaky-relu/)
- [Softmax](/posts/softmax/)
- [Layer Normalization](/posts/layer-normalization/)
- [LSTM](/posts/lstm/)
- [Conv2D](/posts/conv2d/)
- [Conv + ReLU](/posts/conv-relu/)

### Advanced Models
- [Fine-Tuning](/posts/fine-tuning/)
- [RAG](/posts/rag/)
- [VAE](/posts/vae/)
- [Multimodal LLM](/posts/multimodal-llm/)

### Math Fundamentals
- [Gradient Descent](/posts/gradient-descent/)
- [Linear Regression](/posts/linear-regression/)
- [Matrix Multiplication](/posts/matrix-multiplication/)
- [Eigenvalues](/posts/eigenvalue/)
- [SVD](/posts/svd/)
- [QR Decomposition](/posts/qr-decomposition/)

### Probability & Statistics
- [Probability Distributions](/posts/probability-distributions/)
- [Conditional Probability](/posts/conditional-probability/)
- [Expected Value & Variance](/posts/expected-value-variance/)
- [Entropy](/posts/entropy/)
- [Cross-Entropy](/posts/cross-entropy/)
- [Cosine Similarity](/posts/cosine-similarity/)
- [Spearman Correlation](/posts/spearman-correlation/)

### Reinforcement Learning
- [RL Foundations](/posts/rl-foundations/)
- [Q-Learning](/posts/q-learning/)
- [Exploration](/posts/rl-exploration/)
- [Markov Chains](/posts/markov-chains/)

### Algorithms
- [Bloom Filter](/posts/bloom-filter/)
- [PageRank](/posts/pagerank/)

---

[View all animations →](https://danielsobrado.github.io/ml-animations/)

blog/content/about.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
---
title: "About ML Animations"
---

This blog accompanies the [ML Animations](https://danielsobrado.github.io/ml-animations/) project - a collection of interactive visualizations explaining machine learning concepts.

## What you'll find here

Each article explains a concept from the animations in more depth:

- **Transformers & Attention**: How modern language models work
- **NLP Fundamentals**: Word2Vec, embeddings, tokenization
- **Neural Networks**: Activations, normalization, architectures
- **Math Foundations**: Linear algebra, probability, optimization
- **Reinforcement Learning**: Q-learning, exploration, MDPs
- **Algorithms**: PageRank, Bloom filters

## Why visualizations?

ML concepts click better when you see them. A picture of gradient descent navigating a loss surface beats equations. Watching attention weights form makes transformers less magical.

The animations are interactive - play with parameters, see what changes.

## About the writing

These articles try to explain things like a colleague would over coffee. Not academic papers. Occasional shortcuts and simplifications where they help understanding.

If something's unclear or wrong, open an issue.

## Links

- [Interactive Animations](https://danielsobrado.github.io/ml-animations/)
- [GitHub Repository](https://github.com/danielsobrado/ml-animations)
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
---
title: "What is Attention? I finally understood it"
date: 2024-11-28
draft: false
tags: ["transformers", "attention", "nlp", "deep-learning"]
categories: ["Machine Learning"]
series: ["Understanding Attention"]
---
So you keep hearing about the attention mechanism everywhere. Transformers this, attention that. I spent weeks trying to understand it from papers and tutorials. Most explanations made it way more complicated than it needs to be.

Let me try to explain how I finally got it.

## The database analogy that clicked for me

Think of attention as a fuzzy database lookup. Not an exact match, but a weighted combination.

You have three things:
- Query (Q) - what you're searching for
- Key (K) - the labels or titles of the items
- Value (V) - the actual content

Unlike a normal database lookup, which returns an exact match, attention returns a weighted combination of ALL the values. The weights depend on how well the query matches each key.

![Attention Mechanism Interactive Demo](https://danielsobrado.github.io/ml-animations/animation/attention-mechanism)

Check out the interactive visualization I built: [Attention Mechanism Animation](https://danielsobrado.github.io/ml-animations/animation/attention-mechanism)
## Library search example

OK, so imagine walking into a library looking for books about "machine learning".

Your query is "machine learning".

The keys are book titles:
- Neural Networks
- Python Basics
- Deep Learning
- Cooking Recipes
- AI Fundamentals
- Romance Novels

The values are the actual book contents.

Now, attention doesn't just grab one book. It looks at ALL the books and weights them by relevance:
- Deep Learning: high weight (very relevant)
- Neural Networks: high weight
- AI Fundamentals: medium-high weight
- Python Basics: some weight (related to ML coding)
- Cooking Recipes: basically zero
- Romance Novels: zero

Then it returns a weighted mix of all the contents. The relevant books contribute more.
## Why this matters

Before attention, models used RNNs. The problem was that information had to flow sequentially. By the time you reach the end of a long sentence, the beginning is mostly forgotten.

With attention? Direct access to any position. No forgetting. No distance limit.

Also, it's fully parallelizable, which is huge for training speed.
## The math (simplified)

```
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) * V
```

Breaking it down:
1. $QK^T$ - the dot products give similarity scores
2. divide by $\sqrt{d_k}$ - a scaling factor that keeps the softmax from getting too peaky
3. softmax - converts the scores to probabilities (the weights sum to 1)
4. multiply by V - a weighted combination of the values

The scaling by $\sqrt{d_k}$ is important. Without it, the dot products grow with the dimension and the softmax becomes overconfident in a single item.
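
To make the formula concrete, here's a minimal NumPy sketch of scaled dot-product attention. The function name and toy shapes are mine, not from the animation - just enough to see the scores, the softmax, and the weighted mix of values:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V                                        # weighted mix of the values

# toy example: 2 queries attending over 3 key/value pairs
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 4)
```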
## What I got wrong initially

I thought Q, K, V were separate inputs. They're not always. In self-attention, they all come from the same input, just projected differently with learned weights.
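
As a rough sketch of that idea, reusing the `scaled_dot_product_attention` function from the math section above (X and the W matrices are made-up placeholders, random here rather than learned):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4

X = rng.normal(size=(seq_len, d_model))    # one sequence of token vectors
W_q = rng.normal(size=(d_model, d_k))      # "learned" projections (random placeholders)
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

# self-attention: Q, K, V are all projections of the same input X
Q, K, V = X @ W_q, X @ W_k, X @ W_v
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (5, 4)
```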
I also thought attention was expensive. It is O(n²) in the sequence length n. But the parallelization makes it faster than RNNs in practice for reasonable lengths.
## Next up

In part 2 I'll cover:
- scaled dot-product attention in detail
- multi-head attention (why multiple heads?)
- self-attention vs cross-attention

The visualization tool shows all of this interactively. Play with it: [https://danielsobrado.github.io/ml-animations/animation/attention-mechanism](https://danielsobrado.github.io/ml-animations/animation/attention-mechanism)

---

*Part of the [Understanding Attention](/series/understanding-attention/) series*

blog/content/posts/bag-of-words.md

Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
---
title: "Bag of Words - the simplest text representation"
date: 2024-11-22
draft: false
tags: ["bag-of-words", "bow", "nlp", "text-representation", "tfidf"]
categories: ["NLP Fundamentals"]
---
Before embeddings there was Bag of Words. Still useful, still relevant for some tasks. And understanding it helps you see why newer methods are better.

## What is it?

Represent a document as word counts. Ignore order completely.

"The cat sat on the mat"
"The dog sat on the log"

Vocabulary: [the, cat, sat, on, mat, dog, log]

Document 1: [2, 1, 1, 1, 1, 0, 0]
Document 2: [2, 0, 1, 1, 0, 1, 1]

That's it. Count each word.

![Bag of Words Process](https://danielsobrado.github.io/ml-animations/animation/bag-of-words)

See it visualized: [Bag of Words Animation](https://danielsobrado.github.io/ml-animations/animation/bag-of-words)
28+
29+
## Building it
30+
31+
```python
32+
from collections import Counter
33+
34+
def bag_of_words(documents):
35+
# build vocabulary
36+
vocab = set()
37+
for doc in documents:
38+
vocab.update(doc.split())
39+
vocab = sorted(vocab)
40+
word_to_idx = {w: i for i, w in enumerate(vocab)}
41+
42+
# vectorize
43+
vectors = []
44+
for doc in documents:
45+
counts = Counter(doc.split())
46+
vec = [counts.get(w, 0) for w in vocab]
47+
vectors.append(vec)
48+
49+
return vectors, vocab
50+
```
51+
52+
Or just use sklearn:
53+
```python
54+
from sklearn.feature_extraction.text import CountVectorizer
55+
56+
vectorizer = CountVectorizer()
57+
X = vectorizer.fit_transform(documents)
58+
```
## The problems

**Ignores word order**

"Dog bites man" and "Man bites dog" have identical BoW vectors. Completely different meaning.

**Sparse and high dimensional**

A 10,000-word vocabulary = 10,000-dim vectors. Mostly zeros.

**No semantic similarity**

"Happy" and "joyful" are as distant as "happy" and "angry". No meaning captured.

**Common words dominate**

"The", "is", "a" appear everywhere. They don't help distinguish documents.
## TF-IDF to the rescue

Term Frequency - Inverse Document Frequency.

Weight words by:
- How often they appear in this document (TF)
- How rare they are across all documents (IDF)

$$\text{TF-IDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t)$$

$$\text{IDF}(t) = \log\frac{N}{|\{d : t \in d\}|}$$

Words appearing in every document get low weight. Rare, distinctive words get high weight.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)
```
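
To see the weighting by hand, here's a tiny sketch of the textbook formula above on a made-up three-document corpus. (Note that sklearn's `TfidfVectorizer` adds smoothing and normalization by default, so its numbers won't match this exactly.)

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
tokenized = [d.split() for d in docs]
N = len(docs)

# document frequency: how many documents contain each term at least once
df = Counter(t for doc in tokenized for t in set(doc))

def tf_idf(term, doc_tokens):
    tf = doc_tokens.count(term)      # raw term frequency in this document
    idf = math.log(N / df[term])     # textbook IDF, no smoothing
    return tf * idf

print(tf_idf("the", tokenized[0]))   # 0.0 - appears in every document
print(tf_idf("mat", tokenized[0]))   # ~1.10 - rare, distinctive word
```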
## N-grams

Capture some word order by including consecutive word pairs (bigrams) and triples (trigrams).

"The cat sat" with bigrams:
- Unigrams: [the, cat, sat]
- Bigrams: [the_cat, cat_sat]

```python
vectorizer = CountVectorizer(ngram_range=(1, 2))
```

The vocabulary explodes, but it captures more structure.
## When BoW still works

- Document classification (news categories, spam)
- Search and information retrieval (with TF-IDF)
- Baseline for comparison
- When you need interpretability
- Small datasets

## When it fails

- Sentiment analysis (word order matters)
- Question answering
- Anything requiring understanding
- Short texts (not enough words)
## Preprocessing matters

BoW benefits from:
- Lowercasing
- Removing punctuation
- Stop word removal
- Stemming/lemmatization

```python
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(
    lowercase=True,
    stop_words='english',   # sklearn's built-in English stop word list
    max_features=5000,
    ngram_range=(1, 2)
)
```
## Comparison with embeddings

| Aspect | BoW/TF-IDF | Embeddings |
|--------|------------|------------|
| Semantic similarity | No | Yes |
| Word order | No (partial with n-grams) | Yes |
| Dimensionality | High (vocab size) | Low (100-768) |
| Interpretable | Yes | No |
| Training data needed | None | Lots |
| Compute | Fast | Slower |
## Practical advice

Starting a new NLP project?

1. Try TF-IDF first (baseline)
2. If that's not good enough, try sentence embeddings
3. If that's still not enough, fine-tune BERT

I'm surprised how often TF-IDF is "good enough" for classification tasks.
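
For concreteness, here's roughly what step 1 might look like with scikit-learn - a minimal TF-IDF plus logistic regression baseline. The texts and labels are made-up placeholders, just to show the shape of the pipeline:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# tiny made-up dataset, purely for illustration
texts = [
    "free money click now", "meeting moved to friday",
    "win a prize today", "lunch at noon?",
]
labels = ["spam", "ham", "spam", "ham"]

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
baseline.fit(texts, labels)
print(baseline.predict(["claim your free prize today"]))  # likely ['spam']
```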
The animation shows how documents become vectors: [Bag of Words Animation](https://danielsobrado.github.io/ml-animations/animation/bag-of-words)

---

Related:
- [Embeddings - better representations](/posts/embeddings/)
- [Tokenization](/posts/tokenization/)
