getsentry · sergical · Oct 6, 2025 · Oct 6, 2025 · Oct 6, 2025 · Oct 24, 2025
diff --git a/BUILD_PERFORMANCE_HYPOTHESIS.md b/BUILD_PERFORMANCE_HYPOTHESIS.md
@@ -0,0 +1,166 @@
+# Build Performance Hypothesis & Measurement Plan
+
+## 📊 Current State
+- **Build time**: ~30 minutes on Vercel
+- **Total pages**: ~2,500 static pages
+- **Content**: 162,901 lines of MDX across thousands of files
+
+## 🎯 Hypothesis: Where Time is Being Spent
+
+### Primary Hypothesis (High Confidence)
+**Frontmatter parsing is happening repeatedly for every page**
+
+**Why we think this:**
+1. Module-level cache (`getDocsFrontMatterCache`) may not persist across:
+   - Next.js worker processes during static generation
+   - Between `generateStaticParams()` and page component rendering
+   - Between `generateMetadata()` calls
+
+2. Each call to `getDocsFrontMatter()` without cache:
+   - Recursively scans `docs/` directory (~2,500+ files)
+   - Reads each file from disk
+   - Parses frontmatter with `gray-matter`
+   - Processes `common/` folders, versions, configs
+   - Processes API categories
+
+3. Number of potential calls per build:
+   ```
+   generateStaticParams: 1x getDocsFrontMatter()
+   For each of ~2,500 pages:
+     - Page component: 1x getDocsRootNode() → getDocsFrontMatter()
+     - generateMetadata: 1x getDocsRootNode() → getDocsFrontMatter()
+
+   Worst case (no caching): 1 + (2,500 × 2) = 5,001 full scans
+   Best case (cache works): 1 full scan per process
+   ```
+
+4. **Expected savings if fixed**: 20-25 minutes
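+
+A minimal sketch of the module-level memoization this hypothesis is about, assuming `getDocsFrontMatter()` caches the promise returned by `getDocsFrontMatterUncached()` (the real code in `src/mdx.ts` may differ; the `FrontMatter` type and the stub body are hypothetical). Because the memo is ordinary module state, it deduplicates calls only within one Node.js process; every worker that loads the module starts with an empty cache.
+
+```typescript
+// Hypothetical entry shape; the real type lives in src/mdx.ts.
+type FrontMatter = { slug: string; title?: string };
+
+let uncachedRuns = 0; // how often the expensive path actually ran in this process
+let frontMatterCache: Promise<FrontMatter[]> | null = null; // module-level, per process
+
+// Stand-in for the recursive scan of docs/ plus gray-matter parsing.
+async function getDocsFrontMatterUncached(): Promise<FrontMatter[]> {
+  uncachedRuns += 1;
+  return [{ slug: "platforms/javascript", title: "JavaScript" }];
+}
+
+// Cache the promise, not the result, so concurrent callers share one scan.
+function getDocsFrontMatter(): Promise<FrontMatter[]> {
+  if (frontMatterCache === null) {
+    frontMatterCache = getDocsFrontMatterUncached();
+  }
+  return frontMatterCache;
+}
+```
+
+Within one process, 5,001 calls trigger a single uncached run. If the build spreads pages across N workers, the best realistic case is therefore up to N runs rather than 1.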
+
+### Secondary Hypothesis (Medium Confidence)
+**MDX compilation is slow even with cache**
+
+**Why we think this:**
+1. Even with disk cache, each page needs:
+   - Cache key generation (MD5 of source)
+   - File I/O to check cache
+   - Decompression of cached bundle
+   - MDX component creation
+
+2. Some files skip cache (contain `@inject`, `<PlatformSDKPackageName>`, etc.)
+
+3. **Expected savings if optimized**: 2-5 minutes
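+
+The per-page costs listed above can be timed separately. The sketch below assumes each cached bundle is a Brotli-compressed JSON file named after the MD5 of the MDX source; the file layout, the `CachedBundle` type and `readCachedBundle` are hypothetical, not the repo's actual cache format.
+
+```typescript
+import { createHash } from "node:crypto";
+import { existsSync, readFileSync } from "node:fs";
+import { join } from "node:path";
+import { performance } from "node:perf_hooks";
+import { brotliDecompressSync } from "node:zlib";
+
+// Hypothetical cached payload: compiled MDX code plus its frontmatter.
+type CachedBundle = { code: string; frontmatter: Record<string, unknown> };
+
+// Per-step timings for one cache lookup, in milliseconds.
+type LookupTiming = { hashMs: number; readMs: number; decodeMs: number; hit: boolean };
+
+function readCachedBundle(
+  source: string,
+  cacheDir: string,
+): { bundle: CachedBundle | null; timing: LookupTiming } {
+  const t0 = performance.now();
+  const key = createHash("md5").update(source).digest("hex"); // cache key = MD5 of source
+  const t1 = performance.now();
+
+  const file = join(cacheDir, `${key}.br`);
+  if (!existsSync(file)) {
+    // Cache miss: the caller would fall back to a full bundleMDX() compile.
+    const timing = { hashMs: t1 - t0, readMs: performance.now() - t1, decodeMs: 0, hit: false };
+    return { bundle: null, timing };
+  }
+  const compressed = readFileSync(file); // disk I/O
+  const t2 = performance.now();
+
+  // Brotli decompression plus JSON parsing.
+  const bundle = JSON.parse(brotliDecompressSync(compressed).toString("utf8")) as CachedBundle;
+  const t3 = performance.now();
+
+  return { bundle, timing: { hashMs: t1 - t0, readMs: t2 - t1, decodeMs: t3 - t2, hit: true } };
+}
+```
+
+Summing `hashMs`, `readMs` and `decodeMs` across pages shows how much of the MDX time is cache overhead rather than compilation.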
+
+### Tertiary Hypothesis (Lower Confidence)
+**DocTree building is redundant**
+
+**Why we think this:**
+1. `frontmatterToTree()` is called for every page that needs `getDocsRootNode()`
+2. Tree building involves sorting and nested iteration
+3. Could be pre-built and served as static JSON
+
+4. **Expected savings if optimized**: 1-2 minutes
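+
+If tree building turns out to matter, the tree can be built once from the flat frontmatter list and serialized with `JSON.stringify`. This sketch uses a simplified, hypothetical `DocNode` shape and ordering (by `sidebar_order`, then title); it is not the repo's `frontmatterToTree()`.
+
+```typescript
+// Hypothetical, simplified shapes; the real DocNode carries more fields.
+type Entry = { slug: string; title: string; sidebar_order?: number };
+type DocNode = Entry & { children: DocNode[] };
+
+function buildTree(entries: Entry[]): DocNode {
+  const root: DocNode = { slug: "", title: "root", children: [] };
+  const bySlug = new Map<string, DocNode>([["", root]]);
+  const depth = (slug: string) => slug.split("/").length;
+
+  // Visit shallow slugs first so a parent exists before its children.
+  for (const entry of [...entries].sort((a, b) => depth(a.slug) - depth(b.slug))) {
+    const node: DocNode = { ...entry, children: [] };
+    bySlug.set(entry.slug, node);
+    const parentSlug = entry.slug.split("/").slice(0, -1).join("/");
+    (bySlug.get(parentSlug) ?? root).children.push(node); // orphans hang off the root
+  }
+
+  // Sort every level once, at build time, instead of once per page.
+  const order = (n: DocNode) => n.sidebar_order ?? Number.MAX_SAFE_INTEGER;
+  const sortChildren = (n: DocNode): void => {
+    n.children.sort((a, b) => order(a) - order(b) || a.title.localeCompare(b.title));
+    n.children.forEach(sortChildren);
+  };
+  sortChildren(root);
+  return root;
+}
+```
+
+Writing `JSON.stringify(buildTree(entries))` to a file once per build would let pages read the finished tree instead of rebuilding it.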
+
+## 🧪 Measurement Plan
+
+### What We'll Measure
+
+1. **Frontmatter Cache Effectiveness**
+   - How many times `getDocsFrontMatter()` is called
+   - How many times `getDocsFrontMatterUncached()` is actually executed
+   - Time spent in `getDocsFrontMatterUncached()`
+   - Cache hit/miss rate
+
+2. **MDX Compilation Performance**
+   - Time per file compilation
+   - Cache hit rate for MDX bundling
+   - Time in cache I/O vs actual compilation
+
+3. **Page Generation Timing**
+   - Time in `generateStaticParams()`
+   - Average time per page render
+   - Time in `generateMetadata()`
+   - Time building doc tree
+
+### Where We'll Add Timing Logs
+
+#### In `src/mdx.ts`:
+- `getDocsFrontMatter()` - entry point, count calls
+- `getDocsFrontMatterUncached()` - actual work, measure duration
+- `getAllFilesFrontMatter()` - file processing
+- `getFileBySlug()` - MDX compilation entry
+- `bundleMDX()` - actual compilation time
+- Cache read/write operations
+
+#### In `src/docTree.ts`:
+- `getDocsRootNode()` - count calls
+- `getDocsRootNodeUncached()` - measure tree building time
+- `frontmatterToTree()` - measure tree construction
+
+#### In `app/[[...path]]/page.tsx`:
+- `generateStaticParams()` - measure total time
+- `Page()` - measure per-page render time
+- `generateMetadata()` - measure metadata generation time
+
+## 📈 Success Metrics
+
+### What confirms our hypothesis:
+1. **Frontmatter called repeatedly**: 
+   - See `getDocsFrontMatterUncached()` called > 10 times
+   - See high total time in frontmatter parsing (> 10 minutes)
+
+2. **Cache not working**:
+   - Cache hit rate < 50%
+   - See duplicate calls from same page
+
+3. **MDX compilation is slow**:
+   - Average compilation time > 100ms per file
+   - Total MDX time > 10 minutes
+
+### What would disprove our hypothesis:
+1. `getDocsFrontMatterUncached()` only called 1-2 times (or roughly once per build worker process, since each worker has its own module-level cache)
+2. Cache hit rate > 90%
+3. Most time is in Next.js internals (not our code)
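+
+The criteria above can be written as a small decision function so the verdict is mechanical once the numbers are in. This is a sketch: the `BuildStats` fields are hypothetical, how the criteria are combined is a judgment call, and the `workers` allowance assumes one uncached parse per static-generation worker is expected even when the cache works.
+
+```typescript
+// Hypothetical summary of one instrumented build.
+type BuildStats = {
+  frontmatterCalls: number; // getDocsFrontMatter() calls
+  frontmatterUncachedRuns: number; // getDocsFrontMatterUncached() executions
+  frontmatterTotalMs: number; // summed time in getDocsFrontMatterUncached()
+  workers: number; // static-generation worker processes (assumed known)
+};
+
+type Verdict = "confirmed" | "rejected" | "inconclusive";
+
+function frontmatterVerdict(s: BuildStats): Verdict {
+  const hitRate = s.frontmatterCalls === 0 ? 0 : 1 - s.frontmatterUncachedRuns / s.frontmatterCalls;
+  const tenMinutesMs = 10 * 60 * 1000;
+  // Confirms: many uncached runs, plus a low hit rate or > 10 min spent parsing.
+  if (
+    s.frontmatterUncachedRuns > Math.max(10, s.workers) &&
+    (hitRate < 0.5 || s.frontmatterTotalMs > tenMinutesMs)
+  ) {
+    return "confirmed";
+  }
+  // Disproves: about one uncached run per process and a hit rate above 90%.
+  if (s.frontmatterUncachedRuns <= Math.max(2, s.workers) && hitRate > 0.9) {
+    return "rejected";
+  }
+  return "inconclusive";
+}
+```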
+
+## 🔄 Next Steps
+
+1. **Add timing instrumentation** (as planned in this document)
+2. **Run instrumented build** on Vercel or locally with `CI=true`
+3. **Analyze timing logs** to validate/reject hypothesis
+4. **Implement fixes** for confirmed bottlenecks
+5. **Measure improvements** with same instrumentation
+
+## 📝 Log Format
+
+We'll use this format for easy parsing:
+```
+[PERF:category] operation_name: Xms (metadata)
+```
+
+Examples:
+```
+[PERF:frontmatter] getDocsFrontMatter called (call #1)
+[PERF:frontmatter] getDocsFrontMatterUncached started
+[PERF:frontmatter] getDocsFrontMatterUncached completed: 8542ms (2,487 files)
+[PERF:tree] getDocsRootNode called (call #12)
+[PERF:mdx] getFileBySlug started: docs/platforms/javascript/index
+[PERF:mdx] bundleMDX completed: 145ms (cache miss)
+[PERF:page] Page render completed: 234ms (platforms/javascript)
+```
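+
+A small helper keeps every call site on this format. Sketch only: `perfLog`, `countCall` and `timed` are hypothetical names, not existing functions in the repo.
+
+```typescript
+import { performance } from "node:perf_hooks";
+
+// Per-process call counters, keyed by function name.
+const callCounts = new Map<string, number>();
+
+// Emits "[PERF:category] message" and returns the line.
+function perfLog(category: string, message: string): string {
+  const line = `[PERF:${category}] ${message}`;
+  console.log(line);
+  return line;
+}
+
+// Logs "name called (call #N, extra)" and returns N.
+function countCall(category: string, name: string, extra = ""): number {
+  const n = (callCounts.get(name) ?? 0) + 1;
+  callCounts.set(name, n);
+  perfLog(category, `${name} called (call #${n}${extra ? `, ${extra}` : ""})`);
+  return n;
+}
+
+// Logs "name started", then "name completed: Xms (meta)" once fn resolves.
+async function timed<T>(
+  category: string,
+  name: string,
+  fn: () => Promise<T>,
+  meta?: (result: T) => string,
+): Promise<T> {
+  perfLog(category, `${name} started`);
+  const start = performance.now();
+  const result = await fn();
+  const ms = Math.round(performance.now() - start);
+  perfLog(category, `${name} completed: ${ms}ms${meta ? ` (${meta(result)})` : ""}`);
+  return result;
+}
+```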
+
+This will allow us to:
+- Filter logs by category (`grep "PERF:frontmatter"`)
+- Calculate total time per category
+- Count calls
+- Identify slowest operations
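+
+The same summary can also come from a short script rather than chained `grep` calls. A sketch, assuming the `[PERF:category] name: Xms` shape above; `summarize` is a hypothetical helper.
+
+```typescript
+type CategoryStats = { totalMs: number; timedOps: number; slowest: { op: string; ms: number } | null };
+
+function summarize(log: string): { byCategory: Map<string, CategoryStats>; calls: Map<string, number> } {
+  const byCategory = new Map<string, CategoryStats>();
+  const calls = new Map<string, number>();
+  for (const line of log.split("\n")) {
+    // Timed lines: "[PERF:category] op: 123ms ..."; no ^ anchor, so log prefixes are fine.
+    const timing = /\[PERF:([^\]]+)\] (.+?): (\d+)ms\b/.exec(line);
+    if (timing) {
+      const [, category, op, msText] = timing;
+      const ms = Number(msText);
+      const s = byCategory.get(category) ?? { totalMs: 0, timedOps: 0, slowest: null };
+      s.totalMs += ms;
+      s.timedOps += 1;
+      if (s.slowest === null || ms > s.slowest.ms) s.slowest = { op, ms };
+      byCategory.set(category, s);
+    }
+    // Call-count lines: "[PERF:category] fnName called (...)".
+    const call = /\[PERF:[^\]]+\] (\S+) called\b/.exec(line);
+    if (call) calls.set(call[1], (calls.get(call[1]) ?? 0) + 1);
+  }
+  return { byCategory, calls };
+}
+```
+
+Nested operations (for example `bundleMDX` inside `getFileBySlug`) are both counted, and timings from parallel workers add up, so per-category totals overstate wall time and are best compared against each other rather than against the build duration.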
+
+## 🎯 Expected Outcome
+
+After measurement, we expect to see:
+1. **Confirmed**: Frontmatter parsing happening 100s or 1000s of times
+2. **Confirmed**: Each uncached call taking 5-15 seconds
+3. **Total frontmatter time**: 10-20 minutes (a third to two-thirds of the build)
+4. **MDX compilation time**: 5-10 minutes
+5. **Next.js overhead**: 5-10 minutes
+
+**If confirmed**, implementing disk-based frontmatter cache should reduce build time by **20-25 minutes** (from ~30min to ~5-10min).
+
diff --git a/NEXT_STEPS.md b/NEXT_STEPS.md
@@ -0,0 +1,145 @@
+# Next Steps: Performance Investigation
+
+## ✅ What We Just Did
+
+Added comprehensive performance instrumentation to measure build times WITHOUT changing any logic.
+
+## 📊 Files Changed
+
+1. **`src/mdx.ts`** - Added timing for frontmatter parsing and MDX compilation
+2. **`src/docTree.ts`** - Added timing for document tree building  
+3. **`app/[[...path]]/page.tsx`** - Added timing for page generation
+4. **`BUILD_PERFORMANCE_HYPOTHESIS.md`** - Documents our hypothesis
+5. **`PERFORMANCE_MEASUREMENT_GUIDE.md`** - How to run and analyze
+
+## 🎯 Our Hypothesis
+
+**The frontmatter cache isn't working during Next.js builds**, causing thousands of files to be parsed repeatedly.
+
+**Expected impact if true**: 20-25 minute savings (from 30min to 5-10min)
+
+## 🧪 How to Test
+
+### Quick Test (5-10 minutes locally):
+```bash
+# Clean build
+rm -rf .next
+
+# Run instrumented build
+CI=true yarn build 2>&1 | tee build-performance.log
+
+# Quick check - how many times did we parse frontmatter?
+grep -c "getDocsFrontMatterUncached started" build-performance.log
+```
+
+**What we expect to see:**
+- ❌ **Bad (confirms hypothesis)**: Number is > 10 (could be hundreds or thousands)
+- ✅ **Good (rejects hypothesis)**: Number is 1-2, or roughly one per build worker process
+
+### Full Analysis:
+```bash
+# Follow commands in PERFORMANCE_MEASUREMENT_GUIDE.md
+# to create a detailed performance summary
+```
+
+## 📈 Decision Tree
+
+```
+Run instrumented build
+    ↓
+Check: How many times was frontmatter parsed?
+    ↓
+    ├─ > 10 times → HYPOTHESIS CONFIRMED
+    │   ↓
+    │   Implement disk-based frontmatter cache
+    │   ↓
+    │   Expected: 20-25 min savings
+    │   ↓
+    │   Measure again
+    │
+    └─ 1-2 times → HYPOTHESIS REJECTED
+        ↓
+        Analyze where time IS being spent
+        ↓
+        - MDX compilation slow?
+        - Next.js internals?
+        - Network/IO?
+        ↓
+        Profile further & implement different fixes
+```
+
+## 💡 If Hypothesis is Confirmed
+
+We already have the fix ready! It's the same disk-based caching pattern we use for MDX compilation, just applied to frontmatter.
+
+The fix involves:
+1. Cache frontmatter to `.next/cache/frontmatter/`
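+2. Use an MD5 hash of the file list and each file's contents as the cache key (a hash of the list alone would return stale frontmatter after an in-place edit)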
+3. Brotli compress the JSON
+4. Cache persists across builds (Vercel already caches `.next/cache/`)
+
+**Changes needed**: ~50 lines in `src/mdx.ts`
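+
+A minimal sketch of that fix, assuming the frontmatter list is JSON-serializable and the file paths are known before parsing. The cache directory, the `FrontMatter` type and both function names are hypothetical. The key hashes each file's path and contents, since a key built from the list of paths alone would not change when an existing file's frontmatter is edited.
+
+```typescript
+import { createHash } from "node:crypto";
+import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
+import { join } from "node:path";
+import { brotliCompressSync, brotliDecompressSync } from "node:zlib";
+
+// Hypothetical entry shape; real frontmatter objects carry many more fields.
+type FrontMatter = { slug: string; title?: string };
+
+// MD5 over every path and its contents, so editing any file changes the key.
+function cacheKey(files: string[]): string {
+  const hash = createHash("md5");
+  for (const file of [...files].sort()) {
+    hash.update(file).update("\0"); // NUL separators keep path/content boundaries unambiguous
+    hash.update(readFileSync(file)).update("\0");
+  }
+  return hash.digest("hex");
+}
+
+function getFrontMatterWithDiskCache(
+  files: string[],
+  cacheDir: string, // e.g. somewhere under .next/cache/
+  parseAll: (files: string[]) => FrontMatter[], // the expensive gray-matter pass
+): FrontMatter[] {
+  const cacheFile = join(cacheDir, `${cacheKey(files)}.json.br`);
+  if (existsSync(cacheFile)) {
+    const json = brotliDecompressSync(readFileSync(cacheFile)).toString("utf8");
+    return JSON.parse(json) as FrontMatter[];
+  }
+  const result = parseAll(files);
+  mkdirSync(cacheDir, { recursive: true });
+  writeFileSync(cacheFile, brotliCompressSync(JSON.stringify(result)));
+  return result;
+}
+```
+
+Hashing still reads every file, but a hit skips `parseAll`. Anything else the result depends on (config files, API category data, the parsing code itself) would also need to feed the key, or a stale entry could survive across builds once `.next/cache/` is restored.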
+
+## 📝 What the Logs Tell Us
+
+### Example log output:
+```
+[PERF:page] generateStaticParams started
+[PERF:frontmatter] getDocsFrontMatter called (call #1, cached: false)
+[PERF:frontmatter] getDocsFrontMatterUncached started
+[PERF:frontmatter] getDocsFrontMatterUncached completed: 8542ms (2,487 entries)
+[PERF:page] generateStaticParams completed: 8723ms (2,488 paths)
+
+[PERF:tree] getDocsRootNode called (call #1, cached: false)
+[PERF:tree] getDocsRootNodeUncached started
+[PERF:frontmatter] getDocsFrontMatter called (call #2, cached: true)  ← Good!
+[PERF:tree] getDocsRootNodeUncached completed: 56ms
+
+[PERF:mdx] getFileBySlug started (call #1): docs/platforms/javascript/index
+[PERF:mdx] bundleMDX starting for docs/platforms/javascript/index (cache miss)
+[PERF:mdx] bundleMDX completed: 145ms for docs/platforms/javascript/index
+[PERF:mdx] getFileBySlug completed: 156ms (docs/platforms/javascript/index)
+
+[PERF:page] Page render started (100): platforms/javascript/guides/react
+[PERF:tree] getDocsRootNode called (call #100, cached: true)  ← Good!
+[PERF:frontmatter] getDocsFrontMatter called (call #200, cached: true)  ← Wait, why so many?
+[PERF:page] Page render completed: 234ms (platforms/javascript/guides/react)
+```
+
+### What to look for:
+- **"cached: false"** appearing many times → Cache not persisting
+- **High call counts** for getDocsFrontMatter → Called too many times
+- **Long durations** for getDocsFrontMatterUncached → Slow parsing
+- **"cached: true"** mostly → Cache IS working (hypothesis wrong!)
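+
+Those markers can be turned into hit rates per function. A sketch assuming the `called (call #N, cached: true|false)` line shape shown above; `cacheHitRates` is a hypothetical helper.
+
+```typescript
+type HitStats = { hits: number; misses: number; hitRate: number };
+
+function cacheHitRates(log: string): Map<string, HitStats> {
+  const stats = new Map<string, HitStats>();
+  const pattern = /\[PERF:[^\]]+\] (\S+) called \(call #\d+, cached: (true|false)\)/;
+  for (const line of log.split("\n")) {
+    const m = pattern.exec(line);
+    if (!m) continue;
+    const s = stats.get(m[1]) ?? { hits: 0, misses: 0, hitRate: 0 };
+    if (m[2] === "true") s.hits += 1;
+    else s.misses += 1;
+    s.hitRate = s.hits / (s.hits + s.misses);
+    stats.set(m[1], s);
+  }
+  return stats;
+}
+```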
+
+## 🚀 Recommended Action Plan
+
+1. **Today**: Run instrumented build locally or on Vercel
+2. **Review logs**: Check call counts and timing
+3. **If confirmed**: Implement disk-based cache (we have code ready)
+4. **If rejected**: Dig deeper into what's actually slow
+5. **Measure again**: Validate the fix worked
+6. **Profile next bottleneck**: There may be more optimizations
+
+## 📞 Questions to Answer
+
+After running the instrumented build:
+
+1. How many times is `getDocsFrontMatter` called?
+2. How many times is it actually parsed (uncached)?
+3. What's the duration of each parsing?
+4. What's the total time in frontmatter operations?
+5. What percentage of build time is frontmatter?
+
+## 🎬 Ready to Start
+
+```bash
+# Go!
+CI=true yarn build 2>&1 | tee build-performance.log
+
+# When done, share the output of:
+grep "\[PERF:" build-performance.log | head -100
+```
+
+This will give us enough data to validate or reject the hypothesis and decide on next steps.
+