166 changes: 166 additions & 0 deletions BUILD_PERFORMANCE_HYPOTHESIS.md
@@ -0,0 +1,166 @@
# Build Performance Hypothesis & Measurement Plan

## 📊 Current State
- **Build time**: ~30 minutes on Vercel
- **Total pages**: ~2,500+ static pages
- **Content**: ~162,901 lines of MDX across thousands of files

## 🎯 Hypothesis: Where Time is Being Spent

### Primary Hypothesis (High Confidence)
**Frontmatter parsing is happening repeatedly for every page**

**Why we think this:**
1. Module-level cache (`getDocsFrontMatterCache`) may not persist across:
   - Next.js worker processes during static generation
   - The gap between `generateStaticParams()` and page component rendering
   - Separate `generateMetadata()` calls

2. Each call to `getDocsFrontMatter()` without cache:
   - Recursively scans `docs/` directory (~2,500+ files)
   - Reads each file from disk
   - Parses frontmatter with `gray-matter`
   - Processes `common/` folders, versions, configs
   - Processes API categories

3. Number of potential calls per build:
   ```
   generateStaticParams: 1x getDocsFrontMatter()
   For each of ~2,500 pages:
     - Page component: 1x getDocsRootNode() → getDocsFrontMatter()
     - generateMetadata: 1x getDocsRootNode() → getDocsFrontMatter()

   Worst case: 1 + (2,500 × 2) = ~5,001 calls
   Best case (if cache works): 1 call
   ```

4. **Expected savings if fixed**: 20-25 minutes
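
To make the cache-persistence concern concrete, here is a minimal sketch of a module-level memoized loader with call counters. The names `loadFrontMatter`, `scanDocs`, and `stats` are hypothetical and are not the actual code in `src/mdx.ts`. It only illustrates the mechanism: the cached promise lives in the memory of a single process, so each worker process that imports the module would pay for the full scan at least once.

```typescript
// Hypothetical sketch of a module-level cache; not the real src/mdx.ts code.
type FrontMatter = { slug: string; title: string };

// Counters comparable to the "call #N" values in the planned logs.
const stats = { calls: 0, uncachedRuns: 0 };

// Lives in this process only. Another worker process gets its own copy.
let cache: Promise<FrontMatter[]> | undefined;

// Stand-in for the expensive directory scan plus gray-matter parsing.
async function scanDocs(): Promise<FrontMatter[]> {
  stats.uncachedRuns += 1;
  return [{ slug: 'platforms/javascript', title: 'JavaScript' }];
}

function loadFrontMatter(): Promise<FrontMatter[]> {
  stats.calls += 1;
  // Caching the promise (not the resolved value) also dedupes concurrent callers.
  if (!cache) {
    cache = scanDocs();
  }
  return cache;
}
```

Within one process, repeated calls to `loadFrontMatter()` raise `stats.calls` but leave `stats.uncachedRuns` at 1. If the build shows many uncached runs, the calls are either spread across processes or bypassing the cached variable.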

### Secondary Hypothesis (Medium Confidence)
**MDX compilation is slow even with cache**

**Why we think this:**
1. Even with disk cache, each page needs:
   - Cache key generation (MD5 of source)
   - File I/O to check cache
   - Decompression of cached bundle
   - MDX component creation

2. Some files skip cache (contain `@inject`, `<PlatformSDKPackageName>`, etc.)

3. **Expected savings if optimized**: 2-5 minutes
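
The per-page cache steps above can be sketched roughly as follows. This is an illustration under assumptions, not the existing implementation: `cacheKeyFor`, `shouldSkipCache`, and `UNCACHEABLE_MARKERS` are hypothetical names, and the marker list contains only the two markers named above (the real list has more, per the "etc.").

```typescript
import { createHash } from 'node:crypto';

// Hypothetical marker list: only the two markers this document names.
const UNCACHEABLE_MARKERS = ['@inject', '<PlatformSDKPackageName>'];

// Cache key as described above: an MD5 digest of the MDX source text.
function cacheKeyFor(source: string): string {
  return createHash('md5').update(source).digest('hex');
}

// Files containing any marker bypass the cache and are always recompiled.
function shouldSkipCache(source: string): boolean {
  return UNCACHEABLE_MARKERS.some(marker => source.includes(marker));
}
```

Every file for which `shouldSkipCache` returns `true` pays the full compilation cost on each build, so counting those files is one way to bound how much the MDX cache can help.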

### Tertiary Hypothesis (Lower Confidence)
**DocTree building is redundant**

**Why we think this:**
1. `frontmatterToTree()` is called for every page that needs `getDocsRootNode()`
2. Tree building involves sorting and nested iteration
3. Could be pre-built and served as static JSON

4. **Expected savings if optimized**: 1-2 minutes

## 🧪 Measurement Plan

### What We'll Measure

1. **Frontmatter Cache Effectiveness**
   - How many times `getDocsFrontMatter()` is called
   - How many times `getDocsFrontMatterUncached()` is actually executed
   - Time spent in `getDocsFrontMatterUncached()`
   - Cache hit/miss rate

2. **MDX Compilation Performance**
   - Time per file compilation
   - Cache hit rate for MDX bundling
   - Time in cache I/O vs actual compilation

3. **Page Generation Timing**
   - Time in `generateStaticParams()`
   - Average time per page render
   - Time in `generateMetadata()`
   - Time building doc tree

### Where We'll Add Timing Logs

#### In `src/mdx.ts`:
- `getDocsFrontMatter()` - entry point, count calls
- `getDocsFrontMatterUncached()` - actual work, measure duration
- `getAllFilesFrontMatter()` - file processing
- `getFileBySlug()` - MDX compilation entry
- `bundleMDX()` - actual compilation time
- Cache read/write operations

#### In `src/docTree.ts`:
- `getDocsRootNode()` - count calls
- `getDocsRootNodeUncached()` - measure tree building time
- `frontmatterToTree()` - measure tree construction

#### In `app/[[...path]]/page.tsx`:
- `generateStaticParams()` - measure total time
- `Page()` - measure per-page render time
- `generateMetadata()` - measure metadata generation time
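
One way to add this timing without touching any logic is a small wrapper around each function. The sketch below is an assumption about how that could look, not the instrumentation in this PR: `timed`, `formatPerf`, and `callCounts` are hypothetical names, and the output follows the `[PERF:category] operation_name: Xms (metadata)` shape defined under "Log Format".

```typescript
import { performance } from 'node:perf_hooks';

// Hypothetical helpers; per-process call counters keyed by operation name.
const callCounts = new Map<string, number>();

// Builds one log line: [PERF:category] operation: Xms (metadata)
function formatPerf(category: string, operation: string, ms: number, meta?: string): string {
  const suffix = meta ? ` (${meta})` : '';
  return `[PERF:${category}] ${operation}: ${Math.round(ms)}ms${suffix}`;
}

// Runs fn, logs its duration and call number, and returns its result unchanged.
async function timed<T>(
  category: string,
  operation: string,
  fn: () => Promise<T> | T
): Promise<T> {
  const count = (callCounts.get(operation) ?? 0) + 1;
  callCounts.set(operation, count);
  const start = performance.now();
  try {
    return await fn();
  } finally {
    // Logged even when fn throws, so failed calls still show up in the totals.
    console.log(formatPerf(category, operation, performance.now() - start, `call #${count}`));
  }
}
```

A call site would then look like `await timed('tree', 'getDocsRootNodeUncached', buildTree)`, where `buildTree` stands for whatever function does the real work. Note that the counters are per process, so call numbers restart in each Next.js worker.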

## 📈 Success Metrics

### What confirms our hypothesis:
1. **Frontmatter called repeatedly**:
   - See `getDocsFrontMatterUncached()` called > 10 times
   - See high total time in frontmatter parsing (> 10 minutes)

2. **Cache not working**:
   - Cache hit rate < 50%
   - See duplicate calls from same page

3. **MDX compilation is slow**:
   - Average compilation time > 100ms per file
   - Total MDX time > 10 minutes

### What would disprove our hypothesis:
1. `getDocsFrontMatterUncached()` only called 1-2 times
2. Cache hit rate > 90%
3. Most time is in Next.js internals (not our code)
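
The frontmatter thresholds above can be turned into a small check. This is a sketch under assumptions: the function names are hypothetical, and the way the signals are combined (either confirming signal is enough, both rejecting signals are required, anything else is inconclusive) is our reading, since the lists above do not say how to weigh them.

```typescript
// Hypothetical helper; thresholds come from the lists above, the combination rule is assumed.
type Verdict = 'confirmed' | 'rejected' | 'inconclusive';

// Fraction of getDocsFrontMatter() calls served without re-parsing.
function cacheHitRate(totalCalls: number, uncachedRuns: number): number {
  return totalCalls === 0 ? 0 : (totalCalls - uncachedRuns) / totalCalls;
}

function frontmatterVerdict(totalCalls: number, uncachedRuns: number): Verdict {
  if (totalCalls === 0) return 'inconclusive'; // no data, no verdict
  const hitRate = cacheHitRate(totalCalls, uncachedRuns);
  if (uncachedRuns > 10 || hitRate < 0.5) return 'confirmed';
  if (uncachedRuns <= 2 && hitRate > 0.9) return 'rejected';
  return 'inconclusive';
}
```

The counts should be summed over the whole build log, because each worker process keeps its own counters.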

## 🔄 Next Steps

1. **Add timing instrumentation** (as described in this document)
2. **Run instrumented build** on Vercel or locally with `CI=true`
3. **Analyze timing logs** to validate or reject the hypothesis
4. **Implement fixes** for confirmed bottlenecks
5. **Measure improvements** with same instrumentation

## 📝 Log Format

We'll use this format for easy parsing:
```
[PERF:category] operation_name: Xms (metadata)
```

Examples:
```
[PERF:frontmatter] getDocsFrontMatter called (call #1)
[PERF:frontmatter] getDocsFrontMatterUncached started
[PERF:frontmatter] getDocsFrontMatterUncached completed: 8542ms (2,487 files)
[PERF:tree] getDocsRootNode called (call #12)
[PERF:mdx] getFileBySlug started: docs/platforms/javascript/index
[PERF:mdx] bundleMDX completed: 145ms (cache miss)
[PERF:page] Page render completed: 234ms (platforms/javascript)
```

This will allow us to:
- Filter logs by category (`grep "PERF:frontmatter"`)
- Calculate total time per category
- Count calls
- Identify slowest operations
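
As an illustration of that analysis, here is a minimal sketch that totals lines and durations per category from a captured log. `summarize` and `CategorySummary` are hypothetical names, and the parser assumes durations always appear as `: <number>ms`, as in the examples above.

```typescript
// Hypothetical analyzer for the [PERF:category] log format above.
type CategorySummary = { lines: number; timedOps: number; totalMs: number };

const PERF_LINE = /^\[PERF:([\w-]+)\] (.*)$/;
const DURATION = /: (\d+)ms\b/;

function summarize(log: string): Map<string, CategorySummary> {
  const summary = new Map<string, CategorySummary>();
  for (const line of log.split('\n')) {
    const match = PERF_LINE.exec(line.trim());
    if (!match) continue; // ignore non-PERF build output
    const category = match[1] ?? 'unknown';
    const rest = match[2] ?? '';
    const entry = summary.get(category) ?? { lines: 0, timedOps: 0, totalMs: 0 };
    entry.lines += 1;
    // Only "completed" style lines carry a duration; "called"/"started" lines do not.
    const duration = DURATION.exec(rest);
    if (duration) {
      entry.timedOps += 1;
      entry.totalMs += Number(duration[1] ?? 0);
    }
    summary.set(category, entry);
  }
  return summary;
}
```

Feeding it the contents of the build log gives one entry per category, which is enough to compare total frontmatter time against total MDX time.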

## 🎯 Expected Outcome

After measurement, we expect to see:
1. **Confirmed**: Frontmatter parsing happening hundreds or thousands of times
2. **Confirmed**: Each call taking 5-15 seconds
3. **Total frontmatter time**: 10-20 minutes (the majority of build time)
4. **MDX compilation time**: 5-10 minutes
5. **Next.js overhead**: 5-10 minutes

**If confirmed**, implementing a disk-based frontmatter cache should reduce build time by **20-25 minutes** (from ~30 min to ~5-10 min).

145 changes: 145 additions & 0 deletions NEXT_STEPS.md
@@ -0,0 +1,145 @@
# Next Steps: Performance Investigation

## ✅ What We Just Did

Added comprehensive performance instrumentation to measure build times WITHOUT changing any logic.

## 📊 Files Modified

1. **`src/mdx.ts`** - Added timing for frontmatter parsing and MDX compilation
2. **`src/docTree.ts`** - Added timing for document tree building
3. **`app/[[...path]]/page.tsx`** - Added timing for page generation
4. **`BUILD_PERFORMANCE_HYPOTHESIS.md`** - Documents our hypothesis
5. **`PERFORMANCE_MEASUREMENT_GUIDE.md`** - How to run the instrumented build and analyze the results

## 🎯 Our Hypothesis

**The frontmatter cache isn't working during Next.js builds**, causing thousands of files to be parsed repeatedly.

**Expected impact if true**: 20-25 minute savings (from 30min to 5-10min)

## 🧪 How to Test

### Quick Test (5-10 minutes locally):
```bash
# Clean build
rm -rf .next

# Run instrumented build
CI=true yarn build 2>&1 | tee build-performance.log

# Quick check - how many times did we parse frontmatter?
grep -c "getDocsFrontMatterUncached started" build-performance.log
```

**What we expect to see:**
- ❌ **Bad (confirms hypothesis)**: Number is > 10 (could be hundreds or thousands)
- ✅ **Good (rejects hypothesis)**: Number is 1-2

### Full Analysis:
```bash
# Follow commands in PERFORMANCE_MEASUREMENT_GUIDE.md
# to create a detailed performance summary
```

## 📈 Decision Tree

```
Run instrumented build
Check: How many times was frontmatter parsed?
├─ > 10 times → HYPOTHESIS CONFIRMED
│    ↓
│  Implement disk-based frontmatter cache
│    ↓
│  Expected: 20-25 min savings
│    ↓
│  Measure again
└─ 1-2 times → HYPOTHESIS REJECTED
   Analyze where time IS being spent
     - MDX compilation slow?
     - Next.js internals?
     - Network/IO?
   Profile further & implement different fixes
```

## 💡 If Hypothesis is Confirmed

We already have the fix ready! It's the same disk-based caching pattern already used for MDX compilation, just applied to frontmatter.

The fix involves:
1. Cache frontmatter to `.next/cache/frontmatter/`
2. Use MD5 hash of file list as cache key
3. Brotli compress the JSON
4. Cache persists across builds (Vercel already caches `.next/cache/`)

**Changes needed**: ~50 lines in `src/mdx.ts`
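
A minimal sketch of the four steps above, assuming a hypothetical `FrontMatter` shape and hypothetical helper names (`keyForFiles`, `readCache`, `writeCache`). It is not the prepared fix itself, only an illustration of the pattern using Node's built-in `crypto`, `fs`, and `zlib` modules.

```typescript
import { createHash } from 'node:crypto';
import { mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';
import { brotliCompressSync, brotliDecompressSync } from 'node:zlib';

// Hypothetical shape; the real frontmatter type has more fields.
type FrontMatter = { slug: string; title: string };

// Step 2: MD5 of the file list. Sorting makes the key independent of scan order.
function keyForFiles(files: string[]): string {
  return createHash('md5').update([...files].sort().join('\n')).digest('hex');
}

// Returns the cached entries, or undefined on a miss (missing or unreadable file).
function readCache(cacheDir: string, key: string): FrontMatter[] | undefined {
  try {
    const compressed = readFileSync(join(cacheDir, `${key}.json.br`));
    return JSON.parse(brotliDecompressSync(compressed).toString('utf8')) as FrontMatter[];
  } catch {
    return undefined;
  }
}

// Steps 1 and 3: write Brotli-compressed JSON under the cache directory.
function writeCache(cacheDir: string, key: string, data: FrontMatter[]): void {
  mkdirSync(cacheDir, { recursive: true });
  writeFileSync(join(cacheDir, `${key}.json.br`), brotliCompressSync(JSON.stringify(data)));
}
```

With `cacheDir` set to `.next/cache/frontmatter/`, a caller would try `readCache` first and fall back to the full scan plus `writeCache` on a miss. One design point to keep in mind: a key built only from the file list does not change when a file's contents change, so an implementation may need to mix modification times or content hashes into the key to pick up frontmatter edits.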

## 📝 What the Logs Tell Us

### Example log output:
```
[PERF:page] generateStaticParams started
[PERF:frontmatter] getDocsFrontMatter called (call #1, cached: false)
[PERF:frontmatter] getDocsFrontMatterUncached started
[PERF:frontmatter] getDocsFrontMatterUncached completed: 8542ms (2,487 entries)
[PERF:page] generateStaticParams completed: 8723ms (2,488 paths)

[PERF:tree] getDocsRootNode called (call #1, cached: false)
[PERF:tree] getDocsRootNodeUncached started
[PERF:frontmatter] getDocsFrontMatter called (call #2, cached: true) ← Good!
[PERF:tree] getDocsRootNodeUncached completed: 56ms

[PERF:mdx] getFileBySlug started (call #1): docs/platforms/javascript/index
[PERF:mdx] bundleMDX starting for docs/platforms/javascript/index (cache miss)
[PERF:mdx] bundleMDX completed: 145ms for docs/platforms/javascript/index
[PERF:mdx] getFileBySlug completed: 156ms (docs/platforms/javascript/index)

[PERF:page] Page render started (100): platforms/javascript/guides/react
[PERF:tree] getDocsRootNode called (call #100, cached: true) ← Good!
[PERF:frontmatter] getDocsFrontMatter called (call #200, cached: true) ← Wait, why so many?
[PERF:page] Page render completed: 234ms (platforms/javascript/guides/react)
```

### What to look for:
- **"cached: false"** appearing many times → Cache not persisting
- **High call counts** for getDocsFrontMatter → Called too many times
- **Long durations** for getDocsFrontMatterUncached → Slow parsing
- **"cached: true"** mostly → Cache IS working (hypothesis wrong!)

## 🚀 Recommended Action Plan

1. **Today**: Run instrumented build locally or on Vercel
2. **Review logs**: Check call counts and timing
3. **If confirmed**: Implement disk-based cache (we have code ready)
4. **If rejected**: Dig deeper into what's actually slow
5. **Measure again**: Validate the fix worked
6. **Profile next bottleneck**: There may be more optimizations

## 📞 Questions to Answer

After running the instrumented build:

1. How many times is `getDocsFrontMatter` called?
2. How many times is it actually parsed (uncached)?
3. How long does each uncached parse take?
4. What's the total time in frontmatter operations?
5. What percentage of build time is frontmatter?

## 🎬 Ready to Start

```bash
# Go!
CI=true yarn build 2>&1 | tee build-performance.log

# When done, share the output of:
grep "\[PERF:" build-performance.log | head -100
```

This will give us enough data to validate or reject the hypothesis and decide on next steps.
