mad-sol-dev
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 14 additions & 0 deletions
diff --git a/OCR_BACKLOG.md b/OCR_BACKLOG.md
Lines changed: 65 additions & 2 deletions
@@ -12,6 +12,20 @@
   - Enables selective OCR processing (e.g., OCR only 50 of 800 pages with markers)
   - Non-breaking change: defaults to `false` to preserve existing behavior
 
+- **OCR:** add persistent disk cache for OCR results
+  - 3-layer cache architecture: in-memory → disk → API
+  - Stores OCR results as `{pdf_basename}_ocr.json` alongside PDFs
+  - Survives MCP server restarts and reduces the number of expensive API calls
+  - Fingerprint validation automatically invalidates the cache when the PDF changes
+  - Supports both page OCR (`pdf_ocr_page`) and image OCR (`pdf_ocr_image`)
+  - Only works for file-based PDFs (not URLs)
+
+### 🐛 Bug Fixes
+
+- **pdf_read_pages:** fix image extraction when `insert_markers=true` but `include_image_indexes=false`
+  - Images were not being extracted for marker insertion
+  - Now extracts images when EITHER parameter is enabled
+
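The condition behind this fix can be illustrated with a small sketch. The option names `insert_markers` and `include_image_indexes` come from the changelog entry; the `ReadPagesOptions` interface and the `shouldExtractImages` helper are hypothetical and only show the "either parameter" rule, not the handler's actual code.

```typescript
// Hypothetical option shape: only the two field names are taken from the changelog.
interface ReadPagesOptions {
  insert_markers?: boolean;
  include_image_indexes?: boolean;
}

// Images are needed both for listing their indexes and for placing markers,
// so extraction has to run when either option is enabled.
function shouldExtractImages(options: ReadPagesOptions): boolean {
  return Boolean(options.insert_markers) || Boolean(options.include_image_indexes);
}
```

With this rule, `{ insert_markers: true, include_image_indexes: false }` triggers extraction, which is the case the fix describes.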
 ## 2.1.0 (2025-12-17)
 
 ### ✨ Features
 
@@ -1,9 +1,16 @@
 # OCR Implementation - Status & Backlog
 
 **Status:** ✅ Implementation complete, ⚠️ Documentation incomplete
-**Last Updated:** 2025-12-21
+**Last Updated:** 2025-12-22
 **API Version Checked:** Mistral API 2025-12 (mistral-large-2512, mistral-ocr-2512)
 
+## 🆕 Update Summary (2025-12-22)
+- ✅ **Implemented persistent disk cache** for OCR results
+- ✅ 3-layer cache architecture: in-memory → disk → API
+- ✅ JSON cache files stored alongside PDFs (`{basename}_ocr.json`)
+- ✅ Fingerprint validation to detect PDF changes
+- ✅ Supports both page and image OCR caching
+
 ## 🆕 Update Summary (2025-12-21)
 - ✅ Verified against current Mistral API documentation
 - ✅ Updated model names (mistral-large-2512, mistral-medium-2508, etc.)
@@ -34,11 +41,67 @@
 - Generic HTTP OCR provider pattern
 - Page OCR (`pdf_ocr_page`) with configurable scale
 - Image OCR (`pdf_ocr_image`) for embedded images
-- Fingerprint-based caching (text + provider key)
+- **3-layer caching architecture** (NEW in v1.4.0):
+  - **Layer 1: In-memory cache** - Fast but volatile (lasts only for the current session)
+  - **Layer 2: Disk cache** - Persistent, survives restarts (JSON files)
+  - **Layer 3: API calls** - Slow, expensive (only when cache misses)
+- Fingerprint-based cache validation (detects PDF modifications)
 - Mock provider for testing
 - Vex schema validation for provider config
 - Cache management tools (`pdf_cache_stats`, `pdf_cache_clear`)
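The lookup order of the three cache layers listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `getOcrResult`, `DiskStore`, and `OcrResult` are hypothetical names, and the disk layer is abstracted behind an interface so the flow is visible without file I/O.

```typescript
// All names here are hypothetical; they only illustrate memory -> disk -> API ordering.
type OcrResult = { text: string };
type CacheSource = "memory" | "disk" | "api";

interface DiskStore {
  read(key: string): Promise<OcrResult | undefined>;
  write(key: string, value: OcrResult): Promise<void>;
}

async function getOcrResult(
  key: string,
  memory: Map<string, OcrResult>,
  disk: DiskStore,
  callApi: () => Promise<OcrResult>,
): Promise<{ result: OcrResult; source: CacheSource }> {
  // Layer 1: in-memory cache (fast, lost when the process exits).
  const inMemory = memory.get(key);
  if (inMemory !== undefined) {
    return { result: inMemory, source: "memory" };
  }

  // Layer 2: disk cache (persistent). A hit is promoted into memory.
  const onDisk = await disk.read(key);
  if (onDisk !== undefined) {
    memory.set(key, onDisk);
    return { result: onDisk, source: "disk" };
  }

  // Layer 3: the OCR API, reached only when both caches miss.
  // The fresh result is written back to both cache layers.
  const fresh = await callApi();
  memory.set(key, fresh);
  await disk.write(key, fresh);
  return { result: fresh, source: "api" };
}
```

Promoting disk hits into memory means that, after a restart, each entry is read from disk only once per session.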
 
+### 💾 Disk Cache Implementation
+
+**File Format:** `{pdf_basename}_ocr.json` (stored in same directory as PDF)
+
+**Structure:**
+```json
+{
+  "fingerprint": "sha256-hash-of-first-64kb",
+  "pdf_path": "/path/to/document.pdf",
+  "created_at": "2025-12-22T...",
+  "updated_at": "2025-12-22T...",
+  "ocr_provider": "mistral-ocr-2512",
+  "pages": {
+    "2": {
+      "text": "OCR result...",
+      "markdown": "...",
+      "tables": [...],
+      "hyperlinks": [...],
+      "dimensions": {...},
+      "provider_hash": "sha256...",
+      "cached_at": "2025-12-22T...",
+      "scale": 1.5
+    }
+  },
+  "images": {
+    "2/0": {
+      "text": "OCR result for image 0 on page 2",
+      "markdown": "...",
+      "provider_hash": "sha256...",
+      "cached_at": "2025-12-22T..."
+    }
+  }
+}
+```
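For readers who prefer types to a JSON sample, the structure above could be described roughly as below, together with one possible fingerprint routine. This is a sketch, not the contents of `src/types/cache.ts`: the interface and function names are hypothetical, the elided fields (`tables`, `hyperlinks`, `dimensions`) are left as `unknown`, and the hex-encoded SHA-256 over the first 64 KB is an assumption inferred from the placeholder `sha256-hash-of-first-64kb`.

```typescript
import { createHash } from "node:crypto";
import { closeSync, openSync, readSync } from "node:fs";

// Hypothetical types mirroring the JSON sample; elided shapes stay `unknown`.
interface CachedImageOcr {
  text: string;
  markdown?: string;
  provider_hash: string;
  cached_at: string;
}

interface CachedPageOcr extends CachedImageOcr {
  tables?: unknown[];
  hyperlinks?: unknown[];
  dimensions?: unknown;
  scale?: number;
}

interface OcrDiskCache {
  fingerprint: string;
  pdf_path: string;
  created_at: string;
  updated_at: string;
  ocr_provider: string;
  pages: Record<string, CachedPageOcr>;   // keyed by page number, e.g. "2"
  images: Record<string, CachedImageOcr>; // keyed by "page/index", e.g. "2/0"
}

const FINGERPRINT_BYTES = 64 * 1024;

// Assumed scheme: SHA-256 (hex) over at most the first 64 KB of the data.
function fingerprintBytes(data: Uint8Array): string {
  return createHash("sha256").update(data.subarray(0, FINGERPRINT_BYTES)).digest("hex");
}

// Reads only the leading bytes of the file, so large PDFs are not loaded in full.
function fingerprintFile(pdfPath: string): string {
  const fd = openSync(pdfPath, "r");
  try {
    const buffer = Buffer.alloc(FINGERPRINT_BYTES);
    const bytesRead = readSync(fd, buffer, 0, FINGERPRINT_BYTES, 0);
    return fingerprintBytes(buffer.subarray(0, bytesRead));
  } finally {
    closeSync(fd);
  }
}

// A cache is treated as valid only while the stored fingerprint still matches.
function isCacheValid(cache: OcrDiskCache, currentFingerprint: string): boolean {
  return cache.fingerprint === currentFingerprint;
}
```

Under this assumed scheme, an edit confined to bytes beyond the first 64 KB would leave the fingerprint unchanged, which is the trade-off for not hashing the whole file.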
+
+**Benefits:**
+- ✅ Survives MCP server restarts
+- ✅ Reduces API costs (expensive Mistral OCR calls)
+- ✅ Can be version-controlled with PDFs
+- ✅ Shareable between users/machines
+- ✅ Automatic invalidation on PDF changes (fingerprint mismatch)
+
+**Limitations:**
+- Only works for file-based PDFs (not URLs)
+- Cache file is written to the PDF's directory, so that directory must be writable
+- No automatic cleanup of stale cache files
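The first two limitations follow from how the cache file location is derived. A minimal sketch, assuming `{pdf_basename}` means the file name without its extension and that URL sources are detected by a scheme prefix; `ocrCachePath` is a hypothetical helper, not a function from `src/utils/diskCache.ts`.

```typescript
import * as path from "node:path";

// Hypothetical helper: returns the sibling cache file for a local PDF,
// or undefined for URL sources, which have no local directory to write into.
function ocrCachePath(pdfSource: string): string | undefined {
  // Treat anything starting with "scheme://" as a URL (assumed detection rule).
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(pdfSource)) {
    return undefined;
  }
  const parsed = path.parse(pdfSource);
  // Same directory as the PDF, named "{pdf_basename}_ocr.json".
  return path.join(parsed.dir, `${parsed.name}_ocr.json`);
}
```

A caller would skip the disk layer when this returns `undefined`, and would also need to handle a failed write if the PDF's directory is read-only.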
+
+**Code Locations:**
+- **Types:** `src/types/cache.ts` - Cache structure definitions
+- **Utilities:** `src/utils/diskCache.ts` - Load/save functions
+- **Integration:** `src/handlers/ocrPage.ts`, `src/handlers/ocrImage.ts` - Handler integration
+
 ### ⚠️ Mistral Integration Status
 **No Mistral-specific code exists** - uses generic HTTP provider.