Commit 320ecf1
feat(ocr): add persistent disk cache for OCR results
Implements a 3-layer caching architecture to reduce expensive API calls:
- Layer 1: In-memory cache (fast, volatile)
- Layer 2: Disk cache (persistent, JSON files)
- Layer 3: API calls (slow, expensive)
OCR results are now stored as {pdf_basename}_ocr.json alongside the
source PDF, surviving MCP server restarts and enabling cost-effective
reuse of Mistral OCR results.
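The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: every name in it (cachePathFor, lookup, store, getOcr, OcrResult) is hypothetical, the on-disk JSON layout is simplified to a flat key/value object, and {pdf_basename} is assumed to mean the file name without its extension.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// All names here are hypothetical; the real helpers live in src/utils/diskCache.ts.
export type OcrResult = { text: string };

// Layer 1: in-memory cache, lost on server restart.
export const memoryCache = new Map<string, OcrResult>();

// Assumption: {pdf_basename} is the file name without its extension.
export function cachePathFor(pdfPath: string): string {
  const base = path.basename(pdfPath, path.extname(pdfPath));
  return path.join(path.dirname(pdfPath), `${base}_ocr.json`);
}

// Layer 2: read the JSON file next to the PDF, or return an empty cache.
function readDisk(pdfPath: string): Record<string, OcrResult> {
  try {
    const parsed: unknown = JSON.parse(fs.readFileSync(cachePathFor(pdfPath), "utf8"));
    if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
      return parsed as Record<string, OcrResult>;
    }
  } catch {
    // Missing or corrupt cache file: fall through to an empty cache.
  }
  return {};
}

// Check memory first, then disk; promote disk hits into memory.
export function lookup(pdfPath: string, key: string): OcrResult | undefined {
  const memKey = `${pdfPath}::${key}`;
  const inMemory = memoryCache.get(memKey);
  if (inMemory) return inMemory;
  const onDisk = readDisk(pdfPath)[key];
  if (onDisk) memoryCache.set(memKey, onDisk);
  return onDisk;
}

// Write a result to both cache layers.
export function store(pdfPath: string, key: string, result: OcrResult): void {
  memoryCache.set(`${pdfPath}::${key}`, result);
  const all = readDisk(pdfPath);
  all[key] = result;
  try {
    fs.writeFileSync(cachePathFor(pdfPath), JSON.stringify(all, null, 2));
  } catch {
    // Directory not writable: keep the in-memory copy only.
  }
}

// Layer 3: call the (slow, expensive) API only when both caches miss.
export async function getOcr(
  pdfPath: string,
  key: string,
  callApi: () => Promise<OcrResult>,
): Promise<OcrResult> {
  const cached = lookup(pdfPath, key);
  if (cached) return cached;
  const fresh = await callApi();
  store(pdfPath, key, fresh);
  return fresh;
}
```

In this sketch a disk hit is promoted into memory, so repeated requests within one server session only touch the file once.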
Features:
- Automatic fingerprint validation (invalidates on PDF changes)
- Provider-specific caching (different OCR settings = separate cache)
- Supports both page OCR (pdf_ocr_page) and image OCR (pdf_ocr_image)
- Structured storage for Mistral OCR metadata (markdown, tables, etc.)
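As a rough illustration of the first two features: the sketch below assumes the fingerprint is a SHA-256 hash of the PDF contents and that provider settings are folded into the cache key. Both are assumptions made for the example; the commit message does not say how the fingerprint is computed or how keys are built, and all names (fingerprintOf, validEntries, cacheKey, ProviderSettings, DiskCache) are hypothetical.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";

// Hypothetical shapes, not the types from src/types/cache.ts.
export interface ProviderSettings {
  name: string;
  model?: string;
  options?: Record<string, unknown>;
}

export interface DiskCache {
  fingerprint: string;
  entries: Record<string, unknown>;
}

// Assumption: fingerprint = SHA-256 of the file contents.
export function fingerprintOf(pdfPath: string): string {
  return createHash("sha256").update(fs.readFileSync(pdfPath)).digest("hex");
}

// Discard every entry when the PDF no longer matches the stored fingerprint.
export function validEntries(cache: DiskCache, current: string): Record<string, unknown> {
  return cache.fingerprint === current ? cache.entries : {};
}

// Different provider settings produce different keys, hence separate entries.
export function cacheKey(
  kind: "page" | "image",
  index: number,
  provider: ProviderSettings,
): string {
  const opts = provider.options ?? {};
  // Sort option names so that equal settings always yield the same key.
  const sorted = Object.keys(opts).sort().map((k) => [k, opts[k]]);
  return JSON.stringify([kind, index, provider.name, provider.model ?? null, sorted]);
}
```

A content hash is more expensive than comparing size and modification time, but it does not depend on file system timestamps.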
Limitations:
- Only works for file-based PDFs (not URLs)
- Requires write permissions in PDF directory
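Both limitations can be checked up front before touching the disk layer. The following is only a sketch under assumptions: isUrl and canUseDiskCache are hypothetical names, and the URL test is a simple scheme check rather than whatever the handlers actually use.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Treat anything of the form "scheme://..." as a URL (assumed heuristic).
export function isUrl(source: string): boolean {
  return /^[a-z][a-z0-9+.-]*:\/\//i.test(source);
}

// Disk caching applies only to local files whose directory is writable.
export function canUseDiskCache(source: string): boolean {
  if (isUrl(source)) return false;
  try {
    fs.accessSync(path.dirname(path.resolve(source)), fs.constants.W_OK);
    return true;
  } catch {
    // Directory missing or not writable.
    return false;
  }
}
```

When the check fails, a caller could still fall back to the in-memory cache and the API.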
Files:
- src/types/cache.ts: Cache structure type definitions
- src/utils/diskCache.ts: Load/save/get/set utilities
- src/handlers/ocrPage.ts: Integrated 3-layer cache
- src/handlers/ocrImage.ts: Integrated 3-layer cache
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent ee633b4, commit 320ecf1
File tree
7 files changed (+725, -18 lines)
- dist
- src
  - handlers
  - types
  - utils