README.md: 11 additions & 2 deletions
@@ -15,7 +15,8 @@ It focuses on fast multi-file conversion to Markdown with a modern Fluent-style
 - Preview modes: rendered Markdown view and raw Markdown view.
 - Save modes: export as one combined file or separate files.
 - Quick actions: copy Markdown, save output, back to queue, start over.
-- Settings for output folder, batch size, header style, table style, and theme mode (light/dark/system).
+- Optional OCR for scanned PDFs and image files, with Azure Document Intelligence first and local Tesseract fallback.
+- Settings for output folder, batch size, header style, table style, OCR, and theme mode (light/dark/system).
 - Built-in shortcuts dialog, update check action, and about dialog.

 ## Installation
@@ -39,6 +40,15 @@ Alternative:
 pip install -e .[dev]
 ```

+### OCR Notes
+
+- OCR is optional and disabled by default.
+- Local OCR requires a system `tesseract` binary. Install it from the [official Tesseract project](https://github.com/tesseract-ocr/tesseract). If it is not on your `PATH`, set the executable path in Settings.
+- Azure OCR requires an Azure Document Intelligence endpoint in Settings.
+- Azure Document Intelligence pricing includes [500 free pages per month](https://azure.microsoft.com/en-us/products/ai-foundry/tools/document-intelligence#Pricing) at the time of writing.
+- For API-key auth, set `AZURE_OCR_API_KEY`.
+- If `AZURE_OCR_API_KEY` is not set, Azure OCR falls back to Azure identity credentials supported by `DefaultAzureCredential`.
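
The OCR notes in this diff describe three decisions: where the `tesseract` executable comes from, which Azure credential is used, and the order in which backends are tried. The Python sketch below illustrates that logic only; it is not the project's code. The function names, the `configured_path` parameter, and the rule that a Settings path is checked before `PATH` are assumptions made for illustration. Only `AZURE_OCR_API_KEY`, the `tesseract` binary name, and the Azure-first order come from the README text.

```python
import os
import shutil
from typing import List, Mapping, Optional


def resolve_tesseract(configured_path: Optional[str] = None) -> Optional[str]:
    """Return a usable tesseract executable path, or None if none is found.

    `configured_path` stands in for the executable path from Settings
    (hypothetical name). Checking it before PATH is an assumption.
    """
    if configured_path and os.path.isfile(configured_path):
        return configured_path
    # Fall back to looking up `tesseract` on PATH.
    return shutil.which("tesseract")


def choose_azure_auth(env: Mapping[str, str] = os.environ) -> str:
    """Pick the Azure auth mode described in the OCR notes.

    Returns "api_key" when AZURE_OCR_API_KEY is set and non-empty,
    otherwise "default_credential" (i.e. DefaultAzureCredential).
    """
    if env.get("AZURE_OCR_API_KEY"):
        return "api_key"
    return "default_credential"


def plan_ocr_backends(
    enabled: bool,
    azure_endpoint: Optional[str],
    tesseract_path: Optional[str],
) -> List[str]:
    """List the backends to try, Azure first, then local Tesseract."""
    if not enabled:  # OCR is optional and disabled by default
        return []
    backends = []
    if azure_endpoint:  # Azure OCR needs an endpoint from Settings
        backends.append("azure")
    if tesseract_path:  # Local OCR needs a tesseract binary
        backends.append("tesseract")
    return backends
```

A caller would pass the result of `resolve_tesseract(...)` into `plan_ocr_backends(...)` and try each backend in turn. In real code, the `"api_key"` branch would typically build an `AzureKeyCredential` and the other branch a `DefaultAzureCredential`; those imports are left out here so the sketch runs without the Azure SDK installed.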