Commit 27564aa
fix: DOCX formatted markdown output, typst table extraction, clippy fixes
- DOCX extraction now produces properly formatted markdown: bold, italic,
underline, strikethrough, hyperlinks, heading hierarchy, bullet/numbered
lists with nesting, and interleaved table rendering (#376)
- Fix heading level overflow: Heading5+ clamped at h6
- Fix table cell formatting stripped in ExtractionResult tables
- Fix typst extract_table_content double-counting opening parenthesis
- Fix clippy collapsible_if in email.rs
- Add 16 DOCX formatting integration tests
- Add missing typst pandoc baseline files
- Regenerate DOCX ground truth files

1 parent: e8c3607
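The run-level formatting in the first bullet (bold, italic, underline, strikethrough, hyperlinks) can be pictured with a minimal sketch. It uses a hypothetical `Run` struct rather than kreuzberg's real types. It also assumes underline is emitted as inline `<u>` HTML, because markdown has no underline syntax and the commit does not say how underline is rendered. Markers are applied innermost first: underline, strikethrough, italic, bold, then the hyperlink.

```rust
/// Hypothetical model of a DOCX text run; field names are illustrative
/// and are not kreuzberg's actual types.
struct Run<'a> {
    text: &'a str,
    bold: bool,
    italic: bool,
    underline: bool,
    strike: bool,
    link: Option<&'a str>,
}

/// Wrap a run's text in markdown markers, innermost first.
fn run_to_markdown(run: &Run) -> String {
    let mut s = run.text.to_string();
    if run.underline {
        // ASSUMPTION: markdown has no underline syntax, so inline HTML is used.
        s = format!("<u>{s}</u>");
    }
    if run.strike {
        // GitHub Flavored Markdown strikethrough.
        s = format!("~~{s}~~");
    }
    if run.italic {
        s = format!("*{s}*");
    }
    if run.bold {
        s = format!("**{s}**");
    }
    if let Some(url) = run.link {
        // The link wraps the already-formatted text.
        s = format!("[{s}]({url})");
    }
    s
}

fn main() {
    let run = Run {
        text: "docs",
        bold: true,
        italic: true,
        underline: false,
        strike: false,
        link: Some("https://example.com"),
    };
    println!("{}", run_to_markdown(&run)); // [***docs***](https://example.com)
}
```

A real converter also has to handle runs with leading or trailing spaces (`** text**` is not valid emphasis) and merge adjacent runs with identical formatting; the sketch skips both.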
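The heading-level clamp can be sketched as follows. The function name is hypothetical, and the one-level offset (DOCX `HeadingN` rendered at markdown level N + 1) is an assumption chosen to match the note that Heading5 and deeper land on h6. ATX headings stop at six `#` characters, so anything deeper is clamped.

```rust
/// Hypothetical helper: markdown heading prefix for a DOCX `HeadingN` style.
/// ASSUMPTION: `HeadingN` is emitted one level deeper than N (for example
/// because `Title` takes `#`), so Heading5 and deeper would otherwise
/// overflow past `######`.
fn heading_prefix(style_level: u8) -> String {
    // ATX headings only go from `#` (h1) to `######` (h6), so clamp to 1..=6.
    let level = style_level.saturating_add(1).clamp(1, 6);
    "#".repeat(usize::from(level))
}

fn main() {
    for n in [1u8, 4, 5, 9] {
        println!("Heading{n} -> {}", heading_prefix(n));
    }
}
```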
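The typst fix concerns counting the opening parenthesis twice. A hypothetical sketch (`call_arguments` is not the actual `extract_table_content`) shows how that class of bug arises and how to avoid it: a depth counter that starts at 1 and then also counts the `(` it is positioned on never drops back to zero at the real closing parenthesis. Starting at depth 0 and scanning from the `(` counts it exactly once.

```rust
/// Hypothetical sketch: return the argument text of a typst call such as
/// `#table(columns: 2, [a], [b])`, i.e. the span between the call's opening
/// `(` and its matching `)`. Returns `None` if the parentheses are unbalanced.
fn call_arguments(src: &str) -> Option<&str> {
    let open = src.find('(')?;
    // Depth starts at 0 and the scan begins *at* the `(`, so the opening
    // parenthesis is counted exactly once.
    let mut depth = 0usize;
    for (i, c) in src[open..].char_indices() {
        match c {
            '(' => depth += 1,
            ')' => {
                depth -= 1;
                if depth == 0 {
                    // Byte offsets: skip the `(` itself, stop before the `)`.
                    return Some(&src[open + 1..open + i]);
                }
            }
            _ => {}
        }
    }
    None // no matching close parenthesis
}

fn main() {
    let src = "#table(columns: (1fr, 2fr), [a], [b]) after";
    println!("{:?}", call_arguments(src)); // Some("columns: (1fr, 2fr), [a], [b]")
}
```

This simple counter assumes any parentheses inside string literals and content blocks are balanced; a more robust extractor would skip over strings.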
34 files changed: +2111, -305 lines

File tree:
- crates
- kreuzberg-paddle-ocr/tests
- kreuzberg
- src
- extraction
- docx
- extractors
- paddle_ocr
- tests
- test_documents
- ground_truth/docx
- typst