Commit 251dddc
authored
[MS] Update PDF table extraction to support aligned Markdown (#1499)
* Added PDF table extraction feature with aligned Markdown (#1419)
* Add PDF test files and enhance extraction tests
- Added a medical report scan PDF for testing scanned PDF handling.
- Included a retail purchase receipt PDF to validate receipt extraction functionality.
- Introduced a multipage invoice PDF to test extraction of complex invoice structures.
- Added a borderless table PDF for testing inventory reconciliation report extraction.
- Implemented comprehensive tests for PDF table extraction, ensuring proper structure and data integrity.
- Enhanced existing tests to validate the order and presence of extracted content across various PDF types.
* fix: update dependencies for PDF processing and improve table extraction logic
* Bumped version of pdfminer.six
---------
Authored-by: Ashok <ashh010101@gmail.com>1 parent dde250a commit 251dddc
File tree
8 files changed
+1289
-21
lines changed- packages/markitdown
- src/markitdown/converters
- tests
- test_files
8 files changed
+1289
-21
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
52 | 52 | | |
53 | 53 | | |
54 | 54 | | |
| 55 | + | |
55 | 56 | | |
56 | 57 | | |
57 | 58 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
44 | | - | |
| 44 | + | |
| 45 | + | |
45 | 46 | | |
46 | 47 | | |
47 | 48 | | |
48 | 49 | | |
49 | 50 | | |
50 | | - | |
| 51 | + | |
51 | 52 | | |
52 | 53 | | |
53 | 54 | | |
54 | 55 | | |
55 | 56 | | |
56 | | - | |
| 57 | + | |
57 | 58 | | |
58 | 59 | | |
59 | 60 | | |
| |||
0 commit comments