Commit a4e8f33
committed
fix: fix R bindings, UTF-16 XML parsing, and benchmark harness issues
- Fix UTF-16 XML extraction rejecting files with odd byte counts by
truncating to even length instead of erroring
- Fix R NUL byte crash by stripping embedded NUL from strings at FFI boundary
- Add %||% polyfill for R < 4.4 compatibility
- Register kreuzberg-r adapter in benchmark harness
- Fix R server-mode stdin reading (file("stdin") vs stdin())
- Fix uv pip show error handling when uv is not installed
- Replace empty XLA/XLAM test fixtures with data-containing files
- Remove PDF fixtures that require OCR or trigger pdfium bugs
- Remove duplicate pdf_searchable_ocr fixture
- Add unit tests for UTF-16 XML parsing and PDF rotation stripping1 parent a38876c commit a4e8f33
File tree
21 files changed
+191
-181
lines changed- crates/kreuzberg/src
- extraction
- pdf
- packages/r
- R
- src/rust/src
- test_documents/xlsx
- tools/benchmark-harness
- fixtures
- pdf
- scripts
- src
21 files changed
+191
-181
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
25 | 28 | | |
26 | 29 | | |
27 | 30 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
200 | 200 | | |
201 | 201 | | |
202 | 202 | | |
203 | | - | |
| 203 | + | |
204 | 204 | | |
205 | | - | |
206 | | - | |
207 | | - | |
| 205 | + | |
| 206 | + | |
208 | 207 | | |
209 | 208 | | |
210 | 209 | | |
| |||
463 | 462 | | |
464 | 463 | | |
465 | 464 | | |
| 465 | + | |
| 466 | + | |
| 467 | + | |
| 468 | + | |
| 469 | + | |
| 470 | + | |
| 471 | + | |
| 472 | + | |
| 473 | + | |
| 474 | + | |
| 475 | + | |
| 476 | + | |
| 477 | + | |
| 478 | + | |
| 479 | + | |
| 480 | + | |
| 481 | + | |
| 482 | + | |
| 483 | + | |
| 484 | + | |
| 485 | + | |
| 486 | + | |
| 487 | + | |
| 488 | + | |
| 489 | + | |
| 490 | + | |
| 491 | + | |
| 492 | + | |
| 493 | + | |
| 494 | + | |
| 495 | + | |
| 496 | + | |
| 497 | + | |
| 498 | + | |
| 499 | + | |
| 500 | + | |
| 501 | + | |
| 502 | + | |
| 503 | + | |
| 504 | + | |
| 505 | + | |
| 506 | + | |
| 507 | + | |
| 508 | + | |
| 509 | + | |
| 510 | + | |
| 511 | + | |
| 512 | + | |
| 513 | + | |
| 514 | + | |
466 | 515 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
555 | 555 | | |
556 | 556 | | |
557 | 557 | | |
| 558 | + | |
| 559 | + | |
| 560 | + | |
| 561 | + | |
| 562 | + | |
| 563 | + | |
| 564 | + | |
| 565 | + | |
| 566 | + | |
| 567 | + | |
| 568 | + | |
| 569 | + | |
| 570 | + | |
| 571 | + | |
| 572 | + | |
| 573 | + | |
| 574 | + | |
| 575 | + | |
| 576 | + | |
| 577 | + | |
| 578 | + | |
| 579 | + | |
| 580 | + | |
| 581 | + | |
| 582 | + | |
| 583 | + | |
| 584 | + | |
| 585 | + | |
| 586 | + | |
| 587 | + | |
| 588 | + | |
558 | 589 | | |
559 | 590 | | |
560 | 591 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
15 | 15 | | |
16 | 16 | | |
17 | 17 | | |
18 | | - | |
| 18 | + | |
19 | 19 | | |
20 | 20 | | |
21 | | - | |
| 21 | + | |
22 | 22 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | | - | |
4 | | - | |
5 | | - | |
6 | | - | |
7 | | - | |
8 | | - | |
9 | | - | |
10 | | - | |
11 | | - | |
12 | | - | |
13 | | - | |
14 | | - | |
15 | | - | |
16 | | - | |
17 | | - | |
18 | | - | |
19 | | - | |
20 | | - | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
21 | 12 | | |
22 | 13 | | |
23 | | - | |
24 | | - | |
25 | | - | |
26 | | - | |
27 | | - | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
28 | 21 | | |
29 | | - | |
30 | | - | |
31 | | - | |
32 | 22 | | |
33 | | - | |
34 | | - | |
35 | | - | |
36 | 23 | | |
37 | | - | |
38 | | - | |
39 | | - | |
40 | | - | |
41 | | - | |
| 24 | + | |
42 | 25 | | |
43 | 26 | | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
44 | 35 | | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
45 | 52 | | |
46 | | - | |
47 | | - | |
48 | | - | |
49 | | - | |
50 | | - | |
51 | | - | |
52 | 53 | | |
53 | | - | |
54 | 54 | | |
55 | | - | |
56 | | - | |
57 | | - | |
58 | | - | |
59 | | - | |
60 | | - | |
61 | | - | |
62 | | - | |
63 | | - | |
64 | | - | |
65 | | - | |
66 | | - | |
67 | | - | |
68 | | - | |
69 | | - | |
70 | | - | |
71 | | - | |
72 | | - | |
73 | | - | |
74 | | - | |
75 | | - | |
76 | | - | |
77 | | - | |
78 | | - | |
79 | | - | |
| 55 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
2 | 5 | | |
3 | 6 | | |
4 | 7 | | |
5 | 8 | | |
6 | | - | |
7 | 9 | | |
8 | 10 | | |
9 | 11 | | |
10 | 12 | | |
11 | | - | |
12 | | - | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
13 | 16 | | |
14 | | - | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
15 | 20 | | |
16 | | - | |
17 | 21 | | |
18 | 22 | | |
19 | 23 | | |
| |||
22 | 26 | | |
23 | 27 | | |
24 | 28 | | |
25 | | - | |
26 | 29 | | |
27 | 30 | | |
28 | 31 | | |
| |||
31 | 34 | | |
32 | 35 | | |
33 | 36 | | |
34 | | - | |
35 | 37 | | |
36 | 38 | | |
37 | 39 | | |
| |||
40 | 42 | | |
41 | 43 | | |
42 | 44 | | |
43 | | - | |
44 | | - | |
| 45 | + | |
45 | 46 | | |
46 | | - | |
| 47 | + | |
47 | 48 | | |
48 | 49 | | |
49 | 50 | | |
50 | | - | |
| 51 | + | |
51 | 52 | | |
52 | | - | |
53 | | - | |
| 53 | + | |
54 | 54 | | |
55 | | - | |
| 55 | + | |
56 | 56 | | |
57 | 57 | | |
58 | 58 | | |
59 | | - | |
| 59 | + | |
60 | 60 | | |
61 | | - | |
62 | | - | |
| 61 | + | |
63 | 62 | | |
64 | | - | |
| 63 | + | |
65 | 64 | | |
66 | 65 | | |
67 | 66 | | |
68 | | - | |
| 67 | + | |
69 | 68 | | |
70 | | - | |
71 | 69 | | |
72 | 70 | | |
73 | | - | |
| 71 | + | |
74 | 72 | | |
75 | | - | |
| 73 | + | |
76 | 74 | | |
77 | | - | |
78 | 75 | | |
79 | 76 | | |
80 | 77 | | |
81 | 78 | | |
82 | | - | |
83 | 79 | | |
84 | 80 | | |
85 | 81 | | |
86 | 82 | | |
87 | 83 | | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3 | 3 | | |
4 | 4 | | |
5 | 5 | | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
21 | | - | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
22 | 27 | | |
23 | 28 | | |
24 | 29 | | |
| |||
Binary file not shown.
Binary file not shown.
0 commit comments