Commit 207df82
committed
fix: PaddleOCR dictionary loading via file workaround for ort metadata bug
The ort crate cannot read custom metadata from PaddlePaddle PIR-mode ONNX
models (metadata.custom_keys() returns empty). Work around this by shipping
the character dictionary as a separate file and using init_models_with_dict().
Also replace the test image with a larger 800x200 version for reliable OCR,
and fix integration test assertions for model path checks and Chinese text.1 parent 55d3f5b commit 207df82
File tree
8 files changed
+95
-51
lines changed- crates/kreuzberg
- src
- core/pipeline
- paddle_ocr
- tests
- test_documents
- ground_truth/docx
- images
8 files changed
+95
-51
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
84 | 84 | | |
85 | 85 | | |
86 | 86 | | |
87 | | - | |
88 | | - | |
| 87 | + | |
| 88 | + | |
89 | 89 | | |
90 | 90 | | |
91 | 91 | | |
| |||
133 | 133 | | |
134 | 134 | | |
135 | 135 | | |
136 | | - | |
137 | | - | |
| 136 | + | |
| 137 | + | |
138 | 138 | | |
139 | 139 | | |
140 | 140 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
116 | 116 | | |
117 | 117 | | |
118 | 118 | | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
119 | 127 | | |
120 | | - | |
| 128 | + | |
121 | 129 | | |
122 | 130 | | |
123 | 131 | | |
| |||
130 | 138 | | |
131 | 139 | | |
132 | 140 | | |
| 141 | + | |
133 | 142 | | |
134 | 143 | | |
135 | 144 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
76 | 76 | | |
77 | 77 | | |
78 | 78 | | |
79 | | - | |
80 | | - | |
| 79 | + | |
| 80 | + | |
81 | 81 | | |
82 | 82 | | |
83 | 83 | | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
84 | 91 | | |
85 | 92 | | |
86 | 93 | | |
| |||
90 | 97 | | |
91 | 98 | | |
92 | 99 | | |
| 100 | + | |
| 101 | + | |
93 | 102 | | |
94 | 103 | | |
95 | 104 | | |
| |||
211 | 220 | | |
212 | 221 | | |
213 | 222 | | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
214 | 234 | | |
215 | 235 | | |
216 | 236 | | |
217 | 237 | | |
218 | 238 | | |
219 | 239 | | |
| 240 | + | |
220 | 241 | | |
221 | 242 | | |
222 | 243 | | |
| |||
319 | 340 | | |
320 | 341 | | |
321 | 342 | | |
| 343 | + | |
| 344 | + | |
| 345 | + | |
| 346 | + | |
| 347 | + | |
322 | 348 | | |
323 | 349 | | |
324 | 350 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
53 | 53 | | |
54 | 54 | | |
55 | 55 | | |
56 | | - | |
57 | | - | |
58 | | - | |
59 | | - | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
60 | 60 | | |
61 | | - | |
62 | | - | |
63 | | - | |
64 | | - | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
65 | 68 | | |
66 | 69 | | |
67 | 70 | | |
68 | 71 | | |
69 | 72 | | |
70 | 73 | | |
71 | | - | |
| 74 | + | |
| 75 | + | |
72 | 76 | | |
73 | 77 | | |
74 | 78 | | |
75 | 79 | | |
76 | 80 | | |
77 | 81 | | |
78 | 82 | | |
| 83 | + | |
79 | 84 | | |
80 | 85 | | |
81 | 86 | | |
| |||
161 | 166 | | |
162 | 167 | | |
163 | 168 | | |
164 | | - | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
165 | 174 | | |
166 | 175 | | |
167 | 176 | | |
| |||
188 | 197 | | |
189 | 198 | | |
190 | 199 | | |
191 | | - | |
192 | | - | |
193 | | - | |
194 | | - | |
195 | | - | |
196 | | - | |
197 | | - | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
198 | 207 | | |
199 | 208 | | |
200 | 209 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | | - | |
| 1 | + | |
2 | 2 | | |
3 | | - | |
| 3 | + | |
4 | 4 | | |
5 | 5 | | |
6 | 6 | | |
7 | | - | |
| 7 | + | |
8 | 8 | | |
9 | | - | |
| 9 | + | |
10 | 10 | | |
11 | | - | |
| 11 | + | |
12 | 12 | | |
13 | | - | |
14 | | - | |
| 13 | + | |
| 14 | + | |
15 | 15 | | |
16 | | - | |
17 | | - | |
| 16 | + | |
| 17 | + | |
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | | - | |
| 1 | + | |
2 | 2 | | |
3 | | - | |
| 3 | + | |
4 | 4 | | |
5 | 5 | | |
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
9 | | - | |
| 9 | + | |
10 | 10 | | |
11 | 11 | | |
12 | 12 | | |
13 | 13 | | |
14 | 14 | | |
15 | | - | |
| 15 | + | |
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
21 | | - | |
| 21 | + | |
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
27 | | - | |
| 27 | + | |
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
32 | 32 | | |
33 | | - | |
| 33 | + | |
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
37 | 37 | | |
38 | 38 | | |
39 | | - | |
| 39 | + | |
40 | 40 | | |
41 | 41 | | |
42 | 42 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | | - | |
| 1 | + | |
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
23 | | - | |
24 | | - | |
25 | | - | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
31 | | - | |
| 31 | + | |
32 | 32 | | |
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
37 | | - | |
38 | | - | |
| 37 | + | |
| 38 | + | |
39 | 39 | | |
40 | 40 | | |
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
44 | 44 | | |
45 | | - | |
46 | | - | |
47 | | - | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
48 | 48 | | |
Loading
0 commit comments