Commit a0068d4
authored
Feature/enhanced label cleaning (#47)
* Enhance span annotation handling and logging for zero-span sequences
* Add tests for processing sequences with zero spans and ensure they are skipped
* Implement XBarDictionary for managing hierarchical spans and enhance span annotator with dictionary integration
* Add hierarchical level classification for XBar labels in span annotator
* Refactor XBarDictionary and SpanAnnotatorPipeline for improved dictionary management and statistics tracking
* Enhance SpanAnnotatorPipeline to generate annotations from dictionary spans and add AnnotationValidator for validating annotations.jsonl files
* Refactor SpanAnnotatorPipeline and XBarDictionary by removing unused parameters and consolidating annotation generation logic
* Rename dictionary.jsonl to spans.jsonl in save and load methods for clarity
* Enhance SpanAnnotatorPipeline to build X-bar dictionaries and generate annotations.jsonl from working files
* Rename dictionary.jsonl to spans.jsonl in test assertions and update annotation analysis to reflect changes
* Add tests for XBar label cleaning functionality and enhance label validation in the pipeline
* Remove validate_annotations.py as it is no longer needed
* Enhance logging in SpanAnnotatorPipeline and XBarDictionary for better annotation tracking and validation. Add word span validation in XBarLabelMap to filter invalid spans.
* Enhance README and span_annotator documentation with advanced label cleaning system details, including comprehensive word span validation and intelligent logging improvements.
* Update multi-label test cases for improved label validation and structure preservation
1 parent 3a9c6ec commit a0068d4
File tree
6 files changed, +244 -22 lines changed
- tests/xbar
- x_spanformer
- pipelines
- xbar