Open
Description
In the current matching strategy, a point on a polyline is associated with the smallest bounding box that contains it.
docling-eval/docling_eval/dataset_builders/cvat_dataset_builder.py
Lines 225 to 230 in b507977
    if box["l"] <= point[0] <= box["r"] and box["t"] <= point[1] <= box["b"]:
        current_area = (box["r"] - box["l"]) * (box["b"] - box["t"])
        if index == -1 or current_area < area:
            area = current_area
            index = i
            box_result = box
This approach works for certain link types, such as to_footnote, to_value, and to_caption. However, for links like reading_order, merge, or group, we expect the points to be associated with the outermost bounding boxes under certain conditions. For example, in the case of a table, the reading_order should be attached to the table's bounding box, not to the bounding box of an individual table_row.
To ensure the validation methods defined in PR #102 work as intended, the find_box function needs to be updated accordingly.
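One possible direction, sketched below, is to make the selection depend on the link type: keep the smallest containing box for to_footnote, to_value, and to_caption, and pick the largest containing box for reading_order, merge, and group. This is only an illustration, not the actual find_box implementation. The function name `find_box_for_link`, its signature, the `label` key, and the two link-type sets are assumptions, and the "certain conditions" mentioned above are simplified to "always take the outermost box" for the second group.

```python
from typing import Any, Dict, List, Optional, Tuple

Box = Dict[str, Any]          # expects keys "l", "t", "r", "b" as in the snippet above
Point = Tuple[float, float]   # (x, y)

# Link types named in this issue; grouping them into two sets is an assumption.
INNERMOST_LINKS = {"to_footnote", "to_value", "to_caption"}
OUTERMOST_LINKS = {"reading_order", "merge", "group"}


def contains(box: Box, point: Point) -> bool:
    """Same containment test as the current matching code."""
    return box["l"] <= point[0] <= box["r"] and box["t"] <= point[1] <= box["b"]


def box_area(box: Box) -> float:
    return (box["r"] - box["l"]) * (box["b"] - box["t"])


def find_box_for_link(
    boxes: List[Box], point: Point, link_type: str
) -> Tuple[int, Optional[Box]]:
    """Hypothetical link-aware variant of find_box.

    Returns (index, box) of the selected box, or (-1, None) if no box
    contains the point.
    """
    candidates = [(i, b) for i, b in enumerate(boxes) if contains(b, point)]
    if not candidates:
        return -1, None
    # Outermost box for structural links, innermost box otherwise.
    pick = max if link_type in OUTERMOST_LINKS else min
    return pick(candidates, key=lambda item: box_area(item[1]))


# Usage: a table that fully contains one of its rows.
table = {"l": 0, "t": 0, "r": 100, "b": 100, "label": "table"}
row = {"l": 0, "t": 10, "r": 100, "b": 20, "label": "table_row"}
point = (50, 15)

index, box = find_box_for_link([table, row], point, "reading_order")
print(index, box["label"])
index, box = find_box_for_link([table, row], point, "to_caption")
print(index, box["label"])
```

With this split, a reading_order point that falls inside a table_row resolves to the enclosing table, while a to_caption point still resolves to the smallest box. A real fix would likely need to encode the specific conditions (for example, which container labels qualify) instead of always taking the largest box.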