Unexpected References Appearing in Docling OCR JSON Results #1977
Unanswered
manikrishna-m asked this question in Q&A
Replies: 0 comments
Hi Everyone,
I'm using Docling's basic OCR to convert a PDF into text, and I save the output as a JSON file. In one of the PDFs I'm processing, the JSON result includes a group (Group 4), which lists its children as references to texts 61 through 77. Here's a simplified snippet of what the raw JSON looks like:
However, after loading this JSON with model_validate, the children list in Group 4 unexpectedly changes. Here's what I get:
As you can see, the parsed result now contains references to #/groups/44 and #/groups/45, which were not present in the original JSON, and #/texts/76 and #/texts/77 are missing from it.
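To make the comparison reproducible, here is a minimal sketch of how the two children lists could be diffed. It uses only the standard library and works on plain dictionaries. It assumes the JSON layout I see in my file: a top-level "groups" list whose entries carry a "self_ref" string and a "children" list of {"$ref": ...} objects. The helper names child_refs and diff_refs are hypothetical, not part of Docling.

```python
from typing import Any


def child_refs(doc: dict[str, Any], group_ref: str) -> list[str]:
    """Return the "$ref" targets listed under one group's children.

    Assumes each group is a dict with "self_ref" and "children" keys,
    and each child is a dict of the form {"$ref": "#/texts/61"}.
    """
    for group in doc.get("groups", []):
        if group.get("self_ref") == group_ref:
            return [child["$ref"] for child in group.get("children", [])]
    raise KeyError(f"no group with self_ref {group_ref!r}")


def diff_refs(before: list[str], after: list[str]) -> dict[str, list[str]]:
    """Report refs that disappeared or appeared, keeping list order."""
    before_set, after_set = set(before), set(after)
    return {
        "missing": [ref for ref in before if ref not in after_set],
        "added": [ref for ref in after if ref not in before_set],
    }
```

In practice the "before" dictionary would come from json.load on the saved file, and the "after" dictionary from dumping the validated object back to a dictionary. Since model_validate suggests a Pydantic model, something like model_dump(mode="json", by_alias=True) should give a comparable structure, though I have not confirmed that is the recommended export path for Docling documents. Looking up the group by its "self_ref" rather than by list position also shows whether the group at index 4 is still the same group after validation.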
Can anyone help me understand why this is happening? Is it a parsing issue with model_validate, or could the input JSON be getting altered during validation?
Happy to provide more details if needed.
Thanks in advance!