Question
Hi everyone,
I know this isn’t strictly a Pydantic question, but since many of you work with structured data validation and model outputs, I suspect some of you have run into this.
When I use OpenAI’s o4-mini model (or others) to extract a large table from a document and format it as JSON, the model sometimes skips rows or hallucinates entries.
Interestingly, ChatGPT itself handles this perfectly, but my API-based agents do not.
I’m looking for ways to make the output structurally accurate before validation with Pydantic.
🧩 Approaches I’ve seen mentioned
- Splitting the document into chunks before feeding it to the model (sketch 1 below)
- Creating an agent that dynamically generates code to parse and reformat the table (sketch 2 below)
- Enforcing a strict Pydantic schema with retries until valid output is returned (sketch 3 below)
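Here’s roughly what I imagine the chunking approach could look like — just a sketch, not tested. I’m assuming the official `openai` Python SDK, a Markdown-formatted table, and a model that supports JSON mode; `chunk_table`, `extract_rows`, and the `"rows"` wrapper key are my own made-up names:

```python
# Sketch 1: chunked extraction. Repeat the header with every chunk so each
# request is a small, self-contained table the model is less likely to truncate.
import json

from openai import OpenAI

client = OpenAI()

def chunk_table(markdown_table: str, rows_per_chunk: int = 20) -> list[str]:
    """Split a Markdown table into small tables, repeating header + separator."""
    lines = markdown_table.strip().splitlines()
    header, separator, body = lines[0], lines[1], lines[2:]
    return [
        "\n".join([header, separator, *body[i : i + rows_per_chunk]])
        for i in range(0, len(body), rows_per_chunk)
    ]

def extract_rows(chunk: str) -> list[dict]:
    """Convert one chunk to a list of row dicts via JSON mode."""
    response = client.chat.completions.create(
        model="o4-mini",
        messages=[
            {
                "role": "system",
                "content": 'Convert this Markdown table to JSON shaped like '
                '{"rows": [...]}, one object per row. Do not skip or invent rows.',
            },
            {"role": "user", "content": chunk},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)["rows"]

def extract_table(markdown_table: str) -> list[dict]:
    rows: list[dict] = []
    for chunk in chunk_table(markdown_table):
        rows.extend(extract_rows(chunk))
    return rows
```

One property I like here: since you know how many body rows went into each chunk, you could compare that count with `len(extract_rows(chunk))` and retry only the chunks where rows went missing.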
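For the code-generating agent, the idea (as I understand it) is that the model reads the table once and writes a deterministic parser, so no individual row ever passes through the model itself. Purely a sketch — `generate_parser` and `run_parser` are hypothetical names, and `exec()` on model-written code is obviously unsafe outside a sandbox:

```python
# Sketch 2: have the model write a parser instead of the parsed data.
from openai import OpenAI

client = OpenAI()

def strip_fences(code: str) -> str:
    """Remove Markdown code fences if the model adds them anyway."""
    lines = code.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].startswith("```"):
        lines = lines[:-1]
    return "\n".join(lines)

def generate_parser(sample: str) -> str:
    """Ask for a deterministic parse_table() function based on a small sample."""
    response = client.chat.completions.create(
        model="o4-mini",
        messages=[
            {
                "role": "system",
                "content": "Write a Python function parse_table(text: str) -> "
                "list[dict] that parses tables formatted like the sample. "
                "Reply with code only.",
            },
            {"role": "user", "content": sample},
        ],
    )
    return strip_fences(response.choices[0].message.content)

def run_parser(code: str, full_text: str) -> list[dict]:
    namespace: dict = {}
    exec(code, namespace)  # WARNING: run model-generated code in a sandbox only
    return namespace["parse_table"](full_text)
```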
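And the strict-schema-with-retries pattern, which is the most Pydantic-flavoured one. The `Row` fields, prompt wording, and `max_attempts` are placeholders for illustration; the loop feeds Pydantic’s `ValidationError` back to the model so it can correct itself:

```python
# Sketch 3: validate against a strict Pydantic schema and retry on failure.
from openai import OpenAI
from pydantic import BaseModel, ValidationError

client = OpenAI()

class Row(BaseModel):
    name: str   # placeholder fields; adapt to your table's columns
    value: float

class Table(BaseModel):
    rows: list[Row]

def extract_with_retries(table_text: str, max_attempts: int = 3) -> Table:
    messages = [
        {
            "role": "system",
            "content": 'Extract the table as JSON shaped like '
            '{"rows": [{"name": ..., "value": ...}]}. Output JSON only.',
        },
        {"role": "user", "content": table_text},
    ]
    for _ in range(max_attempts):
        response = client.chat.completions.create(
            model="o4-mini",
            messages=messages,
            response_format={"type": "json_object"},
        )
        raw = response.choices[0].message.content
        try:
            return Table.model_validate_json(raw)
        except ValidationError as exc:
            # Feed the validation errors back so the next attempt can fix them.
            messages.append({"role": "assistant", "content": raw})
            messages.append(
                {"role": "user", "content": f"Invalid output: {exc}. Resend the full corrected JSON."}
            )
    raise RuntimeError("No schema-valid output after retries")
```

As far as I know, libraries like instructor wrap exactly this validate-and-retry loop, so that might be worth a look too. One caveat: schema validation catches malformed output, but it can’t tell you a row was silently dropped, so a row-count check against the source is probably still needed on top.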
I haven’t tested any of these yet — the sketches above are just how I imagine they could look. If anyone has used these methods in practice and found them reliable and efficient, I’d really appreciate it if you could share your code or implementation examples.
Thanks a lot!