bug: PDF parsing is incorrect #377

@DRMPN

Description

Parsing this PDF (https://arxiv.org/pdf/2504.11688) produces the following result when validating a doc or paper:

{'abstract': '', 'experiments': '', 'results': ''}

The reason is that some content from the PDF is missing or is not found during parsing.

The debug logs show that the abstract section is already missing before the request asking the LLM to find it is made.
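To narrow down where the content is lost, it may help to check the extracted text before it is sent to the LLM. Below is a minimal sketch of such a check, not the project's actual code: the names `missing_sections`, `empty_fields`, and `REQUIRED_SECTIONS` are hypothetical, the section names are taken from the keys in the empty result above, and the heading match is only a rough heuristic.

```python
import re

# Hypothetical list of sections, copied from the keys of the empty result.
REQUIRED_SECTIONS = ("abstract", "experiments", "results")


def missing_sections(extracted_text, required=REQUIRED_SECTIONS):
    """Return required section names that never start a line in the text.

    A line counts as a heading if it begins with the section name,
    optionally preceded by numbering such as "4", "4.1" or "IV.".
    """
    missing = []
    for name in required:
        pattern = re.compile(
            r"^\s*(?:[\dIVXivx]+(?:\.\d+)*\.?\s+)?" + re.escape(name) + r"\b",
            re.IGNORECASE | re.MULTILINE,
        )
        if not pattern.search(extracted_text):
            missing.append(name)
    return missing


def empty_fields(result):
    """Return the keys of a validation result whose values are blank."""
    return [key for key, value in result.items() if not value.strip()]


if __name__ == "__main__":
    # Stand-in for text produced by the PDF parser (assumed, not real output).
    sample = "Abstract\nWe study X.\n4 Experiments\nSetup details."
    print(missing_sections(sample))
    print(empty_fields({"abstract": "", "experiments": "", "results": ""}))
```

If `missing_sections` already reports `abstract` on the raw parser output, the problem is likely in the PDF extraction step rather than in the LLM request.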

Labels: bug