Incorrect bibliographic information extracted from OpenReview PDFs #17

@yutojubako

Description

I've encountered a problem where pdf2bib is extracting incorrect bibliographic information from PDFs obtained from OpenReview. In some cases, the extracted BibTeX entries correspond to entirely different papers.

Steps to reproduce

  1. Download a PDF from an OpenReview forum (e.g., https://openreview.net/forum?id=C0jJAbMMub)
  2. Use pdf2bib to extract bibliographic information from the downloaded PDF
  3. Observe that the resulting BibTeX entry does not match the paper's actual information
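
The mismatch in the last step can also be checked mechanically. Below is a minimal sketch that compares the title in a BibTeX string against the title shown on the forum page. It assumes the BibTeX string has already been produced by pdf2bib and that each field sits on its own line; the helper names `bibtex_title` and `titles_match` and the sample entry are hypothetical and only for illustration.

```python
import re
from typing import Optional


def bibtex_title(entry: str) -> Optional[str]:
    """Return the title field of a BibTeX entry, or None if there is none.

    Assumes one field per line. The lookbehind keeps 'booktitle' from matching.
    """
    match = re.search(r"(?<![A-Za-z])title\s*=\s*(.+)", entry, flags=re.IGNORECASE)
    if match is None:
        return None
    value = match.group(1).strip().rstrip(",").strip()
    return value.strip('{}"').strip()


def normalize(title: str) -> str:
    """Lowercase and collapse punctuation and whitespace for a loose comparison."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def titles_match(entry: str, expected_title: str) -> bool:
    """True if the entry's title equals the expected title after normalization."""
    found = bibtex_title(entry)
    return found is not None and normalize(found) == normalize(expected_title)


# Illustrative stand-in for the entry returned in this report (not real tool output).
sample_entry = """@article{example,
  title = {Segment Anything},
}"""

# Replace with the title shown on the OpenReview forum page.
expected = "Title of the paper that was actually downloaded"

print(bibtex_title(sample_entry))
print(titles_match(sample_entry, expected))
```

A check like this only flags the symptom; it does not say where the wrong entry came from.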

Expected behavior

The extracted BibTeX entry should correspond to the paper from which the PDF was obtained.

Actual behavior

The extracted BibTeX entry corresponds to a different paper. For example, when processing a PDF from the OpenReview forum mentioned above, the tool returns bibliographic information for the "Segment Anything" paper (https://arxiv.org/abs/2304.02643) instead.

Additional information

This issue appears to occur with multiple PDFs from OpenReview, not just a single instance.

Possible causes

  1. Incorrect metadata in the PDFs from OpenReview
  2. An issue with pdf2bib's parsing logic for OpenReview PDFs
  3. A problem with the online database or API that pdf2bib might be using for verification

Suggested next steps

  1. Investigate the metadata of affected PDFs to check for anomalies
  2. Review pdf2bib's parsing logic for OpenReview documents
  3. Check if there are any issues with external APIs or databases used by pdf2bib
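
As a starting point for the first two steps, here is a rough sketch that lists every arXiv ID and DOI found in text extracted from a PDF, in order of appearance. It rests on an unverified assumption: that the tool may pick up an identifier appearing somewhere in the document, such as one in the reference list, rather than the paper's own. The text extraction itself is left out, and the function name `find_identifiers`, the regular expressions, and the sample text are all hypothetical.

```python
import re

# New-style arXiv identifiers written as "arXiv:YYMM.NNNNN", optional version suffix.
ARXIV_RE = re.compile(r"arXiv:\s*(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)
# Loose DOI pattern: "10.", a registrant code, a slash, then non-space characters.
DOI_RE = re.compile(r"\b(10\.\d{4,9}/[^\s\"<>]+)")


def find_identifiers(text):
    """Return (kind, identifier, offset) tuples sorted by position in the text."""
    hits = []
    for match in ARXIV_RE.finditer(text):
        hits.append(("arxiv", match.group(1), match.start()))
    for match in DOI_RE.finditer(text):
        # Drop trailing punctuation that usually belongs to the sentence.
        hits.append(("doi", match.group(1).rstrip(".,;)"), match.start()))
    return sorted(hits, key=lambda hit: hit[2])


# Illustrative text: a submission with no identifier of its own that cites another paper.
sample_text = (
    "Anonymous submission under review.\n"
    "References\n"
    "[1] Segment Anything. arXiv:2304.02643\n"
)

for kind, identifier, offset in find_identifiers(sample_text):
    print(kind, identifier, offset)
```

If the only identifiers in an affected PDF belong to cited papers, that would point toward the parsing logic rather than the PDF metadata or an external API.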

I'm happy to provide more information or specific examples if needed. Thank you for your attention to this issue.
