Benchmarking Doc parsing #42

@yogeshhk

For resumes in different layouts and file formats, benchmark the following libraries:

  • Nanonets: Qwen-based, Hugging Face, no handwritten text
  • Docling: IBM, many modalities, tables, PPTs, etc.
  • Llama-OCR: PDF -> Markdown, Together AI API
  • Unstructured: PDF, HTML, Word, ...
  • LlamaParse
  • etc.
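
To compare the libraries above on equal terms, each one could be wrapped in a small adapter and run through a shared harness. The following is only a minimal sketch under stated assumptions: `Parser`, `RunResult`, and `run_benchmark` are hypothetical names introduced here, and the adapters that actually call each library are left out and would need to be written against each library's own documentation.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Hypothetical adapter type: takes a file path, returns extracted text or Markdown.
# One adapter per library would wrap that library's real API (not shown here).
Parser = Callable[[Path], str]


@dataclass
class RunResult:
    parser: str     # adapter name, e.g. "docling"
    file: str       # input file name
    seconds: float  # wall-clock time for this parse
    output: str     # extracted text, empty on failure
    error: str      # repr of the exception, empty on success


def run_benchmark(parsers: Dict[str, Parser], files: Iterable[Path]) -> List[RunResult]:
    """Run every parser on every file, recording time, output, and failures."""
    results: List[RunResult] = []
    for path in files:
        for name, parse in parsers.items():
            start = time.perf_counter()
            try:
                output, error = parse(path), ""
            except Exception as exc:  # a crash on one file should not stop the run
                output, error = "", repr(exc)
            elapsed = time.perf_counter() - start
            results.append(RunResult(name, path.name, elapsed, output, error))
    return results
```

Recording failures instead of raising keeps one unsupported file format from ending the whole run, which matters when the resume set mixes PDF, Word, and HTML.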

Decide how to evaluate: which metrics should cover layout, and which should cover extraction?
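
As one possible starting point for the extraction side, two common candidates are character error rate (edit distance against a hand-made reference transcription) and field-level precision/recall/F1 on extracted key-value pairs. The sketch below assumes ground truth exists for each resume; the function names and the field dictionaries are illustrative, not part of any of the libraries listed. It does not cover layout metrics.

```python
from typing import Dict, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(predicted: str, reference: str) -> float:
    """Edit distance normalized by reference length (lower is better)."""
    if not reference:
        return 0.0 if not predicted else 1.0
    return edit_distance(predicted, reference) / len(reference)


def _norm(value: str) -> str:
    # Case-insensitive, whitespace-collapsed comparison of field values.
    return " ".join(value.lower().split())


def field_prf(predicted: Dict[str, str],
              reference: Dict[str, str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over fields whose normalized values match."""
    correct = sum(1 for key, value in predicted.items()
                  if key in reference and _norm(value) == _norm(reference[key]))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Character error rate rewards faithful text recovery, while the field-level scores reflect whether the resume's structured content (name, email, and so on) was recovered, so the two can disagree and are worth reporting separately.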

Prepare a report/paper/talk.
