    Repositories list

    • sdk

      Public
      Python
      MIT License
      710314 Updated Mar 19, 2026
    • chandra

      Public
      OCR model that handles complex tables, forms, handwriting with full layout.
      Python
      Apache License 2.0
      5705k245 Updated Mar 18, 2026
    • marker

      Public
      Convert PDF to markdown + JSON quickly with high accuracy
      Python
      GNU General Public License v3.0
      2.3k33k33363 Updated Mar 10, 2026
    • surya

      Public
      OCR, layout analysis, reading order, table recognition in 90+ languages
      Python
      GNU General Public License v3.0
      1.3k19k13815 Updated Mar 1, 2026
    • Scripts to run Datalab's self-service on-prem container
      Shell
      1500 Updated Feb 12, 2026
    • pykatex

      Public
      Python
      0200 Updated Feb 5, 2026
    • Python
      1100 Updated Oct 2, 2025
    • Python
      1301 Updated Aug 13, 2025
    • docext

      Public
      An on-premises, OCR-free toolkit for unstructured data extraction, markdown conversion, and benchmarking. (https://idp-leaderboard.org/)
      Python
      Apache License 2.0
      4900 Updated Jun 18, 2025
    • pdftext

      Public
      Extract structured text from PDFs quickly
      Python
      Apache License 2.0
      65673126 Updated Jun 11, 2025