    Repositories list

    • unstructured-js-client

      Public
      A JavaScript/TypeScript client for the Unstructured Platform API
      TypeScript
      MIT License
      155861 Updated Mar 10, 2026
    • unstructured-python-client

      Public
      A Python client for the Unstructured Platform API
      Python
      MIT License
      20114142 Updated Mar 10, 2026
    • docs

      Public
      Documentation for all Unstructured products and libraries
      MDX
      257015 Updated Mar 9, 2026
    • unstructured-ingest

      Public
      HTML
      Apache License 2.0
      571056131 Updated Mar 9, 2026
    • unstructured-api

      Public
      Python
      Apache License 2.0
      1878853613 Updated Mar 7, 2026
    • unstructured

      Public
      Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats fo…
      HTML
      Apache License 2.0
      1.2k14k18059 Updated Mar 4, 2026
    • unstructured-platform-plugins

      Public
      Python
      Apache License 2.0
      3602 Updated Mar 3, 2026
    • unstructured-inference

      Public
      Python
      Apache License 2.0
      752062524 Updated Mar 1, 2026
    • UNS-MCP

      Public
      Jupyter Notebook
      224232 Updated Feb 25, 2026
    • notebooks

      Public
      Jupyter Notebook
      0200 Updated Jan 29, 2026
    • base-images

      Public
      Store Dockerfiles and Packer configs for images to use as a base to build upon
      Shell
      Apache License 2.0
      3612 Updated Jan 14, 2026
    • Python
      Apache License 2.0
      3600 Updated Dec 2, 2025
    • Jupyter Notebook
      0000 Updated Oct 6, 2025
    • rag-over-hybrid-data-sources

      Public
      Two sources (S3, ElasticSearch) to RAG DB pipeline.
      Jupyter Notebook
      1101 Updated Sep 15, 2025
    • .github

      Public
      2021 Updated Aug 20, 2025
    • HTML
      1800 Updated Jul 23, 2025
    • Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and …
      Python
      Apache License 2.0
      9.9k4100 Updated Mar 17, 2025
    • A Python wrapper for Google Tesseract
      Python
      Apache License 2.0
      750400 Updated Mar 5, 2025
    • Reference architecture that provides a set of guidelines and best practices for implementing a central AI API gateway to empower various line-of-business units …
      Bicep
      MIT License
      141100 Updated Nov 22, 2024
    • Script to accompany the AWS blog post on unstructured data ETL with Unstructured Ingest library
      Python
      Apache License 2.0
      0000 Updated Oct 16, 2024
    • Pairing Technical Challenge
      TypeScript
      0000 Updated Sep 4, 2024
    • FedRAMP formatted model cards
      0100 Updated Aug 29, 2024
    • danswer

      Public
      Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
      Python
      Other
      2.4k1101 Updated Aug 23, 2024
    • JS Client Batch Processing
      JavaScript
      0000 Updated Jul 31, 2024
    • Main package repository for production Wolfi images
      C
      Other
      424000 Updated Jul 10, 2024
    • pipeline-sec-filings

      Public archive
      Preprocessing pipeline notebooks and API supporting text extraction from SEC documents
      Jupyter Notebook
      Apache License 2.0
      3514857 Updated Jan 1, 2024
    • Python
      Apache License 2.0
      8804 Updated Oct 2, 2023
    • Pipeline for extracting information from Army OERs
      Jupyter Notebook
      Apache License 2.0
      5816 Updated Oct 1, 2023
    • pipeline-paddleocr

      Public
      Pipeline for converting PDFs to raw text with PaddleOCR
      Jupyter Notebook
      Apache License 2.0
      72315 Updated Aug 21, 2023
    • langchain

      Public
      ⚡ Building applications with LLMs through composability ⚡
      Python
      MIT License
      21k800 Updated Aug 18, 2023