hawk-digital-environments/hawki-toolkit-file-converter

PDF Text Extraction API with PyMuPDF and FastAPI

This project provides a lightweight, containerized API for extracting and cleaning text from PDF files using PyMuPDF and serving it with FastAPI.

Current Features

  • Upload PDFs via an HTTP endpoint and get back cleaned text
  • Dockerized setup
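
The README does not describe what "cleaned" means in practice. The following is only a minimal sketch of the kind of normalization such a step might apply to raw extracted text; the function name `clean_text` and every rule in it are assumptions for illustration, not the project's actual implementation.

```python
import re


def clean_text(raw: str) -> str:
    """Normalize raw text extracted from a PDF (illustrative rules only)."""
    # Use "\n" as the only line ending.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words hyphenated across a line break: "extrac-\ntion" -> "extraction".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat a single newline as a soft wrap; blank lines stay as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into a single paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Trim stray whitespace at the edges of each line and of the whole text.
    text = "\n".join(line.strip() for line in text.split("\n"))
    return text.strip()
```

Rules like these trade fidelity for readability: rejoining hyphenated words, for example, would also merge genuinely hyphenated compounds that happen to break at a line end.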

1. Build & Run (Dockerized)

./run.sh

This will build the Docker image (`pymupdf-extract`).

2. API Endpoint

Example using `curl`:

```bash
curl -X POST http://localhost:8001/extract-pdf \
  -F "file=@/path/to/your/document.pdf"
# Optional: append --output [Filename].zip to the command to save the response to a file
```

Important: always put double quotation marks around the form argument, as in `"file=@/path/file.pdf"`.
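
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the `curl` call: it posts the PDF as the multipart form field `file` to `http://localhost:8001/extract-pdf`. The helper names `build_multipart` and `extract_pdf` are hypothetical, and because the response format is not documented here, the response body is returned as raw bytes.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, payload: bytes, boundary: str) -> bytes:
    """Encode a single file as a multipart/form-data request body."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail


def extract_pdf(pdf_path: str, url: str = "http://localhost:8001/extract-pdf") -> bytes:
    """POST the PDF as form field "file" and return the raw response body."""
    boundary = uuid.uuid4().hex  # random separator that will not occur in the payload
    path = Path(pdf_path)
    body = build_multipart("file", path.name, path.read_bytes(), boundary)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the container running, `extract_pdf("/path/to/your/document.pdf")` would return the server's response; writing those bytes to a file corresponds to the `--output` option of the `curl` command.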
