This project provides a lightweight, containerized API that extracts and cleans text from PDF files with PyMuPDF and serves it over HTTP with FastAPI.

- Upload PDFs via an HTTP endpoint and get back cleaned text
- Dockerized setup, built with a single script (`run.sh`)
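A minimal sketch of the extract-and-clean step, assuming the service reads the uploaded PDF into memory and joins the output of PyMuPDF's `page.get_text()` for every page. The function names `clean_text` and `extract_pdf_text` are hypothetical, and the cleaning rules are illustrative only; the project's actual cleaning may differ.

```python
import re


def clean_text(raw: str) -> str:
    """Hypothetical cleaning step; the project's real rules may differ."""
    # Re-join words hyphenated across a line break: "extrac-\ntion" -> "extraction".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim leading/trailing spaces on every line.
    text = "\n".join(line.strip() for line in text.splitlines())
    # Reduce three or more consecutive newlines to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def extract_pdf_text(pdf_bytes: bytes) -> str:
    """Extract text from every page of an in-memory PDF, then clean it."""
    import fitz  # PyMuPDF; imported here so clean_text() works without it installed

    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        pages = [page.get_text() for page in doc]
    return clean_text("\n".join(pages))
```

Note that the hyphen rule also merges genuine compounds split at a line end (`well-\nknown` becomes `wellknown`), so a real implementation may want a dictionary check before joining.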
## Setup

From the project root, run:

```bash
./run.sh
```

This builds the Docker image (`pymupdf-extract`).
## API Endpoint
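Send a PDF to `POST /extract-pdf` as multipart form data in the `file` field. Example using `curl`:

```bash
curl -X POST http://localhost:8001/extract-pdf \
  -F "file=@/path/to/your/document.pdf"
```

Optionally, append `--output [Filename].zip` to save the response to a file instead of printing it to the terminal.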
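For callers without `curl`, the same request can be sketched with the Python standard library. This assumes only what the `curl` example shows (a `POST` to `http://localhost:8001/extract-pdf` with the PDF in the multipart field `file`); the helper names `build_multipart` and `extract_pdf` are hypothetical. The response is returned as raw bytes because its format (plain text or a ZIP archive) depends on the service.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/pdf") -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex  # random boundary, vanishingly unlikely to occur in the PDF
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract_pdf(path: str, url: str = "http://localhost:8001/extract-pdf") -> bytes:
    """POST a PDF to the extraction endpoint and return the raw response body."""
    pdf = Path(path)
    # Equivalent of curl's -F "file=@..." form field.
    body, ctype = build_multipart("file", pdf.name, pdf.read_bytes())
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

With the container running, `Path("result.zip").write_bytes(extract_pdf("/path/to/your/document.pdf"))` mirrors the `curl ... --output` call.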