# PDF Text Extraction API with PyMuPDF and FastAPI

This project provides a lightweight, containerized API for extracting and cleaning text from PDF files using PyMuPDF and serving it with FastAPI.

## Current Features

- Upload PDFs via an HTTP endpoint and get back cleaned text
- Dockerized setup
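
The exact cleaning rules applied by this project are not listed in this README. As a rough illustration of what "cleaned text" can mean for PDF output, the sketch below uses a hypothetical helper named `clean_text` (not taken from the repository) that rejoins hyphenated line breaks and normalizes whitespace. Treat the rules as assumptions, not as the project's actual behavior.

```python
import re


def clean_text(raw: str) -> str:
    """Normalize text extracted from a PDF (illustrative rules only)."""
    # Rejoin words hyphenated across a line break: "extrac-\ntion" -> "extraction"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse runs of spaces and tabs into a single space
    text = re.sub(r"[ \t]+", " ", text)
    # Trim leading and trailing whitespace on every line
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Squeeze three or more consecutive newlines into one blank line
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

A helper like this would typically run on the text of each page before the API builds its response.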

## 1. Build & Run (Dockerized)

./run.sh

This will build the Docker image (`pymupdf-extract`).

## API Endpoint

Example using `curl`:

```bash
# Optional: append --output [Filename].zip to save the response to a file
curl -X POST http://localhost:8001/extract-pdf \
  -F "file=@/path/to/your/document.pdf"
```

Important update: always put double quotes around the `-F` argument, as in `"file=@/path/file.pdf"`.
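
For callers who prefer Python over `curl`, the same upload can be sketched with the standard library alone. The endpoint URL and the `file` form field come from the `curl` example. The function names `build_multipart` and `extract_pdf` are hypothetical and not part of this repository. The sketch writes the raw response bytes to disk without assuming their format.

```python
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes, boundary: str) -> bytes:
    """Encode a single file as a multipart/form-data body."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    # The closing delimiter carries two extra dashes after the boundary
    return head + data + f"\r\n--{boundary}--\r\n".encode()


def extract_pdf(pdf_path: str, out_path: str,
                url: str = "http://localhost:8001/extract-pdf") -> None:
    """POST the PDF to the endpoint and save the response body as-is."""
    boundary = uuid.uuid4().hex
    with open(pdf_path, "rb") as f:
        body = build_multipart("file", os.path.basename(pdf_path), f.read(), boundary)
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    # Mirrors curl's --output: write whatever the server returns to out_path
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

With the container running, a call such as `extract_pdf("/path/to/your/document.pdf", "result.zip")` would play the role of the `curl` command with `--output`.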

Repository: hawk-digital-environments/hawki-toolkit-file-converter
