Redact

A fast, containerized, and scalable API service for automated OCR and PII redaction on document batches.

Redact is a production-ready microservice built with FastAPI that handles the secure and asynchronous processing of data redaction tasks. It is designed to be easily deployed with Docker and scaled via a worker-based architecture (Redis/RQ).

Showcases:

  • A robust, modern Python API using FastAPI (complete with automatic documentation).
  • A clear separation of concerns (API, Core, Services, Workers).
  • Scalable asynchronous task processing.
  • Containerization with Docker.
  • Database interaction via SQLAlchemy.

Features

  • RESTful API: Clear endpoints for submitting and retrieving redaction tasks.
  • Asynchronous Processing: Long-running redaction tasks are handled by a dedicated worker pool, keeping the API fast and responsive.
  • Persistent Storage: Uses SQLAlchemy for task metadata and Redis for task queuing.
  • Containerized: Built for easy deployment with a Dockerfile.
  • Benchmarked: Includes load testing scripts using Locust and benchmarks for performance analysis.
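Since long-running redaction work is handed to a worker pool, the API side just enqueues a job and returns its id. The sketch below shows that pattern with RQ; the queue name, the task function, and the regex-based redaction stand-in are illustrative, not the repo's actual code:

```python
import re

# Hypothetical worker task: the real service runs OCR plus a NER model,
# but a regex that masks email addresses is enough to show the shape.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_text(text: str) -> str:
    """Replace every email address with a fixed redaction token."""
    return EMAIL_RE.sub("[REDACTED]", text)

def enqueue_redaction(text: str):
    """Enqueue redact_text on an RQ worker (requires a running Redis).

    Imports are deferred so this module can be loaded without Redis;
    the queue name "redact" is an assumption for illustration.
    """
    from redis import Redis
    from rq import Queue

    q = Queue("redact", connection=Redis())
    job = q.enqueue(redact_text, text)
    return job.id  # the API would return this id for later polling
```

The API handler calls `enqueue_redaction` and responds immediately with the job id, so the request never blocks on model inference.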

🚧 Project Status: [Phase 5 of 6]

Current: Integration and testing

Goal

Process batches of document images, automatically detect and redact sensitive information (PII).

Features

  • Batch image upload
  • Asynchronous processing
  • PII redaction model
  • Batch prediction
  • REST API
  • Results retrieval
  • Docker/Kubernetes deployment

Phases

  • Phase 1: Basic upload
  • Phase 2: Job tracking
  • Phase 3: Async infrastructure
  • Phase 4: Model development
  • Phase 5: Integration (IN PROGRESS)
  • Phase 6: Deployment

Performance

API Benchmarking Table

| Endpoint | Operation | Payload Size | Concurrent Users | Requests/sec | Avg Latency (ms) | P95 Latency (ms) | Error Rate | Notes |
|---|---|---|---|---|---|---|---|---|
| POST /predict | Create | 100KB image | 2 | 0.67 | 34.95 | 56 | 0% | Includes file validation, disk write, Redis, ML model inference |
| GET /predict/{id} | Read | N/A | 10 | 3.58 | 10.38 | 32 | 0% | Retrieves processed document from server |
| GET /predict/check/{id} | Read | N/A | 10 | 3.52 | 9.94 | 28 | 0% | Fetches job status from Redis |
| DELETE /predict/drop/{id} | Delete | N/A | 10 | 3.47 | 10.43 | 30 | | Deletes a batch and all related files from the DB |

Legend:

  • Payload Size: Size of file or JSON sent in the request.
  • Concurrent Users: Simulated users (e.g., in Locust).
  • Requests/sec: Throughput under load.
  • Latency: Time from request to response (P95 = 95th percentile).
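For reference, the P95 column is just the 95th percentile of the per-request latency samples. A quick nearest-rank computation over raw samples (the latency values below are illustrative, not the real Locust data):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest value covering pct% of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-indexed rank
    return ordered[rank - 1]

# Illustrative per-request latencies in ms.
latencies = [10, 12, 9, 30, 11, 56, 10, 13, 12, 11]
avg_ms = sum(latencies) / len(latencies)
p95_ms = percentile(latencies, 95)  # one slow outlier dominates P95
```

This is why P95 sits well above the average: a single slow request moves the tail far more than it moves the mean.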

Model Inference Benchmark Table

Models:

  • NER: GLiNER Medium v2.1 (Zaratiana et al., urchade/GLiNER)
  • OCR: Tesseract 5.5.1 via pytesseract

| Model Name | Input Size | Avg Inference Time (ms) | Device | Notes |
|---|---|---|---|---|
| GLiNER Medium v2.1 | 200 lines of text | 790 | CPU | Long cold start (~20 s) on first model load |
| Tesseract 5.5.1 | 512x512 image | 0.96 | CPU | |

Legend:

  • Inference Time: time to run a single prediction (ms).

Tests performed in a subprocess.
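The timing methodology can be reproduced with a small `perf_counter` harness; the warm-up call keeps cold-start cost (such as GLiNER's ~20 s model load) out of the steady-state average. The model call here is a stand-in, not the repo's benchmark script:

```python
import time

def bench(fn, *args, runs=10):
    """Average wall-clock time of fn(*args) in milliseconds."""
    fn(*args)  # warm-up call so one-time startup cost is excluded
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) / runs * 1000.0

# Stand-in for the real OCR/NER inference call.
def fake_inference(text):
    return text.upper()

avg_ms = bench(fake_inference, "sample text", runs=100)
```

Running the harness in a subprocess additionally isolates the measurement from the parent process's state.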

🔧 Quickstart

Clone the repo

```bash
git clone https://github.com/fw7th/redact.git
cd redact
```

Create and activate virtualenv (optional)

```bash
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
```

Install dependencies

```bash
pip install -r requirements.txt
```

Copy example env and configure

```bash
cp .env.example .env
```

Start the app

```bash
uvicorn app.main:app --reload
```

Client request example

```bash
curl -X POST http://localhost:8000/predict \
  -F "file=@document.png"
```

Check job status

```bash
curl http://localhost:8000/predict/check/abc123
```
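Because jobs complete asynchronously, a client typically polls the check endpoint until the job settles. A minimal stdlib sketch; the endpoint path follows the benchmark table above, but the response schema and the "queued"/"started" status values (RQ-style job states) are assumptions:

```python
import json
import time
import urllib.request

def poll_status(job_id, fetch=None, base="http://localhost:8000",
                tries=30, delay=1.0):
    """Poll the check endpoint until the job leaves its pending states.

    `fetch` is injectable for testing; by default it performs an HTTP GET
    against the running service and decodes the JSON body.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)
    for _ in range(tries):
        body = fetch(f"{base}/predict/check/{job_id}")
        if body.get("status") not in ("queued", "started"):
            return body  # finished or failed: stop polling
        time.sleep(delay)
    raise TimeoutError(f"job {job_id} did not finish in time")
```

Once the status leaves the pending states, the client can fetch the processed document from `GET /predict/{id}`.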

Running tests

```bash
pytest tests/
```

Optionally benchmark

```bash
./scripts/benchmark
```

API Documentation

[Link to /docs once deployed]

When the server is running locally, visit:

  • http://localhost:8000/docs (Swagger UI)
  • http://localhost:8000/redoc (ReDoc)

These provide interactive documentation of all available endpoints with live testing.

Design Decisions

  • FastAPI: mainly because it's lightweight, and the automatic interactive docs come free.
  • Postgres: I'm already familiar with the DBMS.
  • Redis: chosen for its persistence, speed, and distributed support; it beats a plain multiprocessing.Queue on scalability and integrates well with both Celery and RQ. Kafka or RabbitMQ would be overkill at this stage of the project.
  • RQ: I initially wanted Celery, but after some investigation decided it was more than this use case needs; RQ gives a simple, quick setup without the configuration overhead.

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.
