OCR API with Tesseract.js

Production-ready REST API for Optical Character Recognition (OCR), with support for English, Yoruba, Igbo, and Hausa text and recognition of the Naira symbol (₦).

Features

🚀 Multi-language OCR: Supports English, Yoruba, Igbo, and Hausa
💰 Naira Symbol Recognition: Accurately detects and counts ₦ symbols
🖼️ Image Preprocessing: Automatic greyscale conversion, DPI normalization, and contrast enhancement
⚡ Worker Pool: Efficient Tesseract.js worker pool management with auto-scaling
🛡️ Production Ready: Rate limiting, error handling, logging, health checks, and metrics
🐳 Docker Support: Multi-stage Docker build with language pack caching
📊 Prometheus Metrics: Built-in metrics endpoint for monitoring
✅ Tested: Unit and integration tests with Jest

Tech Stack

Node.js 20+ with TypeScript (strict mode)
Express.js - Web framework
Tesseract.js - OCR engine
Sharp - Image processing
Pino - Structured logging
Jest - Testing framework
Docker - Containerization

Prerequisites

Node.js 20 or higher
npm or yarn
Docker (optional, for containerized deployment)

Installation

Local Development

Clone the repository:

git clone <repository-url>
cd ocr-api

Install dependencies:

npm install

Create .env file from .env.example:

cp .env.example .env

Download Tesseract.js language packs:

mkdir -p langs
cd langs
wget https://tessdata.projectnaptha.com/4.0.0_fast/eng.traineddata.gz
wget https://tessdata.projectnaptha.com/4.0.0_fast/yor.traineddata.gz
wget https://tessdata.projectnaptha.com/4.0.0_fast/ibo.traineddata.gz
wget https://tessdata.projectnaptha.com/4.0.0_fast/hau.traineddata.gz
gunzip *.gz
cd ..

Build the project:

npm run build

Start the server:

npm start

Or for development with hot-reload:

npm run dev

The API will be available at http://localhost:3000

Docker Deployment

Build the Docker image:

npm run docker:build

Start the container:

npm run docker:up

The Dockerfile automatically downloads language packs during the build process.

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| `NODE_ENV` | Environment (development/production) | `development` |
| `PORT` | Server port | `3000` |
| `WORKER_POOL_SIZE` | Number of Tesseract workers | `4` |
| `MAX_IMAGE_SIZE` | Maximum image size in bytes | `5242880` (5 MB) |
| `RATE_LIMIT_WINDOW_MS` | Rate limit window in milliseconds | `60000` (1 minute) |
| `RATE_LIMIT_MAX` | Maximum requests per window | `60` |
| `LOG_LEVEL` | Logging level (debug/info/warn/error) | `info` |
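
As an illustration of how these variables and their defaults might be read at startup, here is a minimal sketch. The `AppConfig` shape and the `loadConfig` and `intOrDefault` names are hypothetical and not taken from the project; only the variable names and default values come from the table above.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface AppConfig {
  nodeEnv: string;
  port: number;
  workerPoolSize: number;
  maxImageSize: number;
  rateLimitWindowMs: number;
  rateLimitMax: number;
  logLevel: string;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer; fall back to the default when unset or invalid.
function intOrDefault(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? '', 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Build the config from an environment map, applying the documented defaults.
function loadConfig(env: Env): AppConfig {
  return {
    nodeEnv: env['NODE_ENV'] ?? 'development',
    port: intOrDefault(env['PORT'], 3000),
    workerPoolSize: intOrDefault(env['WORKER_POOL_SIZE'], 4),
    maxImageSize: intOrDefault(env['MAX_IMAGE_SIZE'], 5242880),
    rateLimitWindowMs: intOrDefault(env['RATE_LIMIT_WINDOW_MS'], 60000),
    rateLimitMax: intOrDefault(env['RATE_LIMIT_MAX'], 60),
    logLevel: env['LOG_LEVEL'] ?? 'info',
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`. Taking the environment as an argument keeps the function easy to unit test.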

API Endpoints

POST /api/v1/ocr

Perform OCR on an uploaded image.

Request:

Method: POST
Content-Type: multipart/form-data
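Body:
- image (required): Image file (PNG, JPG, JPEG), max 5 MB by default (see MAX_IMAGE_SIZE)
Query parameters:
- language (optional): Language hint for better accuracy
  - Single language: eng, yor, ibo, hau
  - Multiple languages: eng+yor, yor+ibo, etc. (URL-encode + as %2B, since an unencoded + in a query string is normally decoded as a space)
  - If omitted, uses all available languages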
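
The language hint rules above can be sketched as a small validation helper. This is an assumption about how such a check could look, not the project's actual validation code: `normalizeLanguageHint` and `SUPPORTED_LANGUAGES` are hypothetical names. The helper accepts whitespace as a separator as well as `+`, because an unencoded `+` in a query string is usually decoded as a space.

```typescript
const SUPPORTED_LANGUAGES = ['eng', 'yor', 'ibo', 'hau'] as const;
type Language = (typeof SUPPORTED_LANGUAGES)[number];

function isLanguage(code: string): code is Language {
  return (SUPPORTED_LANGUAGES as readonly string[]).includes(code);
}

// Returns a "+"-joined language string such as "eng+yor".
// An empty or missing hint falls back to all supported languages.
function normalizeLanguageHint(hint?: string): string {
  const codes = (hint ?? '')
    .toLowerCase()
    .split(/[+\s]+/)
    .filter((code) => code.length > 0);

  if (codes.length === 0) {
    return SUPPORTED_LANGUAGES.join('+');
  }

  const unique: Language[] = [];
  for (const code of codes) {
    if (!isLanguage(code)) {
      throw new Error(`Unsupported language: ${code}`);
    }
    if (!unique.includes(code)) {
      unique.push(code); // drop duplicates, keep first-seen order
    }
  }
  return unique.join('+');
}
```

A handler could map the thrown error to a 400 response; how the real API reports an unsupported language is not documented here.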

Response (200 OK):

{
  "success": true,
  "data": {
    "text": "₦5 000 for garri",
    "confidence": 92.3,
    "language": "yor",
    "nairaCount": 1
  }
}
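
The `nairaCount` field reflects how many ₦ symbols appear in the recognized text. A minimal sketch of such a counter follows; `countNaira` is a hypothetical name, and the project's own counter (tested in tests/nairaCounter.test.ts) may handle more cases, such as OCR misreads of the symbol.

```typescript
// U+20A6 is the Unicode NAIRA SIGN ("₦").
const NAIRA_SIGN = '\u20A6';

// Count exact occurrences of the Naira sign in OCR output.
// Look-alike characters (for example a plain "N") are deliberately not counted.
function countNaira(text: string): number {
  return text.split(NAIRA_SIGN).length - 1;
}
```

For the response above, `countNaira("₦5 000 for garri")` yields 1, matching `nairaCount`.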

Best Practices for High Confidence:

Specify language: Use ?language=yor for Yoruba text, ?language=hau for Hausa, etc.
Single language is better: eng gives higher confidence than eng+yor+ibo+hau
Image quality: Use high-resolution images (minimum 300 DPI), clear text, good lighting
Avoid compression: PNG is better than heavily compressed JPEG

Error Responses:

400 - Bad Request (no image, invalid image)
413 - Payload Too Large (file exceeds size limit)
415 - Unsupported Media Type (invalid file type)
429 - Too Many Requests (rate limit exceeded)
500 - Internal Server Error

Example with cURL:

# Basic OCR (no language hint; all available languages are used)
curl -X POST http://localhost:3000/api/v1/ocr \
  -F "image=@receipt.jpg"

# OCR with language hint (recommended for better accuracy)
curl -X POST "http://localhost:3000/api/v1/ocr?language=yor" \
  -F "image=@receipt.jpg"

# OCR with multiple languages ("+" is URL-encoded as %2B so it is not decoded as a space)
curl -X POST "http://localhost:3000/api/v1/ocr?language=eng%2Byor" \
  -F "image=@receipt.jpg"

Example with JavaScript (fetch):

const formData = new FormData();
formData.append('image', fileInput.files[0]);

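// With language hint for better accuracy
const language = 'yor'; // or 'eng', 'ibo', 'hau', 'eng+yor', etc.
// encodeURIComponent keeps the "+" in multi-language hints from being decoded as a space
const url = `http://localhost:3000/api/v1/ocr?language=${encodeURIComponent(language)}`;
const response = await fetch(url, {
  method: 'POST',
  body: formData
});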

const result = await response.json();
console.log(result);
// { success: true, data: { text: "...", confidence: 95.2, language: "yor", nairaCount: 3 } }

GET /health

Health check endpoint.

Response (200 OK):

{
  "status": "ok",
  "uptime": 1234.56,
  "version": "1.0.0",
  "workers": {
    "total": 4,
    "inUse": 1,
    "available": 3
  }
}

GET /metrics

Prometheus-format metrics endpoint.

Response (200 OK):

# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total 42

# HELP tesseract_workers_total Total number of Tesseract workers
# TYPE tesseract_workers_total gauge
tesseract_workers_total 4
...
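
The output above follows the Prometheus text exposition format: a `# HELP` line, a `# TYPE` line, then the sample. The sketch below renders that format by hand to show the structure. It is illustrative only: `Metric` and `renderMetrics` are hypothetical names, and the project may well use a metrics library instead.

```typescript
type MetricType = 'counter' | 'gauge';

interface Metric {
  name: string;
  help: string;
  type: MetricType;
  value: number;
}

// Render unlabeled metrics as Prometheus text exposition format.
// Metrics are separated by a blank line, as in the sample output.
function renderMetrics(metrics: Metric[]): string {
  const blocks = metrics.map((m) =>
    [
      `# HELP ${m.name} ${m.help}`,
      `# TYPE ${m.name} ${m.type}`,
      `${m.name} ${m.value}`,
    ].join('\n'),
  );
  return blocks.join('\n\n') + '\n';
}
```

Labels, histograms, and escaping of help text are left out to keep the sketch short.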

Testing

Run all tests:

npm test

Run tests in watch mode:

npm run test:watch

Run tests with coverage:

npm test -- --coverage

Test Structure

tests/nairaCounter.test.ts - Unit tests for Naira symbol counting
tests/ocr.test.ts - Integration tests for OCR API endpoints

Development

Available Scripts

| Script | Description |
| --- | --- |
| `npm run dev` | Start development server with hot-reload |
| `npm run build` | Compile TypeScript to JavaScript |
| `npm start` | Start production server |
| `npm test` | Run tests with coverage |
| `npm run lint` | Run ESLint |
| `npm run lint:fix` | Fix ESLint errors |
| `npm run format` | Format code with Prettier |
| `npm run docker:build` | Build Docker image |
| `npm run docker:up` | Start Docker container |
| `npm run docker:down` | Stop Docker container |

Code Quality

TypeScript: Strict mode enabled
ESLint: Configured with TypeScript rules
Prettier: Code formatting
Jest: Unit and integration testing

Architecture

src/
├── config/          # Configuration (logger, Tesseract)
├── controllers/     # Request handlers
├── middleware/      # Express middleware (error handling, validation, rate limiting)
├── routes/          # API routes
├── services/        # Business logic (OCR service)
├── utils/           # Utilities (image preprocessing, Naira counter)
├── app.ts           # Express app setup
└── server.ts        # Server entry point

Image Preprocessing

The API automatically preprocesses images for optimal OCR accuracy:

Greyscale conversion - Reduces noise and improves text recognition
DPI normalization - Resizes to 300 DPI equivalent
Contrast normalization - Enhances text visibility
Sharpening - Improves edge detection
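
To make the DPI normalization step concrete, the arithmetic can be sketched as below. This is an assumption about how "300 DPI equivalent" might be computed, not the project's code: `scaleToTargetDpi` is a hypothetical helper, and treating images without density metadata as 72 DPI is an assumption of this sketch.

```typescript
interface Size {
  width: number;
  height: number;
}

const TARGET_DPI = 300;
// Assumption: images with no usable density metadata are treated as 72 DPI.
const ASSUMED_SOURCE_DPI = 72;

// Compute the pixel dimensions an image would have at 300 DPI,
// keeping its physical size and aspect ratio unchanged.
function scaleToTargetDpi(size: Size, sourceDpi?: number): Size {
  const dpi = sourceDpi !== undefined && sourceDpi > 0 ? sourceDpi : ASSUMED_SOURCE_DPI;
  const factor = TARGET_DPI / dpi;
  return {
    width: Math.round(size.width * factor),
    height: Math.round(size.height * factor),
  };
}
```

For example, a 600x400 image tagged as 150 DPI maps to 1200x800. The resulting dimensions could then be passed to the resize step of the image pipeline. A real pipeline would probably also cap the output size so that large low-density photos are not upscaled excessively.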

Worker Pool Management

Configurable pool size (default: 4 workers)
Automatic worker initialization
Idle timeout (30 seconds) with auto-reinitialization
Request queuing when all workers are busy
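
The queuing behaviour listed above can be sketched with a small generic pool. This is a minimal illustration rather than the project's implementation: the `WorkerPool` class and its methods are hypothetical, the worker type is generic instead of a real Tesseract.js worker, and the 30-second idle timeout is not modelled.

```typescript
class WorkerPool<W> {
  private readonly idle: W[];
  private readonly waiters: Array<(worker: W) => void> = [];
  private readonly total: number;

  constructor(workers: W[]) {
    this.idle = [...workers];
    this.total = workers.length;
  }

  // Resolve immediately if a worker is idle; otherwise queue the request.
  acquire(): Promise<W> {
    const worker = this.idle.pop();
    if (worker !== undefined) {
      return Promise.resolve(worker);
    }
    return new Promise<W>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  // Hand the worker to the oldest queued request, or mark it idle again.
  release(worker: W): void {
    const next = this.waiters.shift();
    if (next !== undefined) {
      next(worker);
    } else {
      this.idle.push(worker);
    }
  }

  // Run a task with a pooled worker and always return the worker afterwards.
  async run<T>(task: (worker: W) => Promise<T>): Promise<T> {
    const worker = await this.acquire();
    try {
      return await task(worker);
    } finally {
      this.release(worker);
    }
  }

  // Same shape as the "workers" object in the /health response, plus queue length.
  stats(): { total: number; inUse: number; available: number; queued: number } {
    const available = this.idle.length;
    return {
      total: this.total,
      inUse: this.total - available,
      available,
      queued: this.waiters.length,
    };
  }
}
```

Wrapping each OCR call in `run` means a worker is returned to the pool even when recognition throws, so a failed request does not permanently shrink the pool.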

Rate Limiting

Default: 60 requests per minute per IP address
Configurable via environment variables
Returns 429 Too Many Requests when exceeded
Includes rate limit headers in responses
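
For readers unfamiliar with per-IP limits, the behaviour above can be approximated with a fixed-window counter. This README does not say which rate limiting library or algorithm the project uses, so the `FixedWindowLimiter` class below is a hypothetical sketch of the general technique, using the documented defaults of 60 requests per 60000 ms.

```typescript
interface RateLimitResult {
  allowed: boolean;
  limit: number;
  remaining: number;
  resetMs: number; // milliseconds until the current window ends
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly windowMs: number;
  private readonly max: number;

  constructor(windowMs: number = 60000, max: number = 60) {
    this.windowMs = windowMs;
    this.max = max;
  }

  // Record one request for the key (for example a client IP) and report the outcome.
  hit(key: string, now: number = Date.now()): RateLimitResult {
    let entry = this.windows.get(key);
    if (entry === undefined || now - entry.start >= this.windowMs) {
      entry = { start: now, count: 0 }; // start a fresh window
      this.windows.set(key, entry);
    }
    entry.count += 1;
    return {
      allowed: entry.count <= this.max,
      limit: this.max,
      remaining: Math.max(0, this.max - entry.count),
      resetMs: entry.start + this.windowMs - now,
    };
  }
}
```

A server would respond with 429 when `allowed` is false, and could expose `limit`, `remaining`, and `resetMs` through response headers. This sketch never evicts old keys, so a production version would need cleanup or a shared store.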

Monitoring

Health Check

Monitor service health via /health endpoint.

Metrics

Prometheus-format metrics available at /metrics:

HTTP request count and duration
Tesseract worker pool statistics
Request duration percentiles (p50, p95)
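
As a reminder of what p50 and p95 mean for request durations, here is a nearest-rank percentile sketch. It assumes raw duration samples are kept in memory; the `percentile` function is hypothetical, and the project may compute these values differently (for example with histogram buckets).

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile requires at least one sample');
  }
  if (p <= 0 || p > 100) {
    throw new RangeError('p must be in the range (0, 100]');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const value = sorted[Math.max(0, rank - 1)];
  if (value === undefined) {
    throw new Error('rank out of bounds');
  }
  return value;
}
```

With durations of 10, 20, 30, 40, and 50 ms, the p50 is 30 ms and the p95 is 50 ms.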

Troubleshooting

Language Packs Not Found

If you see errors about missing language packs:

Ensure langs/ directory exists in project root
Verify language pack files are present:
- eng.traineddata
- yor.traineddata
- ibo.traineddata
- hau.traineddata
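
The check above can be automated with a small helper that compares a directory listing against the expected file names. `missingLanguagePacks` is a hypothetical function, not part of the project. It takes the listing as an argument so it stays independent of the file system.

```typescript
const REQUIRED_LANGUAGES = ['eng', 'yor', 'ibo', 'hau'];

// Return the expected .traineddata files that are absent from the listing.
function missingLanguagePacks(
  filesInLangsDir: string[],
  languages: string[] = REQUIRED_LANGUAGES,
): string[] {
  const present = new Set(filesInLangsDir);
  return languages
    .map((language) => `${language}.traineddata`)
    .filter((fileName) => !present.has(fileName));
}
```

In Node.js the listing could come from `fs.readdirSync('langs')`. Files that are still compressed, such as `eng.traineddata.gz`, are reported as missing, which catches a skipped `gunzip` step.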

Low OCR Confidence

For best results:

Specify the language: Use ?language=yor for Yoruba text instead of relying on the all-languages default
Use single language when possible: eng is more accurate than eng+yor+ibo+hau
Image quality matters:
- Use high-resolution images (minimum 300 DPI)
- Ensure clear, well-lit images
- Avoid heavily compressed JPEGs (use PNG when possible)
- Ensure text is not rotated or skewed
For receipts: Text should be horizontal and clearly visible
Diacritics: The API supports Yoruba/Igbo/Hausa diacritics (ọ, ṣ, ụ, ị, ń, etc.), which improves accuracy on text in those languages

Worker Pool Issues

Increase WORKER_POOL_SIZE for higher concurrency
Monitor worker stats via /health endpoint
Check logs for worker initialization errors

License

MIT

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

Support

For issues and questions, please open an issue on GitHub.
