Eldo/ocr

OCR API with Tesseract.js

Production-ready REST API for Optical Character Recognition (OCR) with support for English, Yoruba, Igbo, and Hausa text and Naira symbol (₦) recognition.

Features

  • 🚀 Multi-language OCR: Supports English, Yoruba, Igbo, and Hausa
  • 💰 Naira Symbol Recognition: Detects and counts ₦ symbols
  • 🖼️ Image Preprocessing: Automatic greyscale conversion, DPI normalization, and contrast enhancement
  • ⚡ Worker Pool: Efficient Tesseract.js worker pool management with auto-scaling
  • 🛡️ Production Ready: Rate limiting, error handling, logging, health checks, and metrics
  • 🐳 Docker Support: Multi-stage Docker build with language pack caching
  • 📊 Prometheus Metrics: Built-in metrics endpoint for monitoring
  • ✅ Fully Tested: Unit and integration tests with Jest

Tech Stack

  • Node.js 20+ with TypeScript (strict mode)
  • Express.js - Web framework
  • Tesseract.js - OCR engine
  • Sharp - Image processing
  • Pino - Structured logging
  • Jest - Testing framework
  • Docker - Containerization

Prerequisites

  • Node.js 20 or higher
  • npm or yarn
  • Docker (optional, for containerized deployment)

Installation

Local Development

  1. Clone the repository:
git clone <repository-url>
cd ocr-api
  2. Install dependencies:
npm install
  3. Create .env file from .env.example:
cp .env.example .env
  4. Download Tesseract.js language packs:
mkdir -p langs
cd langs
wget https://tessdata.projectnaptha.com/4.0.0_fast/eng.traineddata.gz
wget https://tessdata.projectnaptha.com/4.0.0_fast/yor.traineddata.gz
wget https://tessdata.projectnaptha.com/4.0.0_fast/ibo.traineddata.gz
wget https://tessdata.projectnaptha.com/4.0.0_fast/hau.traineddata.gz
gunzip *.gz
cd ..
  5. Build the project:
npm run build
  6. Start the server:
npm start

Or for development with hot-reload:

npm run dev

The API will be available at http://localhost:3000

Docker Deployment

  1. Build the Docker image:
npm run docker:build
  2. Start the container:
npm run docker:up

The Dockerfile automatically downloads language packs during the build process.

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| NODE_ENV | Environment (development/production) | development |
| PORT | Server port | 3000 |
| WORKER_POOL_SIZE | Number of Tesseract workers | 4 |
| MAX_IMAGE_SIZE | Maximum image size in bytes | 5242880 (5 MB) |
| RATE_LIMIT_WINDOW_MS | Rate limit window in milliseconds | 60000 (1 minute) |
| RATE_LIMIT_MAX | Maximum requests per window | 60 |
| LOG_LEVEL | Logging level (debug/info/warn/error) | info |
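
As a rough illustration of how the variables and defaults in this table could be read at startup, here is a minimal sketch. The `AppConfig` shape and the `loadConfig` and `intFromEnv` helpers are hypothetical names, not taken from the project source; only the variable names and default values come from the table.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface AppConfig {
  nodeEnv: string;
  port: number;
  workerPoolSize: number;
  maxImageSize: number;
  rateLimitWindowMs: number;
  rateLimitMax: number;
  logLevel: string;
}

type Env = Record<string, string | undefined>;

// Read a positive integer variable, falling back to the documented default.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Build the config from an environment-like object (for example process.env).
function loadConfig(env: Env): AppConfig {
  return {
    nodeEnv: env["NODE_ENV"] ?? "development",
    port: intFromEnv(env, "PORT", 3000),
    workerPoolSize: intFromEnv(env, "WORKER_POOL_SIZE", 4),
    maxImageSize: intFromEnv(env, "MAX_IMAGE_SIZE", 5242880),
    rateLimitWindowMs: intFromEnv(env, "RATE_LIMIT_WINDOW_MS", 60000),
    rateLimitMax: intFromEnv(env, "RATE_LIMIT_MAX", 60),
    logLevel: env["LOG_LEVEL"] ?? "info",
  };
}
```

Calling `loadConfig(process.env)` once at startup and failing fast on a malformed value tends to be easier to debug than discovering a bad `PORT` or `WORKER_POOL_SIZE` at request time.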

API Endpoints

POST /api/v1/ocr

Perform OCR on an uploaded image.

Request:

  • Method: POST
  • Content-Type: multipart/form-data
  • Body:
    • image (required): Image file (PNG, JPG, JPEG) - Max 5 MB
    • language (optional): Language hint for better accuracy, sent as the ?language= query parameter
      • Single language: eng, yor, ibo, hau
      • Multiple languages: eng+yor, yor+ibo, etc.
      • If omitted, uses all available languages
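
The language rules above can be sketched as a small validation helper. This is only an illustration under assumptions: `normalizeLanguageHint` is a hypothetical name, and the README does not say how the server actually validates the hint. The sketch accepts a space as a separator as well, because a literal `+` in a query string is often decoded as a space.

```typescript
// Language codes listed in this README.
const SUPPORTED_LANGUAGES = ["eng", "yor", "ibo", "hau"] as const;
type LanguageCode = (typeof SUPPORTED_LANGUAGES)[number];

function isSupportedLanguage(code: string): code is LanguageCode {
  return (SUPPORTED_LANGUAGES as readonly string[]).includes(code);
}

// Returns a Tesseract-style language string such as "eng+yor".
// With no hint, every supported language is used.
function normalizeLanguageHint(hint?: string): string {
  if (hint === undefined || hint.trim() === "") {
    return SUPPORTED_LANGUAGES.join("+");
  }
  // Split on "+" or whitespace, since "+" may arrive decoded as a space.
  const codes = hint
    .trim()
    .toLowerCase()
    .split(/[+\s]+/)
    .filter((code) => code.length > 0);
  const unknown = codes.filter((code) => !isSupportedLanguage(code));
  if (unknown.length > 0) {
    throw new Error(`Unsupported language code(s): ${unknown.join(", ")}`);
  }
  // Drop duplicates while keeping the caller's order.
  return Array.from(new Set(codes)).join("+");
}
```

Rejecting unknown codes early would let the API answer with a 400 instead of failing later inside the OCR worker.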

Response (200 OK):

{
  "success": true,
  "data": {
    "text": "₦5 000 for garri",
    "confidence": 92.3,
    "language": "yor",
    "nairaCount": 1
  }
}
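
For illustration, the `nairaCount` field in this response could be derived from the recognized text along the following lines. `countNairaSymbols` is a hypothetical helper, and the project's real counter may handle more cases, for example OCR misreads of ₦ as a plain N, which this sketch ignores.

```typescript
// U+20A6 is the Unicode code point for the Naira sign (₦).
const NAIRA_SIGN = "\u20A6";

// Count exact occurrences of the Naira sign in OCR output.
function countNairaSymbols(text: string): number {
  // Splitting on the sign yields one more piece than there are signs.
  return text.split(NAIRA_SIGN).length - 1;
}

const sampleCount = countNairaSymbols("₦5 000 for garri");
console.log(sampleCount); // 1, matching the sample response
```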

Best Practices for High Confidence:

  • Specify language: Use ?language=yor for Yoruba text, ?language=hau for Hausa, etc.
  • Single language is better: eng gives higher confidence than eng+yor+ibo+hau
  • Image quality: Use high-resolution images (minimum 300 DPI), clear text, good lighting
  • Avoid compression: PNG is better than heavily compressed JPEG

Error Responses:

  • 400 - Bad Request (no image, invalid image)
  • 413 - Payload Too Large (file exceeds size limit)
  • 415 - Unsupported Media Type (invalid file type)
  • 429 - Too Many Requests (rate limit exceeded)
  • 500 - Internal Server Error
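
A client might turn these status codes into user-facing hints and a retry decision roughly as follows. This is a sketch: `describeOcrError` and `isRetryable` are hypothetical helpers, the retry policy is an assumption, and the shape of the error response body is not documented here, so only the status code is used.

```typescript
// Map the documented status codes to short hints for the caller.
function describeOcrError(status: number): string {
  switch (status) {
    case 400:
      return "Bad request: attach a valid image in the 'image' field";
    case 413:
      return "Image too large: stay under the configured size limit";
    case 415:
      return "Unsupported file type: send PNG or JPEG";
    case 429:
      return "Rate limit exceeded: wait before retrying";
    case 500:
      return "Server error: retry later or check the service logs";
    default:
      return `Unexpected status ${status}`;
  }
}

// Assumption: only rate limiting and server-side failures are worth retrying;
// 4xx validation errors will fail again with the same input.
function isRetryable(status: number): boolean {
  return status === 429 || status >= 500;
}
```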

Example with cURL:

# Basic OCR (no language hint, so all available languages are used)
curl -X POST http://localhost:3000/api/v1/ocr \
  -F "image=@receipt.jpg"

# OCR with language hint (recommended for better accuracy)
curl -X POST "http://localhost:3000/api/v1/ocr?language=yor" \
  -F "image=@receipt.jpg"

# OCR with multiple languages ("+" is URL-encoded as %2B so it is not read as a space)
curl -X POST "http://localhost:3000/api/v1/ocr?language=eng%2Byor" \
  -F "image=@receipt.jpg"

Example with JavaScript (fetch):

const formData = new FormData();
formData.append('image', fileInput.files[0]);

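// With language hint for better accuracy
const language = 'yor'; // or 'eng', 'ibo', 'hau', 'eng+yor', etc.
// encodeURIComponent keeps the "+" in combined hints from being read as a space
const url = `http://localhost:3000/api/v1/ocr?language=${encodeURIComponent(language)}`;
const response = await fetch(url, {
  method: 'POST',
  body: formData
});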

const result = await response.json();
console.log(result);
// { success: true, data: { text: "...", confidence: 95.2, language: "yor", nairaCount: 3 } }

GET /health

Health check endpoint.

Response (200 OK):

{
  "status": "ok",
  "uptime": 1234.56,
  "version": "1.0.0",
  "workers": {
    "total": 4,
    "inUse": 1,
    "available": 3
  }
}

GET /metrics

Prometheus-format metrics endpoint.

Response (200 OK):

# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total 42

# HELP tesseract_workers_total Total number of Tesseract workers
# TYPE tesseract_workers_total gauge
tesseract_workers_total 4
...
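
To show how a script could read this output without a full Prometheus client, here is a minimal parser sketch. `parseSimpleMetrics` is a hypothetical helper that handles only unlabelled `name value` samples like the ones shown above; labelled samples, histograms, and timestamps are deliberately ignored.

```typescript
// Parse unlabelled samples of the form "<name> <value>".
// Comment lines (# HELP, # TYPE) and blank lines are skipped.
function parseSimpleMetrics(body: string): Record<string, number> {
  const samples: Record<string, number> = {};
  for (const line of body.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)\s+(\S+)/.exec(trimmed);
    if (match === null) continue; // labelled samples and stray text are ignored
    const value = Number(match[2]);
    if (!Number.isNaN(value)) samples[match[1]] = value;
  }
  return samples;
}

const exampleMetrics = parseSimpleMetrics(
  [
    "# HELP http_requests_total Total number of HTTP requests",
    "# TYPE http_requests_total counter",
    "http_requests_total 42",
    "",
    "# TYPE tesseract_workers_total gauge",
    "tesseract_workers_total 4",
  ].join("\n")
);
console.log(exampleMetrics);
```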

Testing

Run all tests:

npm test

Run tests in watch mode:

npm run test:watch

Run tests with coverage:

npm test -- --coverage

Test Structure

  • tests/nairaCounter.test.ts - Unit tests for Naira symbol counting
  • tests/ocr.test.ts - Integration tests for OCR API endpoints

Development

Available Scripts

| Script | Description |
|--------|-------------|
| npm run dev | Start development server with hot-reload |
| npm run build | Compile TypeScript to JavaScript |
| npm start | Start production server |
| npm test | Run tests with coverage |
| npm run lint | Run ESLint |
| npm run lint:fix | Fix ESLint errors |
| npm run format | Format code with Prettier |
| npm run docker:build | Build Docker image |
| npm run docker:up | Start Docker container |
| npm run docker:down | Stop Docker container |

Code Quality

  • TypeScript: Strict mode enabled
  • ESLint: Configured with TypeScript rules
  • Prettier: Code formatting
  • Jest: Unit and integration testing

Architecture

src/
├── config/          # Configuration (logger, Tesseract)
├── controllers/     # Request handlers
├── middleware/      # Express middleware (error handling, validation, rate limiting)
├── routes/          # API routes
├── services/        # Business logic (OCR service)
├── utils/           # Utilities (image preprocessing, Naira counter)
├── app.ts           # Express app setup
└── server.ts        # Server entry point

Image Preprocessing

The API automatically preprocesses images for optimal OCR accuracy:

  1. Greyscale conversion - Drops color information so recognition works on luminance alone
  2. DPI normalization - Resizes the image to a 300 DPI equivalent
  3. Contrast normalization - Stretches the tonal range so text stands out from the background
  4. Sharpening - Crisps character edges
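
The arithmetic behind the DPI normalization step can be sketched as below. This is an assumption-laden illustration: `scaleToTargetDpi` is a hypothetical helper, and the README does not say how the project determines the source DPI or whether it caps the output size. The resulting dimensions could then be handed to an image library such as Sharp for the actual resize.

```typescript
interface Size {
  width: number;
  height: number;
}

// Compute the pixel dimensions an image needs so that its physical size,
// as implied by sourceDpi, is rendered at targetDpi instead.
function scaleToTargetDpi(size: Size, sourceDpi: number, targetDpi = 300): Size {
  if (sourceDpi <= 0 || targetDpi <= 0) {
    throw new Error("DPI values must be positive");
  }
  const factor = targetDpi / sourceDpi;
  return {
    width: Math.round(size.width * factor),
    height: Math.round(size.height * factor),
  };
}

// A 600x800 scan at 150 DPI doubles in each dimension at 300 DPI.
const normalized = scaleToTargetDpi({ width: 600, height: 800 }, 150);
console.log(normalized); // { width: 1200, height: 1600 }
```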

Worker Pool Management

  • Configurable pool size (default: 4 workers)
  • Automatic worker initialization
  • Idle timeout (30 seconds) with auto-reinitialization
  • Request queuing when all workers are busy
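
The queuing behavior in this list can be illustrated with a small generic pool. This is a sketch, not the project's implementation: `SimplePool` is a hypothetical class, the workers are opaque values rather than real Tesseract.js workers, and the idle timeout with auto-reinitialization is left out.

```typescript
interface PoolStats {
  total: number;
  inUse: number;
  available: number;
  queued: number;
}

class SimplePool<T> {
  private readonly idle: T[];
  private readonly waiting: Array<(worker: T) => void> = [];
  private readonly total: number;

  constructor(workers: T[]) {
    this.idle = workers.slice();
    this.total = workers.length;
  }

  // Resolves at once when a worker is idle; otherwise the caller waits in FIFO order.
  acquire(): Promise<T> {
    const worker = this.idle.pop();
    if (worker !== undefined) return Promise.resolve(worker);
    return new Promise<T>((resolve) => {
      this.waiting.push(resolve);
    });
  }

  // Hands the worker to the oldest waiter, or returns it to the idle list.
  release(worker: T): void {
    const next = this.waiting.shift();
    if (next !== undefined) next(worker);
    else this.idle.push(worker);
  }

  // Same counters as the workers object in the /health response, plus queue length.
  stats(): PoolStats {
    const available = this.idle.length;
    return {
      total: this.total,
      inUse: this.total - available,
      available,
      queued: this.waiting.length,
    };
  }

  // Run a job with a pooled worker and always give the worker back.
  async run<R>(job: (worker: T) => Promise<R>): Promise<R> {
    const worker = await this.acquire();
    try {
      return await job(worker);
    } finally {
      this.release(worker);
    }
  }
}
```

Wrapping each OCR call in `pool.run(...)` keeps the release in a `finally` block, so a failed recognition does not leak a worker.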

Rate Limiting

  • Default: 60 requests per minute per IP address
  • Configurable via environment variables
  • Returns 429 Too Many Requests when exceeded
  • Includes rate limit headers in responses
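
To make the per-IP window concrete, here is a minimal fixed-window limiter sketch. The README does not say which middleware or algorithm the project uses, so `FixedWindowLimiter` is a hypothetical stand-in that only mirrors the documented defaults (60 requests per 60000 ms). It never evicts old entries, which a real deployment would need to handle.

```typescript
interface RateDecision {
  allowed: boolean;
  remaining: number; // requests left in the current window
  resetAt: number; // epoch milliseconds when the window ends
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(
    private readonly windowMs: number = 60000,
    private readonly max: number = 60
  ) {}

  // key is typically the client IP; now is injectable to keep tests deterministic.
  check(key: string, now: number = Date.now()): RateDecision {
    let entry = this.windows.get(key);
    if (entry === undefined || now - entry.start >= this.windowMs) {
      // First request from this key, or the previous window has expired.
      entry = { start: now, count: 0 };
      this.windows.set(key, entry);
    }
    const resetAt = entry.start + this.windowMs;
    if (entry.count >= this.max) {
      return { allowed: false, remaining: 0, resetAt };
    }
    entry.count += 1;
    return { allowed: true, remaining: this.max - entry.count, resetAt };
  }
}
```

A handler would answer with 429 when `allowed` is false, and could use `remaining` and `resetAt` to populate the rate limit headers.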

Monitoring

Health Check

Monitor service health via /health endpoint.

Metrics

Prometheus-format metrics available at /metrics:

  • HTTP request count and duration
  • Tesseract worker pool statistics
  • Request duration percentiles (p50, p95)
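
As an illustration of what the p50 and p95 figures mean, the following sketch computes nearest-rank percentiles over a list of request durations. `percentile` is a hypothetical helper; Prometheus client libraries usually estimate quantiles from histograms or summaries instead, so the project's numbers may be computed differently.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of all
// samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Durations in milliseconds for five requests.
const durations = [120, 80, 200, 95, 150];
console.log(percentile(durations, 50)); // 120
console.log(percentile(durations, 95)); // 200
```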

Troubleshooting

Language Packs Not Found

If you see errors about missing language packs:

  1. Ensure langs/ directory exists in project root
  2. Verify language pack files are present:
    • eng.traineddata
    • yor.traineddata
    • ibo.traineddata
    • hau.traineddata
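
The check above could be automated at startup along these lines. This is a sketch under assumptions: `missingLanguagePacks` is a hypothetical helper, and the existence check is passed in as a function so the logic stays testable (in Node.js, `existsSync` from `node:fs` would be a natural argument).

```typescript
const REQUIRED_LANGS = ["eng", "yor", "ibo", "hau"];

// Return the expected .traineddata paths that the exists() check cannot find.
function missingLanguagePacks(
  langDir: string,
  exists: (path: string) => boolean,
  langs: string[] = REQUIRED_LANGS
): string[] {
  const dir = langDir.replace(/\/+$/, ""); // tolerate a trailing slash
  return langs
    .map((lang) => `${dir}/${lang}.traineddata`)
    .filter((path) => !exists(path));
}

// Example with a fake file system that only contains two packs.
const present = new Set(["langs/eng.traineddata", "langs/yor.traineddata"]);
const missing = missingLanguagePacks("langs/", (path) => present.has(path));
console.log(missing); // [ 'langs/ibo.traineddata', 'langs/hau.traineddata' ]
```

Logging the missing paths before creating any workers points directly at the file to download, rather than leaving a less specific worker initialization error.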

Low OCR Confidence

For best results:

  1. Specify the language: Use ?language=yor for Yoruba text instead of auto-detect
  2. Use single language when possible: eng is more accurate than eng+yor+ibo+hau
  3. Image quality matters:
    • Use high-resolution images (minimum 300 DPI)
    • Ensure clear, well-lit images
    • Avoid heavily compressed JPEGs (use PNG when possible)
    • Ensure text is not rotated or skewed
  4. For receipts: Text should be horizontal and clearly visible
  5. Diacritics: The API now supports Yoruba/Igbo/Hausa diacritics (ọ, ṣ, ụ, ị, ń, etc.) for better accuracy

Worker Pool Issues

  • Increase WORKER_POOL_SIZE for higher concurrency
  • Monitor worker stats via /health endpoint
  • Check logs for worker initialization errors

License

MIT

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

Support

For issues and questions, please open an issue on GitHub.
