This project implements an Optical Character Recognition (OCR) system specifically designed to extract information from Vietnamese ID cards (both old and new formats). It utilizes a combination of deep learning models for robust detection and recognition, presented through an interactive Streamlit web application.
The system performs the following key steps:
- ID Card Detection & Alignment: Detects the ID card in an input image using a YOLO model trained to find corners.
- Perspective Correction: Warps the detected ID card region to obtain a top-down, rectangular view.
- Orientation Correction: Uses QR code detection (if present) to ensure the card is correctly oriented.
- Text Detection: Employs both YOLO and PaddleOCR's detection model (DB) to locate text regions within the aligned ID card image. Results are fused using Weighted Boxes Fusion (WBF) for improved accuracy.
- Text Recognition: Uses the VietOCR library (VGG-Transformer) to recognize the Vietnamese text within the detected regions.
- Information Extraction: Parses the recognized text using regular expressions and heuristics to extract key fields like ID number, name, date of birth, gender, nationality, place of origin, and place of residence.
- QR Code Decoding: Detects and decodes the QR code present on newer ID cards.
- User Interface: Provides a simple Streamlit interface for uploading ID card images and viewing the extracted results.
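The information-extraction step above can be sketched with simple regular expressions. The patterns and field names below are illustrative only, not the exact heuristics used in `main.py`:

```python
import re

def extract_fields(lines):
    """Illustrative regex-based extraction of common ID card fields.

    `lines` is a list of recognized text strings; the patterns below are
    simplified examples, not the project's actual heuristics.
    """
    fields = {}
    for line in lines:
        # New-format citizen ID numbers have 12 digits.
        m = re.search(r"\b(\d{12})\b", line)
        if m and "id_number" not in fields:
            fields["id_number"] = m.group(1)
        # Dates appear as dd/mm/yyyy.
        m = re.search(r"\b(\d{2}/\d{2}/\d{4})\b", line)
        if m and "date_of_birth" not in fields:
            fields["date_of_birth"] = m.group(1)
    return fields

lines = ["So / No.: 001203456789", "Ngay sinh / Date of birth: 01/02/2003"]
print(extract_fields(lines))
```

In practice, positional cues (which line a label appears on) and dictionary validation supplement patterns like these.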
- Supports both old and new Vietnamese ID card formats.
- Automatic perspective and orientation correction.
- Robust text detection using YOLO and DB model fusion.
- High-accuracy Vietnamese text recognition with VietOCR.
- Structured extraction of key ID card fields.
- QR code detection and decoding.
- Interactive web interface powered by Streamlit.
- Utilizes GPU acceleration if available (PyTorch/PaddlePaddle).
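The YOLO + DB fusion feature relies on the `ensemble-boxes` package. The snippet below is a heavily simplified, pure-Python illustration of the weighted-fusion idea (overlapping boxes are merged into confidence-weighted averages), not the full WBF algorithm:

```python
def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def fuse_boxes(boxes, scores, iou_thr=0.55):
    """Cluster overlapping boxes, then average each cluster weighted by score."""
    clusters = []  # each entry: [list_of_boxes, list_of_scores]
    for box, score in sorted(zip(boxes, scores), key=lambda p: -p[1]):
        for cl in clusters:
            if iou(cl[0][0], box) > iou_thr:
                cl[0].append(box)
                cl[1].append(score)
                break
        else:
            clusters.append([[box], [score]])
    fused = []
    for bs, ss in clusters:
        total = sum(ss)
        fused.append([sum(b[i] * s for b, s in zip(bs, ss)) / total
                      for i in range(4)])
    return fused

boxes = [[10, 10, 50, 30], [12, 11, 52, 31], [100, 100, 150, 120]]
scores = [0.9, 0.6, 0.8]
print(fuse_boxes(boxes, scores))  # two overlapping boxes merge into one
```

The real `ensemble_boxes.weighted_boxes_fusion` additionally handles per-model weights, labels, and normalized coordinates.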
- Programming Language: Python 3.9+
- Core Libraries:
  - OpenCV (`opencv-python`): Image processing, perspective transform, drawing.
  - PyTorch: Backend for YOLO and VietOCR models.
  - Ultralytics YOLO: ID card corner detection and text detection.
  - PaddlePaddle (`paddlepaddle`): Backend for PaddleOCR.
  - PaddleOCR (`paddleocr`): Text detection (DB model).
  - VietOCR (`vietocr`): Vietnamese text recognition.
  - Streamlit: Web application framework.
  - NumPy: Numerical operations.
  - QReader (`qreader`): QR code detection and decoding.
  - Ensemble-Boxes (`ensemble-boxes`): Weighted Boxes Fusion for detection results.
  - Transformers (`transformers`): Optional, used for the text correction model.
  - Levenshtein (`python-Levenshtein`): String similarity for text correction.
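Dictionary-based text correction (the role of `python-Levenshtein` above) can be sketched with the stdlib `difflib` as a stand-in for Levenshtein distance; the word list here is illustrative (the project loads `dictionary/dictionaries/hongocduc/words.txt`):

```python
import difflib

# Illustrative dictionary; the real one is loaded from a words.txt file.
DICTIONARY = ["CONG", "HOA", "XA", "HOI", "CHU", "NGHIA", "VIET", "NAM"]

def correct_word(word, cutoff=0.7):
    """Snap an OCR token to the closest dictionary word, if close enough."""
    matches = difflib.get_close_matches(word.upper(), DICTIONARY, n=1, cutoff=cutoff)
    return matches[0] if matches else word

print(correct_word("VIE7"))  # a plausible OCR digit/letter confusion
print(correct_word("QWZX"))  # no close match, so returned unchanged
```

`difflib` uses a similarity ratio rather than true edit distance; `python-Levenshtein` provides the faster, exact-distance equivalent.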
- Models:
- Custom YOLOv11n: For ID card corner detection.
- Custom YOLO: For text field detection.
- PaddleOCR DB: For text field detection.
- VietOCR VGG-Transformer: For Vietnamese text recognition.
  - `bmd1905/vietnamese-correction-v2`: (Optional) For text correction.
```
VnId-Card/
├── main.py                  # Main Streamlit application script
├── README.md                # This file
├── requirements.txt         # Python dependencies
├── requirements_windows.txt # Python dependencies for Windows
├── corner_detection_model/  # YOLO model for corner detection
│   └── weight/
│       └── *.pt
├── infer_model/             # PaddleOCR detection model files
│   ├── *.pdiparams
│   ├── *.pdiparams.info
│   ├── *.pdmodel
│   └── *.yml
├── dictionary/              # Dictionary files for text correction/validation
│   └── dictionaries/
│       └── hongocduc/
│           └── words.txt
└── ... (other potential utility scripts/folders)
```
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd VnId-Card
  ```

- Create a virtual environment (recommended):

  ```bash
  python -m venv venv
  # Activate the environment
  # Windows: venv\Scripts\activate
  # Linux/macOS: source venv/bin/activate
  ```

- Install dependencies. Choose the requirements file for your operating system, and make sure the necessary build tools (such as a C++ compiler) are installed, since some libraries may require compilation. If using CUDA for GPU acceleration, first install PyTorch/PaddlePaddle builds that match your CUDA toolkit version.

  - For Windows:

    ```bash
    pip install -r requirements_windows.txt
    ```

  - For Linux/macOS:

    ```bash
    pip install -r requirements.txt
    ```

  (Note: you may need to install PyTorch/PaddlePaddle with CUDA support separately if not included, or if you need a specific version. Refer to their official websites.)
- Model Files: The required model files appear to be included in the repository (`corner_detection_model`, `infer_model`, `yolo_detect_text`). Ensure they are correctly placed.
- Activate your virtual environment (if you created one).
- Run the Streamlit application:

  ```bash
  streamlit run main.py
  ```

- Open your web browser and navigate to the local URL provided by Streamlit (usually `http://localhost:8501`).
- Upload an image of a Vietnamese ID card using the file uploader.
- Click the "Process ID Card" button.
- View the original image, processed image, detected text regions, and the extracted information.
- Optionally, download the extracted information as a CSV file.
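The CSV download in the last step can be sketched with the stdlib `csv` module; the helper name and field names are illustrative, not the function used in `main.py`:

```python
import csv
import io

def fields_to_csv(fields):
    """Serialize an extracted-fields dict to a CSV string (header + one row)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields))
    writer.writeheader()
    writer.writerow(fields)
    return buf.getvalue()

csv_text = fields_to_csv({"id_number": "001203456789", "name": "NGUYEN VAN A"})
print(csv_text)
```

In Streamlit, a string like this can be passed directly to `st.download_button`.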
- Refactor the information extraction logic (`extract_field_info` in `main.py`) to be less reliant on complex regex and positional assumptions. Consider using Named Entity Recognition (NER) models fine-tuned for Vietnamese ID cards for more robust extraction.
- Improve error handling for edge cases (e.g., very blurry images, unusual lighting).
- Containerize the application using Docker for easier deployment.
- Add more comprehensive unit and integration tests.
- Batch Processing: Process multiple ID card images in a single request
- History Management: View and search processing history with advanced filtering
- Duplicate Detection: Automatic detection of previously processed ID cards
- Auto Port Selection: Automatic port selection when default ports are in use
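Duplicate detection can be approximated by keying on a content hash of the uploaded image (or on the extracted ID number). This sketch, with hypothetical class and method names, uses the SHA-256 of the raw bytes:

```python
import hashlib

class DuplicateDetector:
    """Track previously processed images by content hash (illustrative)."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, image_bytes):
        digest = hashlib.sha256(image_bytes).hexdigest()
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False

det = DuplicateDetector()
print(det.is_duplicate(b"fake-image-bytes"))  # False: first time seen
print(det.is_duplicate(b"fake-image-bytes"))  # True: exact resubmission
```

Byte-level hashing only catches exact resubmissions; matching on the extracted ID number catches re-photographed cards as well.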
POST /process-id-card/
Process a single Vietnamese ID card image.
POST /process-batch/
Process multiple ID card images in one request.
- Supports up to 10 images per batch
- Automatic duplicate detection
- Configurable processing parameters
GET /history
Get all processing history with pagination and filtering options:
- Filter by date range
- Filter by ID number
- Filter by success status
- Pagination support (1-100 items per page)
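The pagination and filtering semantics above might look like the following pure-Python helper (the record shape and parameter names are hypothetical):

```python
def paginate(records, page=1, page_size=10, success_only=False):
    """Filter and slice history records; page numbering is 1-based."""
    page_size = max(1, min(page_size, 100))  # API allows 1-100 items per page
    if success_only:
        records = [r for r in records if r.get("success")]
    start = (page - 1) * page_size
    return records[start:start + page_size]

history = [{"id_number": str(i), "success": i % 2 == 0} for i in range(25)]
page = paginate(history, page=2, page_size=5, success_only=True)
print([r["id_number"] for r in page])
```

In the real service the filtering and slicing would happen in the MongoDB query rather than in memory.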
GET /search/{id_number}
Search for specific ID card processing results by ID number.
- Python 3.8+
- MongoDB
- CUDA-capable GPU (recommended)
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/Vietnamese_ID_Card_OCR.git
  cd Vietnamese_ID_Card_OCR
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables:

  ```bash
  export GEMINI_API_KEY="your_api_key_here"
  ```

- Start the API server:

  ```bash
  python -m src.api.fastapi_app
  ```
- Access the API documentation:
  - Swagger UI: `http://localhost:8080/docs`
  - ReDoc: `http://localhost:8080/redoc`
```python
import requests

url = "http://localhost:8080/process-id-card/"
# Use a context manager so the file handle is closed after the upload.
with open("id_card.jpg", "rb") as f:
    response = requests.post(url, files={"file": f})
print(response.json())
```
```python
import requests

url = "http://localhost:8080/process-batch/"
# Repeat the "files" field name to send multiple files in one request.
with open("id_card1.jpg", "rb") as f1, open("id_card2.jpg", "rb") as f2:
    files = [("files", f1), ("files", f2)]
    response = requests.post(url, files=files)
print(response.json())
```
```python
import requests

url = "http://localhost:8080/history"
# Query parameters must be flat: a nested dict does not survive URL encoding.
params = {
    "page": 1,
    "page_size": 10,
    "start_date": "2024-03-01T00:00:00Z",
    "end_date": "2024-03-20T23:59:59Z",
    "success_only": True,
}
response = requests.get(url, params=params)
print(response.json())
```
The API supports various configuration options:
- Confidence threshold
- NMS threshold
- Image enhancement
- Processing method selection
- Maximum batch size
- Parallel processing option
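The configuration options above could be grouped into a single settings object. The names and defaults here are illustrative, not the API's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ProcessingConfig:
    # Illustrative names and defaults; the real API's parameters may differ.
    confidence_threshold: float = 0.5
    nms_threshold: float = 0.45
    enhance_image: bool = False
    method: str = "fused"      # e.g. "yolo", "db", or "fused"
    max_batch_size: int = 10   # mirrors the 10-image batch limit
    parallel: bool = False

cfg = ProcessingConfig(confidence_threshold=0.6)
print(cfg)
```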
The API includes built-in monitoring features:
- Prometheus metrics endpoint
- Processing time tracking
- Success/error rate monitoring
- Request count tracking
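The monitoring features can be sketched as a minimal in-process tracker; the project exposes Prometheus metrics, while this illustrative class just keeps counters and timings:

```python
class Metrics:
    """Minimal request metrics: counts, error rate, processing time."""

    def __init__(self):
        self.requests = 0
        self.errors = 0
        self.total_seconds = 0.0

    def observe(self, seconds, ok=True):
        """Record one request's duration and outcome."""
        self.requests += 1
        self.total_seconds += seconds
        if not ok:
            self.errors += 1

    @property
    def success_rate(self):
        return 1.0 - self.errors / self.requests if self.requests else 1.0

m = Metrics()
m.observe(0.25, ok=True)
m.observe(0.75, ok=False)
print(m.success_rate, m.total_seconds)
```

A Prometheus integration would replace these attributes with `Counter` and `Histogram` objects scraped from a `/metrics` endpoint.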
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- YOLO for object detection
- VietOCR for Vietnamese text recognition
- Google Gemini for AI-powered information extraction