This project implements an Optical Character Recognition (OCR) system specifically designed to extract information from Vietnamese ID cards (both old and new formats). It utilizes a combination of deep learning models for robust detection and recognition, presented through an interactive Streamlit web application.
The system performs the following key steps:
- ID Card Detection & Alignment: Detects the ID card in an input image using a YOLO model trained to find corners.
- Perspective Correction: Warps the detected ID card region to obtain a top-down, rectangular view.
- Orientation Correction: Uses QR code detection (if present) to ensure the card is correctly oriented.
- Text Detection: Employs both YOLO and PaddleOCR's detection model (DB) to locate text regions within the aligned ID card image. Results are fused using Weighted Boxes Fusion (WBF) for improved accuracy.
- Text Recognition: Uses the VietOCR library (VGG-Transformer) to recognize the Vietnamese text within the detected regions.
- Information Extraction: Parses the recognized text using regular expressions and heuristics to extract key fields like ID number, name, date of birth, gender, nationality, place of origin, and place of residence.
- QR Code Decoding: Detects and decodes the QR code present on newer ID cards.
- User Interface: Provides a simple Streamlit interface for uploading ID card images and viewing the extracted results.
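The information-extraction step above can be sketched with simple regular expressions. The patterns and field names below are illustrative only, not the exact heuristics used in `main.py`:

```python
import re

def extract_fields(lines):
    """Illustrative regex-based extraction of common ID card fields.

    `lines` is a list of recognized text strings; the patterns below are
    simplified examples, not the project's actual heuristics.
    """
    fields = {}
    for line in lines:
        # New-format citizen ID numbers have 12 digits.
        m = re.search(r"\b(\d{12})\b", line)
        if m and "id_number" not in fields:
            fields["id_number"] = m.group(1)
        # Dates appear as dd/mm/yyyy.
        m = re.search(r"\b(\d{2}/\d{2}/\d{4})\b", line)
        if m and "date_of_birth" not in fields:
            fields["date_of_birth"] = m.group(1)
    return fields

lines = ["So / No.: 001203456789", "Ngay sinh / Date of birth: 01/02/2003"]
print(extract_fields(lines))
```

In practice, positional cues (which line a label appears on) and dictionary validation supplement patterns like these.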
- Supports both old and new Vietnamese ID card formats.
- Automatic perspective and orientation correction.
- Robust text detection using YOLO and DB model fusion.
- High-accuracy Vietnamese text recognition with VietOCR.
- Structured extraction of key ID card fields.
- QR code detection and decoding.
- Interactive web interface powered by Streamlit.
- Utilizes GPU acceleration if available (PyTorch/PaddlePaddle).
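The YOLO + DB fusion feature relies on the `ensemble-boxes` package. The snippet below is a heavily simplified, pure-Python illustration of the weighted-fusion idea (overlapping boxes are merged into confidence-weighted averages), not the full WBF algorithm:

```python
def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def fuse_boxes(boxes, scores, iou_thr=0.55):
    """Cluster overlapping boxes, then average each cluster weighted by score."""
    clusters = []  # each entry: [list_of_boxes, list_of_scores]
    for box, score in sorted(zip(boxes, scores), key=lambda p: -p[1]):
        for cl in clusters:
            if iou(cl[0][0], box) > iou_thr:
                cl[0].append(box)
                cl[1].append(score)
                break
        else:
            clusters.append([[box], [score]])
    fused = []
    for bs, ss in clusters:
        total = sum(ss)
        fused.append([sum(b[i] * s for b, s in zip(bs, ss)) / total
                      for i in range(4)])
    return fused

boxes = [[10, 10, 50, 30], [12, 11, 52, 31], [100, 100, 150, 120]]
scores = [0.9, 0.6, 0.8]
print(fuse_boxes(boxes, scores))  # two overlapping boxes merge into one
```

The real `ensemble_boxes.weighted_boxes_fusion` additionally handles per-model weights, labels, and normalized coordinates.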
- Programming Language: Python 3.9+
- Core Libraries:
  - OpenCV (`opencv-python`): Image processing, perspective transform, drawing.
  - PyTorch: Backend for YOLO and VietOCR models.
  - Ultralytics YOLO: ID card corner detection and text detection.
  - PaddlePaddle (`paddlepaddle`): Backend for PaddleOCR.
  - PaddleOCR (`paddleocr`): Text detection (DB model).
  - VietOCR (`vietocr`): Vietnamese text recognition.
  - Streamlit: Web application framework.
  - NumPy: Numerical operations.
  - QReader (`qreader`): QR code detection and decoding.
  - Ensemble-Boxes (`ensemble-boxes`): Weighted Boxes Fusion for detection results.
  - Transformers (`transformers`): Optional, used for the text correction model.
  - Levenshtein (`python-Levenshtein`): String similarity for text correction.
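Dictionary-based text correction (the role of `python-Levenshtein` above) can be sketched with the stdlib `difflib` as a stand-in for Levenshtein distance; the word list here is illustrative (the project loads `dictionary/dictionaries/hongocduc/words.txt`):

```python
import difflib

# Illustrative dictionary; the real one is loaded from a words.txt file.
DICTIONARY = ["CONG", "HOA", "XA", "HOI", "CHU", "NGHIA", "VIET", "NAM"]

def correct_word(word, cutoff=0.7):
    """Snap an OCR token to the closest dictionary word, if close enough."""
    matches = difflib.get_close_matches(word.upper(), DICTIONARY, n=1, cutoff=cutoff)
    return matches[0] if matches else word

print(correct_word("VIE7"))  # a plausible OCR digit/letter confusion
print(correct_word("QWZX"))  # no close match, so returned unchanged
```

`difflib` uses a similarity ratio rather than true edit distance; `python-Levenshtein` provides the faster, exact-distance equivalent.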
- Models:
- Custom YOLOv11n: For ID card corner detection.
- Custom YOLO: For text field detection.
- PaddleOCR DB: For text field detection.
- VietOCR VGG-Transformer: For Vietnamese text recognition.
  - `bmd1905/vietnamese-correction-v2`: (Optional) For text correction.
```
VnId-Card/
├── main.py                  # Main Streamlit application script
├── README.md                # This file
├── requirements.txt         # Python dependencies
├── requirements_windows.txt # Python dependencies for Windows
├── corner_detection_model/  # YOLO model for corner detection
│   └── weight/
│       └── *.pt
├── infer_model/             # PaddleOCR detection model files
│   ├── *.pdiparams
│   ├── *.pdiparams.info
│   ├── *.pdmodel
│   └── *.yml
├── dictionary/              # Dictionary files for text correction/validation
│   └── dictionaries/
│       └── hongocduc/
│           └── words.txt
└── ... (other potential utility scripts/folders)
```
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd VnId-Card
  ```

- Create a virtual environment (recommended):

  ```bash
  python -m venv venv
  # Activate the environment
  # Windows: venv\Scripts\activate
  # Linux/macOS: source venv/bin/activate
  ```

- Install dependencies. Choose the requirements file for your operating system, and make sure the necessary build tools (such as a C++ compiler) are installed, since some libraries may require compilation. If using CUDA for GPU acceleration, first install PyTorch/PaddlePaddle builds that match your CUDA toolkit version.

  - For Windows:

    ```bash
    pip install -r requirements_windows.txt
    ```

  - For Linux/macOS:

    ```bash
    pip install -r requirements.txt
    ```

  (Note: you may need to install PyTorch/PaddlePaddle with CUDA support separately if not included, or if you need a specific version. Refer to their official websites.)
- Model Files: The required model files appear to be included in the repository (`corner_detection_model`, `infer_model`, `yolo_detect_text`). Ensure they are correctly placed.
- Activate your virtual environment (if you created one).
- Run the Streamlit application:

  ```bash
  streamlit run main.py
  ```

- Open your web browser and navigate to the local URL provided by Streamlit (usually `http://localhost:8501`).
- Upload an image of a Vietnamese ID card using the file uploader.
- Click the "Process ID Card" button.
- View the original image, processed image, detected text regions, and the extracted information.
- Optionally, download the extracted information as a CSV file.
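The CSV download in the last step can be sketched with the stdlib `csv` module; the helper name and field names are illustrative, not the function used in `main.py`:

```python
import csv
import io

def fields_to_csv(fields):
    """Serialize an extracted-fields dict to a CSV string (header + one row)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields))
    writer.writeheader()
    writer.writerow(fields)
    return buf.getvalue()

csv_text = fields_to_csv({"id_number": "001203456789", "name": "NGUYEN VAN A"})
print(csv_text)
```

In Streamlit, a string like this can be passed directly to `st.download_button`.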
- Refactor the information extraction logic (`extract_field_info` in `main.py`) to be less reliant on complex regex and positional assumptions. Consider using Named Entity Recognition (NER) models fine-tuned for Vietnamese ID cards for more robust extraction.
- Improve error handling for edge cases (e.g., very blurry images, unusual lighting).
- Containerize the application using Docker for easier deployment.
- Add more comprehensive unit and integration tests.
- Batch Processing: Process multiple ID card images in a single request
- History Management: View and search processing history with advanced filtering
- Duplicate Detection: Automatic detection of previously processed ID cards
- Auto Port Selection: Automatic port selection when default ports are in use
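Duplicate detection can be approximated by keying on a content hash of the uploaded image (or on the extracted ID number). This sketch, with hypothetical class and method names, uses the SHA-256 of the raw bytes:

```python
import hashlib

class DuplicateDetector:
    """Track previously processed images by content hash (illustrative)."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, image_bytes):
        digest = hashlib.sha256(image_bytes).hexdigest()
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False

det = DuplicateDetector()
print(det.is_duplicate(b"fake-image-bytes"))  # False: first time seen
print(det.is_duplicate(b"fake-image-bytes"))  # True: exact resubmission
```

Byte-level hashing only catches exact resubmissions; matching on the extracted ID number catches re-photographed cards as well.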
POST /process-id-card/
Process a single Vietnamese ID card image.
POST /process-batch/
Process multiple ID card images in one request.
- Supports up to 10 images per batch
- Automatic duplicate detection
- Configurable processing parameters
GET /history
Get all processing history with pagination and filtering options:
- Filter by date range
- Filter by ID number
- Filter by success status
- Pagination support (1-100 items per page)
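The pagination and filtering semantics above might look like the following pure-Python helper (the record shape and parameter names are hypothetical):

```python
def paginate(records, page=1, page_size=10, success_only=False):
    """Filter and slice history records; page numbering is 1-based."""
    page_size = max(1, min(page_size, 100))  # API allows 1-100 items per page
    if success_only:
        records = [r for r in records if r.get("success")]
    start = (page - 1) * page_size
    return records[start:start + page_size]

history = [{"id_number": str(i), "success": i % 2 == 0} for i in range(25)]
page = paginate(history, page=2, page_size=5, success_only=True)
print([r["id_number"] for r in page])
```

In the real service the filtering and slicing would happen in the MongoDB query rather than in memory.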
GET /search/{id_number}
Search for specific ID card processing results by ID number.
- Python 3.8+
- MongoDB
- CUDA-capable GPU (recommended)
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/Vietnamese_ID_Card_OCR.git
  cd Vietnamese_ID_Card_OCR
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables:

  ```bash
  export GEMINI_API_KEY="your_api_key_here"
  ```

- Start the API server:

  ```bash
  python -m src.api.fastapi_app
  ```
- Access the API documentation:
  - Swagger UI: `http://localhost:8080/docs`
  - ReDoc: `http://localhost:8080/redoc`
```python
import requests

url = "http://localhost:8080/process-id-card/"
# Use a context manager so the file handle is closed after the upload.
with open("id_card.jpg", "rb") as f:
    response = requests.post(url, files={"file": f})
print(response.json())
```
```python
import requests

url = "http://localhost:8080/process-batch/"
# Repeat the "files" field name to send multiple files in one request.
with open("id_card1.jpg", "rb") as f1, open("id_card2.jpg", "rb") as f2:
    files = [("files", f1), ("files", f2)]
    response = requests.post(url, files=files)
print(response.json())
```
```python
import requests

url = "http://localhost:8080/history"
# Query parameters must be flat: a nested dict does not survive URL encoding.
params = {
    "page": 1,
    "page_size": 10,
    "start_date": "2024-03-01T00:00:00Z",
    "end_date": "2024-03-20T23:59:59Z",
    "success_only": True,
}
response = requests.get(url, params=params)
print(response.json())
```
The API supports various configuration options:
- Confidence threshold
- NMS threshold
- Image enhancement
- Processing method selection
- Maximum batch size
- Parallel processing option
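The configuration options above could be grouped into a single settings object. The names and defaults here are illustrative, not the API's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ProcessingConfig:
    # Illustrative names and defaults; the real API's parameters may differ.
    confidence_threshold: float = 0.5
    nms_threshold: float = 0.45
    enhance_image: bool = False
    method: str = "fused"      # e.g. "yolo", "db", or "fused"
    max_batch_size: int = 10   # mirrors the 10-image batch limit
    parallel: bool = False

cfg = ProcessingConfig(confidence_threshold=0.6)
print(cfg)
```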
The API includes built-in monitoring features:
- Prometheus metrics endpoint
- Processing time tracking
- Success/error rate monitoring
- Request count tracking
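The monitoring features can be sketched as a minimal in-process tracker; the project exposes Prometheus metrics, while this illustrative class just keeps counters and timings:

```python
class Metrics:
    """Minimal request metrics: counts, error rate, processing time."""

    def __init__(self):
        self.requests = 0
        self.errors = 0
        self.total_seconds = 0.0

    def observe(self, seconds, ok=True):
        """Record one request's duration and outcome."""
        self.requests += 1
        self.total_seconds += seconds
        if not ok:
            self.errors += 1

    @property
    def success_rate(self):
        return 1.0 - self.errors / self.requests if self.requests else 1.0

m = Metrics()
m.observe(0.25, ok=True)
m.observe(0.75, ok=False)
print(m.success_rate, m.total_seconds)
```

A Prometheus integration would replace these attributes with `Counter` and `Histogram` objects scraped from a `/metrics` endpoint.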
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- YOLO for object detection
- VietOCR for Vietnamese text recognition
- Google Gemini for AI-powered information extraction