This repository contains our exploration and development journey in building an OCR and face verification pipeline for ID card processing. The goal was to extract text from ID cards and verify whether the holder's face matches the face on the ID.
While the final implementation shifted from our initial plan, this README documents the process, decisions, and lessons learned throughout development.
Our initial idea was to:
- Perform OCR on ID cards using a Python-based script.
- Extract relevant information such as name, ID number, and birth date.
- Perform face verification to ensure the cardholder matches the face on the ID.
During development, we experimented with various tools, frameworks, and models before ultimately deciding not to use Tesseract OCR or DeepFace in the final version. This README captures that journey.
- We initially explored Google Tesseract, a widely used OCR engine.
- We used the `pytesseract` wrapper for integration in Python (see the sketch after this list).
- Outcome: the text extraction quality for ID cards was below our expectations.
- After consulting with peers experienced in Machine Learning, we realized:
- Tesseract struggles with low-resolution ID cards.
- OCR performance is sensitive to lighting, fonts, and ID card layouts.
- For production-quality OCR, a custom-trained OCR model would be more reliable.
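The sketch below shows the kind of `pytesseract` integration we experimented with. It is a minimal illustration, not our final pipeline: the file name, the preprocessing steps, and the `--psm 6` page segmentation mode are assumptions, and it presumes the Tesseract binary is installed and on your PATH.

```python
# Minimal OCR sketch using pytesseract (illustrative file name and parameters).
import cv2
import pytesseract

def extract_id_text(image_path: str) -> str:
    """Run Tesseract OCR on an ID card image after basic preprocessing."""
    image = cv2.imread(image_path)
    # Grayscale + Otsu thresholding tends to help Tesseract on card scans
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # --psm 6 assumes a single uniform block of text; other modes may fit better
    return pytesseract.image_to_string(binary, config="--psm 6")

if __name__ == "__main__":
    print(extract_id_text("id_card_sample.jpg"))  # hypothetical sample image
```

Even with this kind of preprocessing, results on real ID cards fell short, which is what motivated the conclusions above.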
Although time constraints prevented us from building a custom OCR model, we plan to:
- Train a specialized OCR model tailored to ID card layouts.
- Explore modern deep learning OCR frameworks (e.g., CRNN, TrOCR, PaddleOCR).
We explored DeepFace, an easy-to-use Python library for face verification.
- Can be installed directly with `pip install deepface`.
- Supports multiple models: VGG-Face, FaceNet, ArcFace, etc.
- Automatically downloads model weights when switching models.
Through testing, we found:
- FaceNet performed significantly better on ID card photos (see the sketch after this list).
- VGG-Face (DeepFace's default model) was inconsistent for ID card images.
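The comparison we ran looked roughly like the sketch below. It is a minimal example, assuming two local image files; the paths are placeholders, and how you act on the returned distance/threshold is up to the caller.

```python
# Minimal face verification sketch with DeepFace (illustrative paths).
from deepface import DeepFace

def verify_holder(id_photo_path: str, selfie_path: str, model_name: str = "Facenet") -> dict:
    """Compare the face on the ID card against a live photo of the holder."""
    # DeepFace.verify returns a dict including "verified" and "distance"
    return DeepFace.verify(
        img1_path=id_photo_path,
        img2_path=selfie_path,
        model_name=model_name,  # "Facenet" worked better for us than the default "VGG-Face"
    )

if __name__ == "__main__":
    result = verify_holder("id_card_face.jpg", "selfie.jpg")  # hypothetical inputs
    print(result["verified"], result["distance"])
```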
Despite promising results, we ultimately chose not to use DeepFace due to:
- Time limitations for fine-tuning verification thresholds
- Model inconsistency with certain ID formats
- Need for a more tailored and robust face matching solution
In the final version of the project, we did not use:
- Tesseract OCR
- DeepFace
Instead, the implemented solution focuses on the aspects that aligned best with our time constraints and performance requirements.
This documentation remains as a reference for our experimentation journey and as a guide for future development.
If you plan to explore the original tools we tested:
- Install Google Tesseract (OS-specific installers)
- Add Tesseract executable to your system PATH
- Install the Python wrappers: `pip install pytesseract` and `pip install deepface` (a quick environment check is sketched after this list).
- DeepFace model weights will be downloaded automatically when a model is selected.
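The snippet below is a hedged sanity check for that setup: it only confirms that `pytesseract` can reach the Tesseract binary and that DeepFace imports; the Windows path in the comment is just an example.

```python
# Quick environment check for the original tooling (Tesseract + DeepFace).
import pytesseract

try:
    print("Tesseract version:", pytesseract.get_tesseract_version())
except pytesseract.TesseractNotFoundError:
    # If Tesseract is installed but not on PATH, point pytesseract at the binary, e.g.:
    # pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
    print("Tesseract binary not found on PATH")

try:
    from deepface import DeepFace  # weights download lazily on first use
    print("DeepFace import OK")
except ImportError:
    print("DeepFace is not installed (pip install deepface)")
```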
- Build a dedicated OCR model specialized for Indonesian ID cards
- Implement a lightweight face verification model optimized for real-time inference
- Improve the preprocessing pipeline for image enhancement (see the sketch after this list)
- Add automated confidence scoring for ID verification
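As a rough illustration of the planned preprocessing improvements, the sketch below upscales, denoises, and contrast-equalizes a low-resolution card image before OCR. It assumes OpenCV is available, and the specific parameters are untuned placeholders.

```python
# Preprocessing sketch for low-resolution ID card captures (untuned parameters).
import cv2

def enhance_id_image(image_path: str):
    """Upscale, denoise, and boost contrast on a low-resolution ID card scan."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Upscale low-resolution captures before OCR
    upscaled = cv2.resize(gray, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    # Remove sensor noise while preserving character edges
    denoised = cv2.fastNlMeansDenoising(upscaled, None, h=10)
    # Equalize uneven lighting across the card
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(denoised)
```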
This project taught us important lessons about evaluating off-the-shelf OCR and face verification tools against real ID card data before committing to them. While we pivoted from our initial plan, the lessons learned will directly inform future iterations of this product.
References:
- https://towardsdatascience.com/googles-tesseract-ocr-how-good-is-it-on-documents-d71d4bf7640
- https://pypi.org/project/deepface/
- https://arxiv.org/pdf/2101.05214.pdf
- https://medium.com/analytics-vidhya/optical-character-recognition-using-tensorflow-533061285dd3