DocIntel – Document Classification & Key Field Extraction

DocIntel is a Python-based tool that automatically:

- Identifies the type of an uploaded document (Driving License, W2, Paystub, Passport, Flood Certificate, Others)
- Uses OCR to extract key fields specific to that document type
- Stores the extracted data in SQLite and shows it in a Streamlit UI
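
The identification step above can be sketched as a keyword lookup over the OCR text. This is a minimal illustration only: the `KEYWORD_RULES` table and the `classify_document` function are hypothetical names, and the keywords are assumptions, since this README does not show the project's actual rules.

```python
# Hypothetical keyword rules, assumed for illustration.
# The project's real rules are not shown in this README.
KEYWORD_RULES = {
    "W2": ["wage and tax statement", "form w-2"],
    "Paystub": ["net pay", "pay period"],
    "Passport": ["passport"],
    "Driving License": ["driver license", "driving license", "driver's license"],
    "Flood Certificate": ["flood certificate", "flood zone", "flood hazard"],
}


def classify_document(ocr_text: str) -> str:
    """Return the document type whose keywords appear most often in the text.

    Falls back to "Others" when no keyword matches at all.
    """
    text = ocr_text.lower()
    best_type, best_hits = "Others", 0
    for doc_type, keywords in KEYWORD_RULES.items():
        # Count how many of this type's keywords occur in the OCR text.
        hits = sum(1 for kw in keywords if kw in text)
        if hits > best_hits:
            best_type, best_hits = doc_type, hits
    return best_type
```

Counting keyword hits rather than returning on the first match lets a document that mentions several types (for example a paystub that references a W2) resolve to the type with the strongest evidence.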

Features

- Supports images & PDFs (JPG, PNG, PDF)
- Uses Tesseract OCR to read text from documents
- Rule-based classification of document types (W2, Passport, Driving License, Paystub, Flood Certificate, Others)
- Extracts required key fields:
  - Driving License – Name, DL number, DOB
  - Flood Certificate – Borrower name, Customer No, Expiration date
  - W2 – Employee Name, EIN, Year
  - Paystub – Employee Name, Employer Name, Net Pay
  - Passport – Name, Passport number, Country
- Stores results in a SQLite database and displays them in a table with a timestamp
- Simple Streamlit web interface – upload → process → view structured data
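
The extraction and storage features above could look roughly like the sketch below, shown for the W2 fields only. Everything named here is an assumption made for illustration: the `W2_PATTERNS` regexes, the `extract_fields` and `save_result` helpers, and the `documents` table layout are hypothetical and are not taken from `document_processor.py` or `database.py`.

```python
import json
import re
import sqlite3
from datetime import datetime, timezone

# Hypothetical regex patterns for the W2 key fields (Employee Name, EIN, Year).
W2_PATTERNS = {
    "employee_name": re.compile(
        r"Employee(?:'s)? name[:\s]+([A-Za-z][A-Za-z .'-]*)", re.IGNORECASE
    ),
    "ein": re.compile(r"\b(\d{2}-\d{7})\b"),
    "year": re.compile(r"\b(19\d{2}|20\d{2})\b"),
}


def extract_fields(ocr_text: str, patterns: dict) -> dict:
    """Apply each pattern to the OCR text; fields with no match are None."""
    fields = {}
    for name, pattern in patterns.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1).strip() if match else None
    return fields


def save_result(conn: sqlite3.Connection, doc_type: str, fields: dict) -> int:
    """Insert one processed document with a UTC timestamp; return its row id."""
    # Assumed schema: the extracted fields are stored as a JSON string.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "doc_type TEXT, fields_json TEXT, processed_at TEXT)"
    )
    processed_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    cursor = conn.execute(
        "INSERT INTO documents (doc_type, fields_json, processed_at) "
        "VALUES (?, ?, ?)",
        (doc_type, json.dumps(fields), processed_at),
    )
    conn.commit()
    return cursor.lastrowid
```

Storing the fields as a single JSON column is one way to keep a single table for all document types even though each type has different fields; a per-type table would be an equally reasonable design.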

Repository Files

- __pycache__
- uploads
- Idea.png
- Notes.txt
- app_frontend.py
- database.py
- document_processor.py
- documents.db
- readme.md
- requirements.txt