📄 DocuFlow

DocuFlow is an AI-based document processing and Excel automation system that converts user-uploaded documents into clean, structured Excel records through a deterministic and explainable backend pipeline.

The system is designed to be easy to explain to mentors and reviewers, academically defensible, and practically useful. It focuses on real-world document workflows rather than black-box automation.

✨ Key Features

📂 Single & multi-file document upload
🧭 User-selected document type (Receipts, Invoices, ID Cards)
🖼️ Image preprocessing using OpenCV
🔤 OCR with spatial text extraction (Tesseract)
📐 Layout-aware text grouping & analysis
🧠 Rule-based field extraction (deterministic)
✅ Validation & confidence scoring
📦 Batch processing with per-file error isolation
📊 Automatic Excel (.xlsx) generation
⬇️ Secure Excel download endpoint
🖥️ Clean frontend with progressive disclosure UI
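
To make the layout-aware grouping feature more concrete, here is a minimal sketch of one common approach: clustering OCR word boxes into text lines by vertical position. It assumes each OCR word arrives with a top-left position and a height. The `Word` type, `group_into_lines`, and the `tolerance` parameter are hypothetical names for illustration and are not taken from the DocuFlow source.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR token with its top-left position and box height (assumed shape)."""
    text: str
    x: int
    y: int
    height: int


def group_into_lines(words, tolerance=0.5):
    """Group words into text lines by comparing vertical positions.

    A word joins the current line when its y offset from the line's first
    word is within `tolerance` times that word's height.
    """
    lines = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if lines and abs(word.y - lines[-1][0].y) <= tolerance * lines[-1][0].height:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Within each line, read the words left to right.
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines]


if __name__ == "__main__":
    sample = [
        Word("Total", 10, 100, 20),
        Word("326", 200, 104, 20),
        Word("Swiggy", 10, 20, 20),
    ]
    print(group_into_lines(sample))
```

Because the grouping depends only on coordinates and a fixed tolerance, the same input always yields the same lines, which fits the deterministic design described in this README.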

🧠 System Philosophy

DocuFlow is intentionally designed to be:

Deterministic – no black-box ML decisions
Explainable – every processing step is inspectable
Modular – each pipeline stage is independent
Realistic – mirrors real-world document processing systems

This makes the system ideal for academic evaluation, project demos, and interview discussions.
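
As an illustration of what deterministic, rule-based extraction can look like, the sketch below pulls receipt fields out of already-grouped text lines using regular expressions. The patterns, the "first non-empty line is the merchant" heuristic, and the function name `extract_receipt_fields` are assumptions made for this example and may differ from the rules DocuFlow actually uses.

```python
import re

# Assumed patterns: a DD-MM-YYYY style date and a number following the word "total".
DATE_PATTERN = re.compile(r"\b(\d{2}[-/]\d{2}[-/]\d{4})\b")
AMOUNT_PATTERN = re.compile(r"\btotal\b[^\d]*(\d+(?:\.\d{1,2})?)", re.IGNORECASE)


def extract_receipt_fields(lines):
    """Apply fixed rules to text lines; unmatched fields stay None."""
    fields = {"date": None, "merchant": None, "amount": None}

    # Heuristic (assumption): the first non-empty line names the merchant.
    for line in lines:
        if line.strip():
            fields["merchant"] = line.strip()
            break

    # Take the first date and the first "total" amount found.
    for line in lines:
        if fields["date"] is None:
            match = DATE_PATTERN.search(line)
            if match:
                fields["date"] = match.group(1)
        if fields["amount"] is None:
            match = AMOUNT_PATTERN.search(line)
            if match:
                fields["amount"] = float(match.group(1))
    return fields


if __name__ == "__main__":
    print(extract_receipt_fields(["Swiggy", "Date: 05-01-2025", "Total  326"]))
```

Every extracted value can be traced back to a specific rule and a specific line, which is what makes this style of extraction inspectable.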

🏗️ High-Level Architecture

Upload → Preprocessing → OCR → Layout Analysis
     → Field Extraction → Validation → Excel Generation
     → API Response → Frontend Download
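
The flow above can be sketched as an ordered list of named stages, each taking the previous stage's output. This is only an illustrative skeleton: `run_pipeline` is a hypothetical helper, and the two string functions in the usage example are stand-ins for the real stages, which would wrap OpenCV, Tesseract, and the other modules.

```python
def run_pipeline(document, stages):
    """Run `document` through each (name, function) stage in order.

    Returns the final result plus a trace of every intermediate output,
    so each step can be inspected afterwards.
    """
    trace = []
    data = document
    for name, stage in stages:
        data = stage(data)
        trace.append((name, data))
    return data, trace


if __name__ == "__main__":
    # Stand-in stages (assumptions); real ones would do image and OCR work.
    demo_stages = [
        ("preprocessing", str.strip),
        ("ocr", str.upper),
    ]
    result, trace = run_pipeline("  total 326 ", demo_stages)
    for name, output in trace:
        print(f"{name}: {output!r}")
```

Keeping the stages as independent callables means any one of them can be replaced or tested on its own without touching the rest of the pipeline.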

📁 Project Structure

docuflow-backend/
├── app/
│   ├── main.py
│   ├── api/
│   │   ├── routes.py
│   │   ├── upload.py
│   │   └── download.py
│   └── core/
│       ├── storage.py
│       ├── preprocessing.py
│       ├── ocr.py
│       ├── layout.py
│       ├── extraction.py
│       ├── validation.py
│       └── excel.py
├── uploads/
├── processed/
├── exports/
├── index.html   # Frontend
├── requirements.txt
└── README.md

⚙️ Tech Stack

Backend

FastAPI – API framework
OpenCV – image preprocessing
Tesseract OCR – text extraction
Pillow – image handling
openpyxl – Excel generation

Frontend

HTML, CSS, Vanilla JavaScript
Single-page application (no framework)

🚀 Getting Started

1️⃣ Clone the Repository

git clone https://github.com/your-username/docuflow.git
cd docuflow

2️⃣ Create Virtual Environment

python3 -m venv venv
source venv/bin/activate

3️⃣ Install Dependencies

python3 -m pip install -r requirements.txt

4️⃣ Install Tesseract OCR

macOS

brew install tesseract

Verify:

tesseract --version

▶️ Run the Backend

python3 -m uvicorn app.main:app --reload

Access:

API Docs: http://127.0.0.1:8000/docs

🖥️ Run the Frontend

Open index.html directly in your browser
Ensure backend is running at http://127.0.0.1:8000

📤 Usage Flow

Select document type (Receipt / Invoice / ID Card)
Upload one or more documents
Click Process Documents
View processing summary and warnings
Download generated Excel file

Each document corresponds to one row in the Excel output.
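
As a rough illustration of the one-row-per-document mapping, the sketch below flattens extracted records into spreadsheet rows. The column order follows the output example in this README. The record keys, the `-` placeholder for "no warnings", and the names `to_row` and `build_sheet_rows` are hypothetical and not taken from the DocuFlow source.

```python
HEADERS = ["Date", "Merchant", "Amount", "Confidence", "Warnings"]
NO_WARNINGS = "-"  # Placeholder shown when a document has no warnings (assumption).


def to_row(record):
    """Convert one extracted record (a dict) into a list of cell values."""
    warnings = "; ".join(record.get("warnings", [])) or NO_WARNINGS
    return [
        record.get("date"),
        record.get("merchant"),
        record.get("amount"),
        record.get("confidence"),
        warnings,
    ]


def build_sheet_rows(records):
    """Return the header row followed by one row per document."""
    return [HEADERS] + [to_row(record) for record in records]


if __name__ == "__main__":
    rows = build_sheet_rows([
        {"date": "05-01-2025", "merchant": "Swiggy", "amount": 326,
         "confidence": 0.85, "warnings": []},
    ])
    for row in rows:
        print(row)
```

Each resulting list could then be written to a sheet with openpyxl's `Worksheet.append`, which accepts an iterable of cell values.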

📊 Output Example (Excel)

Date	Merchant	Amount	Confidence	Warnings
05-01-2025	Swiggy	326	0.85	—
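
The Confidence and Warnings columns could be produced by a validation step along the following lines. This is a sketch under stated assumptions: the score here is simply the fraction of checks that pass, the date is assumed to be in DD-MM-YYYY format, and `validate_record` is a hypothetical name. DocuFlow's actual scoring rules may differ.

```python
from datetime import datetime


def _is_valid_date(value):
    """Return True when `value` parses as DD-MM-YYYY (assumed format)."""
    if not isinstance(value, str):
        return False
    try:
        datetime.strptime(value, "%d-%m-%Y")
        return True
    except ValueError:
        return False


def validate_record(fields):
    """Run fixed checks and attach a confidence score and warnings."""
    amount = fields.get("amount")
    checks = [
        (bool(fields.get("merchant")), "merchant missing"),
        (_is_valid_date(fields.get("date")), "date missing or not DD-MM-YYYY"),
        (isinstance(amount, (int, float)) and amount > 0,
         "amount missing or not positive"),
    ]
    warnings = [message for ok, message in checks if not ok]
    passed = sum(1 for ok, _ in checks if ok)
    # Illustrative scoring rule: share of checks that passed.
    confidence = round(passed / len(checks), 2)
    return {**fields, "confidence": confidence, "warnings": warnings}


if __name__ == "__main__":
    print(validate_record({"date": "05-01-2025", "merchant": "Swiggy", "amount": 326}))
    print(validate_record({"date": "2025/01/05", "merchant": "Swiggy", "amount": None}))
```

A record that fails some checks is still returned with its warnings rather than rejected, which matches the idea of allowing partial extraction.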

🧪 Batch Processing Behavior

Each file is processed independently
Failure in one document does not stop others
Partial extraction is allowed with warnings
Excel is generated from successfully processed files only
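
The behavior listed above can be sketched as a loop that catches errors per file, so one failure is recorded without interrupting the rest of the batch. The names `process_batch` and `process_one`, and the shape of the returned dictionary, are assumptions for illustration only.

```python
def process_batch(paths, process_one):
    """Process each file independently, collecting results and errors.

    `process_one` is any callable that takes a path and returns a dict of
    extracted fields, raising an exception when a document cannot be handled.
    """
    records, errors = [], []
    for path in paths:
        try:
            records.append({"file": path, **process_one(path)})
        except Exception as exc:  # Isolate the failure to this one file.
            errors.append({"file": path, "error": str(exc)})
    return {"records": records, "errors": errors}


if __name__ == "__main__":
    def fake_process(path):
        # Stand-in for the real pipeline; fails on one file to show isolation.
        if path == "bad.png":
            raise ValueError("unreadable image")
        return {"merchant": "Swiggy"}

    summary = process_batch(["a.png", "bad.png", "b.png"], fake_process)
    print(summary["records"])
    print(summary["errors"])
```

Only the entries under `records` would feed the Excel export, while `errors` can be surfaced in the processing summary.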

📌 One-Line Summary

DocuFlow processes user-selected document types through a structured pipeline of image preprocessing, OCR, layout analysis, rule-based field extraction, validation, and automatic Excel generation.

👤 Author

Developed as an academic and practical project for demonstrating document intelligence pipelines.

If you’re reviewing this project:

Check /docs for API clarity
Upload sample documents
Inspect generated Excel outputs

Thank you for exploring DocuFlow 🚀

shlokareddy1102/docuflow

Folders and files

Latest commit

History

Repository files navigation

📄 DocuFlow

✨ Key Features

🧠 System Philosophy

🏗️ High-Level Architecture

📁 Project Structure

⚙️ Tech Stack

Backend

Frontend

🚀 Getting Started

1️⃣ Clone the Repository

2️⃣ Create Virtual Environment

3️⃣ Install Dependencies

4️⃣ Install Tesseract OCR

▶️ Run the Backend

🖥️ Run the Frontend

📤 Usage Flow

📊 Output Example (Excel)

🧪 Batch Processing Behavior

📌 One-Line Summary

👤 Author

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages