A powerful, privacy-first Streamlit application for extracting structured data from Indian GST invoices using Ollama's local vision models. No API keys required, runs completely offline, and outputs GST-compliant JSON format.
- Indian GST-compliant JSON output format
- Proper CGST, SGST, IGST rate and amount extraction
- HSN codes, PAN numbers, GST numbers extraction
- Tax summary with automated calculations
- Multiple items per invoice support
- 100% Local Processing - No data leaves your machine
- No API keys required - Uses local Ollama models
- Offline capable - Works without internet connection
- Cost-free - No per-request charges
- Images: PNG, JPG, JPEG
- Documents: PDF (first page extraction)
- Multi-file processing: Upload multiple invoices at once
- Real-time processing with progress indicators
- Model selection - Choose from multiple vision models
- Data validation - Ensures GST compliance
- Export options: JSON, CSV summary, clipboard copy
- Error handling - Graceful failure recovery
-
Ollama: Local LLM inference with vision models
-
Streamlit: Web interface
-
Vision Models: LLaVA, BakLLaVA for image understanding
-
Python Libraries:
ollama: Local LLM clientstreamlit: Web interfacepandas: Data manipulationPillow: Image processingPyMuPDF: PDF handlingbase64: Image encoding
-
Clone the repository:
git clone https://github.com/your-repository/invoice-extractor.git cd invoice-extractor -
Install dependencies:
pip install -r requirements.txt
-
Start the Ollama server:
ollama serve
(Ensure no other process is using port 11434. Close existing Ollama processes if needed.)
-
Run the Streamlit application:
streamlit run app.py
-
Upload a file (PNG, JPG, PDF).
-
View extracted data in tabular format.
-
Export the data to your preferred format (CSV, JSON).
-
Input Handling
- Uploaded files are identified by type
- Images are loaded using
Pillow - PDFs are converted to images using
fitz
-
Data Extraction
- Ollama vision model processes the file content
- Extracted data is parsed and formatted into a pandas DataFrame
-
Standardization
- Dates are converted to
YYYY-MM-DDformat - Currency fields are formatted as
INRwith two decimals - Missing fields are replaced with
None
- Dates are converted to
-
Export Options
- Download as CSV or JSON