Skip to content

AhmedFaizanDev/invoice-parser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

Indian GST Invoice Extractor using Ollama LLM

A powerful, privacy-first Streamlit application for extracting structured data from Indian GST invoices using Ollama's local vision models. No API keys required, runs completely offline, and outputs GST-compliant JSON format.


Key Features

GST Compliance

  • Indian GST-compliant JSON output format
  • Proper CGST, SGST, IGST rate and amount extraction
  • HSN codes, PAN numbers, GST numbers extraction
  • Tax summary with automated calculations
  • Multiple items per invoice support

Privacy & Security

  • 100% Local Processing - No data leaves your machine
  • No API keys required - Uses local Ollama models
  • Offline capable - Works without internet connection
  • Cost-free - No per-request charges

File Format Support

  • Images: PNG, JPG, JPEG
  • Documents: PDF (first page extraction)
  • Multi-file processing: Upload multiple invoices at once

Advanced Features

  • Real-time processing with progress indicators
  • Model selection - Choose from multiple vision models
  • Data validation - Ensures GST compliance
  • Export options: JSON, CSV summary, clipboard copy
  • Error handling - Graceful failure recovery

Technologies Used

  • Ollama: Local LLM inference with vision models

  • Streamlit: Web interface

  • Vision Models: LLaVA, BakLLaVA for image understanding

  • Python Libraries:

    • ollama: Local LLM client
    • streamlit: Web interface
    • pandas: Data manipulation
    • Pillow: Image processing
    • PyMuPDF: PDF handling
    • base64: Image encoding

Installation

  1. Clone the repository:

    git clone https://github.com/your-repository/invoice-extractor.git
    cd invoice-extractor
  2. Install dependencies:

    pip install -r requirements.txt
  3. Start the Ollama server:

    ollama serve

    (Ensure no other process is using port 11434. Close existing Ollama processes if needed.)


Usage

  1. Run the Streamlit application:

    streamlit run app.py
  2. Upload a file (PNG, JPG, PDF).

  3. View extracted data in tabular format.

  4. Export the data to your preferred format (CSV, JSON).


File Processing Workflow

  1. Input Handling

    • Uploaded files are identified by type
    • Images are loaded using Pillow
    • PDFs are converted to images using fitz
  2. Data Extraction

    • Ollama vision model processes the file content
    • Extracted data is parsed and formatted into a pandas DataFrame
  3. Standardization

    • Dates are converted to YYYY-MM-DD format
    • Currency fields are formatted as INR with two decimals
    • Missing fields are replaced with None
  4. Export Options

    • Download as CSV or JSON

About

Offline GST invoice extractor using local vision LLMs (Ollama) with GST-compliant JSON/CSV output.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages