
gautamraj8044/AirlineInvoiceDataExtractor

🛫 Vistara Invoice Data Extraction API

This project is a Flask-based REST API that extracts structured data from Vistara Airlines PDF invoices. It processes invoices to retrieve key financial and supplier details, then stores the output in a structured CSV format.


📌 Features

  • Accepts Vistara Airlines invoice PDFs via a POST request.
  • Extracts fields such as invoice number, GST details, company name/address, tax amounts, and more.
  • Converts the extracted information into a structured format and saves it as output.csv.
  • Designed for Vistara Airlines invoices only; other airline formats are rejected.

🚀 Technologies Used

  • Python 3
  • Flask
  • pdfplumber
  • Pandas
  • Regular Expressions (re)

📂 API Endpoint

POST /validate

Headers:

  • Content-Type: multipart/form-data

Form Data:

  • airline: (Required) Must be Vistara
  • file: (Required) PDF file of the Vistara invoice

Response (on success):

{
  "message": "Data extracted successfully",
  "output_csv": "output.csv"
}

Response (on failure), one of the following error messages:

  • Airline name is missing
  • This API only for Vistara
  • No file part
  • Only read pdf files
  • No selected file
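The request checks behind these messages can be sketched as a plain function. This is a minimal illustration, not the project's actual code: the function name `validate_upload`, the order of the checks, the exact (case-sensitive) airline comparison, and the use of plain dictionaries in place of Flask's `request.form` and `request.files` are all assumptions. Only the message strings come from the list above.

```python
from typing import Mapping, Optional

SUPPORTED_AIRLINE = "Vistara"


def validate_upload(form: Mapping[str, str], files: Mapping[str, str]) -> Optional[str]:
    """Return an error message, or None when the request looks valid.

    `form` maps form-field names to values and `files` maps file-field
    names to the uploaded filename. Both are hypothetical stand-ins for
    Flask's request.form and request.files.
    """
    airline = form.get("airline", "").strip()
    if not airline:
        return "Airline name is missing"
    # Assumption: the comparison is exact; the real check may be more lenient.
    if airline != SUPPORTED_AIRLINE:
        return "This API only for Vistara"
    if "file" not in files:
        return "No file part"
    filename = files["file"]
    if filename == "":
        return "No selected file"
    if not filename.lower().endswith(".pdf"):
        return "Only read pdf files"
    return None
```

For example, `validate_upload({"airline": "Vistara"}, {"file": "invoice.pdf"})` returns `None`, while passing `{"airline": "Indigo"}` returns `"This API only for Vistara"`.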

📄 Extracted Fields

The API extracts and stores the following fields:

  • Invoice Number and Date
  • GSTIN and Place of Supply
  • Company and Site Name
  • Supplier Address and GST Details
  • Invoice Amounts (CGST, SGST, IGST)
  • Net Taxable Value
  • Description of Services
  • SAC Code
  • And more...
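To give a feel for how regex-based extraction of a few of these fields might look, here is a small sketch that runs on already-extracted text (in the real app the text would presumably come from pdfplumber, for example via `page.extract_text()`). The label wording in the patterns, the field names, and the sample text are invented for illustration; only the GSTIN pattern follows the standard 15-character GSTIN layout.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns: the labels on a real Vistara invoice may differ.
PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s+No\.?\s*:\s*(\S+)", re.IGNORECASE),
    "invoice_date": re.compile(r"Invoice\s+Date\s*:\s*(\d{2}-\d{2}-\d{4})", re.IGNORECASE),
    # GSTIN: 2-digit state code, 10-character PAN, entity code, 'Z', check character.
    "gstin": re.compile(r"\b(\d{2}[A-Z]{5}\d{4}[A-Z][1-9A-Z]Z[0-9A-Z])\b"),
    "sac_code": re.compile(r"SAC\s+Code\s*:\s*(\d{6})", re.IGNORECASE),
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Apply each pattern to the text; fields with no match are set to None."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


# Made-up sample text, not taken from a real invoice.
SAMPLE_TEXT = """
Invoice No: INV-0001
Invoice Date: 01-04-2023
GSTIN: 27ABCDE1234F1Z5
SAC Code: 996425
"""

fields = extract_fields(SAMPLE_TEXT)
```

Returning `None` for a missing field, rather than raising, keeps one unmatched pattern from discarding the rest of the record.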

🧪 How to Run Locally

  1. Clone the repository:

    git clone https://github.com/gautamraj8044/AirlineInvoiceDataExtractor.git
    cd AirlineInvoiceDataExtractor
  2. Install dependencies:

    pip install flask pdfplumber pandas
  3. Run the Flask app:

    python app.py
  4. Test using Postman or curl. Example using curl:

    curl -X POST http://127.0.0.1:5000/validate \
         -F "airline=Vistara" \
         -F "file=@/path/to/invoice.pdf"
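The same request can also be sent from Python. The sketch below uses only the standard library and builds the multipart body by hand; the helper names `build_multipart` and `post_invoice` are hypothetical, and the URL and field names simply mirror the curl example. Note that `urllib.request.urlopen` raises `HTTPError` if the server answers with a 4xx or 5xx status.

```python
import uuid
from typing import Dict, Tuple
from urllib import request


def build_multipart(fields: Dict[str, str], filename: str, file_bytes: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one PDF under the 'file' field as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_invoice(url: str, pdf_path: str) -> str:
    """POST the PDF at pdf_path to the API and return the raw response body."""
    with open(pdf_path, "rb") as fh:
        body, content_type = build_multipart({"airline": "Vistara"}, "invoice.pdf", fh.read())
    req = request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")
    with request.urlopen(req) as resp:
        return resp.read().decode()
```

With the Flask app running locally, `post_invoice("http://127.0.0.1:5000/validate", "/path/to/invoice.pdf")` would send the same form data as the curl command.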

🛑 Limitations

  • Currently supports only Vistara Airlines invoice format.
  • Extracts data with regex patterns, which may fail to match when an invoice's layout or wording differs from the expected format.
  • Saves output to a static output.csv file (can be improved with unique naming or DB storage).
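One way the unique naming mentioned above could work is to combine a UTC timestamp with a short random token. This is only a sketch: the function name `unique_output_path`, the `outputs` directory, and the `invoice` prefix are hypothetical and do not appear in the project.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path


def unique_output_path(output_dir: str = "outputs", prefix: str = "invoice") -> Path:
    """Build a CSV path that is very unlikely to collide across requests."""
    # The timestamp keeps files sortable; the random token separates
    # requests that arrive within the same second.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    token = uuid.uuid4().hex[:8]
    return Path(output_dir) / f"{prefix}_{timestamp}_{token}.csv"
```

The returned path could then be passed wherever `output.csv` is currently written, and echoed back in the `output_csv` field of the response.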

✅ Future Enhancements

  • Add support for multiple airline formats.
  • Improve PDF parsing robustness using layout-aware models.
  • Return extracted data as JSON along with CSV.
  • Integrate database for better data management and history tracking.
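As an illustration of the "JSON along with CSV" idea, the sketch below serialises one extracted record both ways. It uses the standard `csv` and `json` modules rather than Pandas to stay dependency-free; the function name `to_csv_and_json`, the `data` key, and the sample record are assumptions, while the `message` and `output_csv` values mirror the existing success response.

```python
import csv
import io
import json
from typing import Dict, Tuple


def to_csv_and_json(record: Dict[str, str], csv_name: str = "output.csv") -> Tuple[str, str]:
    """Return (csv_text, json_text) for a single extracted record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(record))
    writer.writeheader()
    writer.writerow(record)
    # Hypothetical response shape: the existing keys plus a "data" key.
    payload = {
        "message": "Data extracted successfully",
        "output_csv": csv_name,
        "data": record,
    }
    return buffer.getvalue(), json.dumps(payload)


# Made-up record for illustration.
csv_text, json_text = to_csv_and_json({"invoice_number": "INV-0001", "gstin": "27ABCDE1234F1Z5"})
```

Clients that only need the values could then read `data` from the response directly instead of fetching the CSV file.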

👨‍💻 Author

Gautam Raj
AI Engineer | Python Developer | Backend Engineer
LinkedIn | GitHub

