This project is a Flask-based REST API that extracts structured data from Vistara Airlines PDF invoices. It processes invoices to retrieve key financial and supplier details, then stores the output in a structured CSV format.
- Accepts Vistara Airlines invoice PDFs via a POST request.
- Extracts fields such as invoice number, GST details, company name/address, tax amounts, and more.
- Converts the extracted information into a structured format and saves it as output.csv.
- Designed for Vistara Airlines invoices only.
Tech stack:
- Python 3
- Flask
- pdfplumber
- Pandas
- Regular Expressions (re)
Endpoint: POST /validate
Headers:
Content-Type: multipart/form-data
Form Data:
airline: (Required) Must be Vistara
file: (Required) PDF file of the Vistara invoice
Response (on success):
{
"message": "Data extracted successfully",
"output_csv": "output.csv"
}
Response (on failure), one of the following messages:
- Airline name is missing
- This API only for Vistara
- No file part
- Only read pdf files
- No selected file
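The failure messages above imply a set of checks that each request must pass. The sketch below reproduces them without Flask so the logic is easy to test in isolation. The validate_upload name, the order of the checks, and the exact-match comparison on "Vistara" are assumptions, not the project's actual code; uploads are modelled as a plain dict mapping the form field name to the uploaded filename.

```python
from typing import Dict, Optional


def validate_upload(form: Dict[str, str],
                    files: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the matching error message for an invalid request, or None if it passes.

    Hypothetical reconstruction: the order of the checks is an assumption.
    """
    airline = form.get("airline")
    if not airline:
        return "Airline name is missing"
    if airline != "Vistara":  # exact match assumed from "Must be Vistara"
        return "This API only for Vistara"
    if "file" not in files:
        return "No file part"
    filename = files["file"]
    if not filename:  # field present but no file was chosen
        return "No selected file"
    if not filename.lower().endswith(".pdf"):
        return "Only read pdf files"
    return None
```

Inside a Flask view, form would be request.form and files could be built as {name: f.filename for name, f in request.files.items()}.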
The API extracts and stores the following fields:
- Invoice Number and Date
- GSTIN and Place of Supply
- Company and Site Name
- Supplier Address and GST Details
- Invoice Amounts (CGST, SGST, IGST)
- Net Taxable Value
- Description of Services
- SAC Code
- And more...
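To illustrate how regex-based extraction of these fields can work, here is a minimal sketch that pulls a few of them out of invoice text. It assumes the text has already been extracted from the PDF (with pdfplumber this would typically be page.extract_text() for each page in pdfplumber.open(path).pages). The FIELD_PATTERNS table, the extract_fields helper, the label wording, and the sample text are all hypothetical; the real Vistara layout and the patterns used in app.py may differ.

```python
import re
from typing import Dict, Optional

# Hypothetical label patterns; the real Vistara layout may use different wording.
FIELD_PATTERNS: Dict[str, str] = {
    "invoice_number": r"Invoice\s*(?:No\.?|Number)\s*[:\-]?\s*([A-Z0-9/\-]+)",
    "invoice_date": r"Invoice\s*Date\s*[:\-]?\s*(\d{2}[-/.]\d{2}[-/.]\d{4})",
    # GSTIN format: 2-digit state code, 10-character PAN, entity code, "Z", check character.
    "gstin": r"\b(\d{2}[A-Z]{5}\d{4}[A-Z][A-Z0-9]Z[A-Z0-9])\b",
    "place_of_supply": r"Place\s*of\s*Supply\s*[:\-]?\s*([A-Za-z ]+)",
    "sac_code": r"\bSAC\s*(?:Code)?\s*[:\-]?\s*(\d{6})",
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when its label is not found."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        fields[name] = match.group(1).strip() if match else None
    return fields


# Made-up sample text standing in for pdfplumber output.
SAMPLE_TEXT = """TAX INVOICE
Invoice No: VT2023000123
Invoice Date: 05-04-2023
GSTIN: 07AAACT1234A1Z5
Place of Supply: Delhi
SAC Code: 996425
"""
```

Returning None for a missing field, rather than raising, keeps one unexpected layout from failing the whole request; the caller can then decide which fields are mandatory.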
Installation and usage:
- Clone the repository:
  git clone https://github.com/yourusername/vistara-invoice-api.git
  cd vistara-invoice-api
- Install dependencies:
  pip install flask pdfplumber pandas
- Run the Flask app:
  python app.py
- Test using Postman or cURL. Example using curl:
  curl -X POST http://127.0.0.1:5000/validate \
    -F "airline=Vistara" \
    -F "file=@/path/to/invoice.pdf"
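If you would rather test from Python than from Postman or curl, the same request can be sent with only the standard library. This is a minimal sketch: the build_multipart and post_invoice helpers are hypothetical, while the URL and the form field names (airline, file) are taken from the curl example above.

```python
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Tuple


def build_multipart(fields: Dict[str, str], file_field: str,
                    filename: str, file_bytes: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one PDF file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to occur in the PDF bytes
    parts = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    # File part: headers, then the raw bytes, then a CRLF before the next boundary.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_invoice(url: str, pdf_path: str, airline: str = "Vistara") -> str:
    """POST the invoice to the API and return the response body as text.

    urlopen raises urllib.error.HTTPError for 4xx/5xx statuses; if the API
    returns its error messages with such a status, read them from the exception.
    """
    path = Path(pdf_path)
    body, content_type = build_multipart({"airline": airline}, "file",
                                         path.name, path.read_bytes())
    request = urllib.request.Request(url, data=body, method="POST",
                                     headers={"Content-Type": content_type})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example (needs the Flask app running locally):
# print(post_invoice("http://127.0.0.1:5000/validate", "/path/to/invoice.pdf"))
```

Encoding the body by hand keeps the sketch dependency-free; with the third-party requests package, the equivalent call is requests.post(url, data={"airline": "Vistara"}, files={"file": ...}).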
Limitations:
- Currently supports only the Vistara Airlines invoice format.
- Extracts data with regex patterns, which may break if the invoice layout is inconsistent.
- Saves output to a single, fixed output.csv file (this could be improved with unique file names or database storage).
Possible improvements:
- Add support for multiple airline formats.
- Improve PDF parsing robustness using layout-aware models.
- Return the extracted data as JSON in addition to the CSV.
- Integrate a database for data management and history tracking.
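Two of the items above, the fixed output.csv file and returning JSON alongside the CSV, could be addressed together. The sketch below writes each batch of rows to a uniquely named CSV file and returns a JSON-serializable dict that mirrors the documented success response. It uses the standard csv module instead of Pandas to stay self-contained; the save_rows name, the outputs directory, the file-name pattern, and the extra data key are assumptions.

```python
import csv
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Optional


def save_rows(rows: List[Dict[str, Optional[str]]],
              out_dir: str = "outputs") -> Dict[str, object]:
    """Write rows to a uniquely named CSV and return a JSON-serializable summary."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # UTC timestamp plus a random suffix, so concurrent requests are very unlikely to collide.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = directory / f"invoice_{stamp}_{uuid.uuid4().hex[:8]}.csv"
    # Union of keys across rows, in first-seen order; missing values become "".
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # None values are written as empty strings
    return {
        "message": "Data extracted successfully",
        "output_csv": str(path),
        "data": rows,  # hypothetical extra key carrying the extracted fields
    }
```

With Pandas, the write step would be pd.DataFrame(rows).to_csv(path, index=False), and in a Flask view the returned dict can be passed to jsonify.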
Gautam Raj
AI Engineer | Python Developer | Backend Engineer
LinkedIn | GitHub