This project extracts text from an image of a store receipt and converts it into a structured JSON object using OCR (Tesseract), OpenCV, and Gemini AI.
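The OpenCV and Tesseract stages of that pipeline can be sketched roughly as follows. This is a minimal sketch, not the project's actual main.py. The grayscale conversion plus Otsu thresholding is a common preprocessing choice assumed here, and `preprocess` and `extract_text` are hypothetical helper names.

```python
from typing import Any


def preprocess(image: Any) -> Any:
    """Convert a BGR image to a binarized grayscale image, which usually helps Tesseract."""
    import cv2  # imported lazily so this module loads even without OpenCV installed

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the threshold automatically; the first return value
    # (the threshold it chose) is discarded.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def extract_text(image_path: str) -> str:
    """Read an image from disk, preprocess it, and return Tesseract's OCR text."""
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising on failure
        raise FileNotFoundError(f"Could not read image: {image_path}")
    # pytesseract accepts NumPy arrays such as the thresholded OpenCV image.
    return pytesseract.image_to_string(preprocess(image))
```

For example, `extract_text("raw_receipts/receipt2.png")` would return the raw OCR text, which the Gemini step then turns into JSON.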
Requirements:
- Python 3.x
- OpenCV
- pytesseract
- openai-agents
- Tesseract-OCR (must be installed separately and added to your system PATH)
Setup:
- Clone the repository:
  git clone <repository-url>
  cd <repository-directory>
- Install the required Python packages:
  pip install -r requirements.txt
- Install Tesseract-OCR (separately):
  - Download and install it from Tesseract at UB Mannheim.
  - Add the Tesseract install directory (e.g., C:\Program Files\Tesseract-OCR) to your system PATH.
  - Test the installation with:
    tesseract --version
- Set up your Gemini API key:
  - Create a .env file in the project root containing:
    GEMINI_API_KEY=your_gemini_api_key_here
- Add a receipt image:
  Place your receipt image (e.g., receipt2.png) in the raw_receipts folder.
- Run the script:
  python main.py
- Output:
  - The extracted text will be printed to the console.
  - The structured JSON will be saved to json_receipt/receipt.json (the full path will be shown after running).
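Saving the result and reporting its full path might look like the sketch below. It assumes the model's reply is a JSON string that may arrive wrapped in a Markdown code fence, which language models sometimes add. `parse_model_json` and `save_receipt` are hypothetical names, not functions from the project.

```python
import json
import re
from pathlib import Path
from typing import Any, Dict

# Matches a reply wrapped in a Markdown code fence (three backticks,
# optionally followed by "json") and captures the content inside it.
_FENCE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)


def parse_model_json(reply: str) -> Dict[str, Any]:
    """Parse a model reply as JSON, tolerating an optional Markdown code fence."""
    text = reply.strip()
    match = _FENCE.match(text)
    if match:
        text = match.group(1)
    return json.loads(text)


def save_receipt(data: Dict[str, Any], out_dir: str = "json_receipt",
                 filename: str = "receipt.json") -> Path:
    """Write data as pretty-printed JSON and return the absolute output path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create json_receipt/ if it is missing
    path = (folder / filename).resolve()
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

A caller could then print the location with `print(f"Saved to {save_receipt(parse_model_json(reply))}")`.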
Project structure:
- main.py: Main script for image processing, OCR, and AI extraction.
- llm_setup.py: Gemini AI model setup and configuration.
- raw_receipts/: Place your input receipt images here.
- json_receipt/: Output folder for generated JSON files.
- requirements.txt: Python dependencies.
- readme.md: Project documentation.
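The contents of llm_setup.py are not described here. Because the requirements list openai-agents alongside Gemini, one plausible setup points the SDK at Gemini's OpenAI-compatible endpoint. This is a minimal sketch under that assumption: the model name, agent instructions, and helper names are hypothetical. The key is read from the environment, so the .env file would need to be loaded first (for example with a library such as python-dotenv, which is also an assumption).

```python
import os

# Gemini's OpenAI-compatible endpoint, which lets OpenAI-style clients call Gemini.
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"
DEFAULT_MODEL = "gemini-2.0-flash"  # assumption: substitute the Gemini model you use


def build_receipt_agent(model_name: str = DEFAULT_MODEL):
    """Create an openai-agents Agent backed by Gemini."""
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set; check your .env file.")

    # Imported lazily so the key check above works even without the packages.
    from openai import AsyncOpenAI
    from agents import Agent, OpenAIChatCompletionsModel, set_tracing_disabled

    # The SDK exports traces to OpenAI by default; disable that since only a
    # Gemini key is configured.
    set_tracing_disabled(True)
    client = AsyncOpenAI(api_key=api_key, base_url=GEMINI_BASE_URL)
    model = OpenAIChatCompletionsModel(model=model_name, openai_client=client)
    return Agent(
        name="Receipt parser",
        instructions="Convert the receipt text you are given into a single JSON object. "
                     "Reply with JSON only.",
        model=model,
    )


def receipt_text_to_json_string(text: str) -> str:
    """Run the agent synchronously on OCR text and return its final output."""
    from agents import Runner

    result = Runner.run_sync(build_receipt_agent(), text)
    return str(result.final_output)
```

The returned string still has to be parsed with `json.loads` (or a fence-tolerant wrapper) before it can be saved as structured JSON.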
Troubleshooting:
- Tesseract not found:
  Ensure Tesseract-OCR is installed and its install directory is added to your system PATH. Test with tesseract --version in your terminal.
- No internet connection:
  Gemini AI requires an active internet connection.
- API key errors:
  Make sure your .env file is present in the project root and contains a valid GEMINI_API_KEY.
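The three checks above can be scripted so they run in one go. This is a minimal sketch with hypothetical names. The small .env parser is a standard-library stand-in (the project itself may load .env differently), and the connectivity check simply opens a TCP connection to Gemini's API host on port 443. It only confirms that a key is present, not that it is valid.

```python
import os
import shutil
import socket
from pathlib import Path
from typing import Dict

GEMINI_HOST = "generativelanguage.googleapis.com"


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections, and timeouts
        return False


def diagnose(env_path: str = ".env") -> Dict[str, bool]:
    """Run the Tesseract, API key, and connectivity checks."""
    key = read_env_file(env_path).get("GEMINI_API_KEY") or os.environ.get("GEMINI_API_KEY")
    return {
        # pytesseract calls "tesseract" by default, which is resolved via PATH.
        "tesseract_on_path": shutil.which("tesseract") is not None,
        "gemini_api_key_set": bool(key),
        "internet_reachable": can_reach(GEMINI_HOST),
    }


if __name__ == "__main__":
    for check, ok in diagnose().items():
        print(f"{check}: {'OK' if ok else 'FAILED'}")
```

If Tesseract is installed but not on PATH, pytesseract can also be pointed at the executable directly by setting `pytesseract.pytesseract.tesseract_cmd` to its full path.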