- This project uses Tesseract OCR (via `pytesseract`) to extract text from images and scanned PDFs.
- On Debian/Ubuntu you can install it with: `sudo apt update && sudo apt install -y tesseract-ocr libtesseract-dev tesseract-ocr-eng`
- Verify the installation with: `which tesseract && tesseract --version`
- You can set the `TESSERACT_CMD` environment variable if Tesseract is installed in a custom location; otherwise the app defaults to `/usr/bin/tesseract`.
- For local development, add a `.env` file in `bc/` (or set environment variables in your shell):
TESSERACT_CMD=/usr/bin/tesseract
GOOGLE_API_KEY=your_api_key_here
JWT_SECRET=your_jwt_secret_here
MONGO_URI=your_mongo_uri_here
Then install Python dependencies and run the app:
cd bc
pip install -r requirements.txt
python app.py
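
If the app fails at startup, it can help to check the configuration on its own. The following is a minimal sketch, not the project's actual code: it shows one way the `TESSERACT_CMD` fallback to `/usr/bin/tesseract` could be resolved and handed to `pytesseract`. The helper names (`resolve_tesseract_cmd`, `missing_vars`, `configure_pytesseract`) are hypothetical, and treating `GOOGLE_API_KEY`, `JWT_SECRET`, and `MONGO_URI` as required is an assumption. It reads `os.environ`, so it also assumes any `.env` values have already been loaded into the environment.

```python
import os
import shutil

# Default documented above; used when TESSERACT_CMD is unset or empty.
DEFAULT_TESSERACT_CMD = "/usr/bin/tesseract"

# Assumption: the app needs all three of these to start.
REQUIRED_VARS = ("GOOGLE_API_KEY", "JWT_SECRET", "MONGO_URI")


def resolve_tesseract_cmd(env=None):
    """Return TESSERACT_CMD if set and non-empty, else the default path."""
    env = os.environ if env is None else env
    return env.get("TESSERACT_CMD") or DEFAULT_TESSERACT_CMD


def missing_vars(env=None):
    """List the variables from REQUIRED_VARS that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def configure_pytesseract(env=None):
    """Point pytesseract at the resolved binary and return the path used."""
    import pytesseract  # imported lazily so the helpers above work without it

    cmd = resolve_tesseract_cmd(env)
    # shutil.which returns None if the path is missing or not executable.
    if shutil.which(cmd) is None:
        raise FileNotFoundError(f"Tesseract binary not found: {cmd}")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling `missing_vars()` before starting the app gives a quick list of anything left unset, and `configure_pytesseract()` raises early if the Tesseract path is wrong, rather than failing on the first OCR call.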