This project is a Flask-based web application that allows users to upload PDF documents and search for specific keywords. The tool provides two search modes:
- Regular Search – Finds and displays exact keyword matches.
- Smart Search – Uses OpenAI's GPT API to determine whether the keyword appears in a context relevant to health or climate.
The system outputs:
- The pages where the keyword appears.
- The sentences containing the keyword.
- A classification of whether the sentence is contextually relevant to health or climate.
✅ Upload PDFs (Max size: 16MB)
✅ Extract and search text from PDFs
✅ Keyword matching with contextual classification using GPT-3.5 Turbo
✅ User-friendly web interface with Flask and HTML/CSS
✅ Deployment-ready (Heroku or local execution)
- Python (Flask, pdfplumber, re, dotenv, OpenAI API)
- Frontend: HTML, CSS, JavaScript
- Cloud Deployment: Heroku (or can be run locally)
git clone https://github.com/your-repo-url.git
cd your-repo-folder python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate pip install -r requirements.txtCreate a .env file in the project root and add your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key_here python app.pyOnce running, open http://127.0.0.1:5000/ in your web browser.
- Upload a PDF file.
- Enter comma-separated keywords in the search box.
- Select search mode: Regular or Smart Search.
- Submit to process the document and view results.
- Found Keywords: Page numbers and matched sentences.
- Context Analysis: Indicates whether the keyword is used in a health/climate-related context.
- Download/Review: Ability to go back and upload another file.
📂 project-root
┣ 📂 templates/ # HTML Templates (upload.html, results.html)
┣ 📜 app.py # Main Flask Application
┣ 📜 requirements.txt # Dependencies
┣ 📜 .env.example # Sample environment variables file
┗ 📜 README.md # Project Documentation
🔹 Support for additional document formats (DOCX, TXT)
🔹 Multilingual support for keyword classification
🔹 Improved UI/UX with React or Vue.js
MIT License. Feel free to use and improve!
For inquiries, contact Primanta at primanta.b@columbia.edu.


