AI-Powered PDF Keyword Search & Classification Tool

Overview

This project is a Flask-based web application that allows users to upload PDF documents and search for specific keywords. The tool provides two search modes:

Regular Search – Finds and displays exact keyword matches.
Smart Search – Uses OpenAI's GPT API to determine whether the keyword appears in a context relevant to health or climate.

The system outputs:

The pages where the keyword appears.
The sentences containing the keyword.
A classification of whether the sentence is contextually relevant to health or climate.

Features

✅ Upload PDFs (Max size: 16MB)
✅ Extract and search text from PDFs
✅ Keyword matching with contextual classification using GPT-3.5 Turbo
✅ User-friendly web interface with Flask and HTML/CSS
✅ Deployment-ready (Heroku or local execution)

Tech Stack

Python (Flask, pdfplumber, re, dotenv, OpenAI API)
Frontend: HTML, CSS, JavaScript
Cloud Deployment: Heroku (or can be run locally)

Installation & Setup

1️⃣ Clone the Repository

 git clone https://github.com/your-repo-url.git
 cd your-repo-folder

2️⃣ Create a Virtual Environment (Optional but Recommended)

 python3 -m venv venv
 source venv/bin/activate  # On Windows: venv\Scripts\activate

3️⃣ Install Dependencies

 pip install -r requirements.txt

4️⃣ Set Up OpenAI API Key

Create a .env file in the project root and add your OpenAI API key:

 OPENAI_API_KEY=your_openai_api_key_here

5️⃣ Run the Application

 python app.py

6️⃣ Open in Browser

Once running, open http://127.0.0.1:5000/ in your web browser.

Usage

Uploading & Searching:

Upload a PDF file.
Enter comma-separated keywords in the search box.
Select search mode: Regular or Smart Search.
Submit to process the document and view results.

Results Page Includes:

Found Keywords: Page numbers and matched sentences.
Context Analysis: Indicates whether the keyword is used in a health/climate-related context.
Download/Review: Ability to go back and upload another file.

File Structure

📂 project-root
 ┣ 📂 templates/          # HTML Templates (upload.html, results.html)
 ┣ 📜 app.py              # Main Flask Application
 ┣ 📜 requirements.txt    # Dependencies
 ┣ 📜 .env.example        # Sample environment variables file
 ┗ 📜 README.md           # Project Documentation

Future Improvements

🔹 Support for additional document formats (DOCX, TXT)
🔹 Multilingual support for keyword classification
🔹 Improved UI/UX with React or Vue.js

License

MIT License. Feel free to use and improve!

Contact

For inquiries, contact Primanta at primanta.b@columbia.edu.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI-Powered PDF Keyword Search & Classification Tool

Overview

Features

Tech Stack

Installation & Setup

1️⃣ Clone the Repository

2️⃣ Create a Virtual Environment (Optional but Recommended)

3️⃣ Install Dependencies

4️⃣ Set Up OpenAI API Key

5️⃣ Run the Application

6️⃣ Open in Browser

Usage

Uploading & Searching:

Results Page Includes:

File Structure

Future Improvements

License

Contact

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 66 Commits
images		images
template		template
Procfile		Procfile
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt
templates		templates

Folders and files

Latest commit

History

Repository files navigation

AI-Powered PDF Keyword Search & Classification Tool

Overview

Features

Tech Stack

Installation & Setup

1️⃣ Clone the Repository

2️⃣ Create a Virtual Environment (Optional but Recommended)

3️⃣ Install Dependencies

4️⃣ Set Up OpenAI API Key

5️⃣ Run the Application

6️⃣ Open in Browser

Usage

Uploading & Searching:

Results Page Includes:

File Structure

Future Improvements

License

Contact

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages