Skip to content

A lightweight LLM-based pipeline for few-shot classification of U.S. policy texts into industry sectors using GPT-4.

License

Notifications You must be signed in to change notification settings

Rita-Yixuan-Wang/llm_for_policy_intelligence

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 

Repository files navigation

Few-Shot LLM-Based Policy Text Classifier

This project demonstrates how to use GPT-4 with few-shot prompting to classify U.S. policy statements by their relevant industry sectors. It is designed to be a lightweight, no-training-required prototype for consulting, research, and regulatory analysis tasks.

OpenAI Python License Status


🚀 Overview

Goal:
Automatically assign industry labels (e.g., Energy, Finance, Education) to policy texts using GPT-4.

Method:
LLM Few-shot learning (in-context classification using prompt examples)

Tools:
Python, OpenAI API, Pandas

Dataset:
10 synthetic U.S. policy statements covering multiple federal agencies and industries


🧠 Business Value

Policy analysts, consultants, and business strategists often need to monitor large volumes of policy updates. Manual classification is time-consuming.
This project shows how LLMs can be used out-of-the-box to:

  • Identify regulatory risks and opportunities by industry
  • Route policy updates to relevant internal teams (e.g., Energy, Labor)
  • Quickly prototype NLP pipelines without labeled datasets or ML training

For example, a consulting team serving clients in the energy and manufacturing sectors could deploy this LLM system to classify hundreds of federal and state policy updates weekly. The model could auto-route energy-related content to sustainability teams, or flag manufacturing incentives to business development units. With few-shot LLMs, this capability is deployable without training data, making it ideal for time-sensitive or resource-constrained use cases.


🛠️ How It Works

  1. A small number of labeled examples are written in the prompt
  2. The model is prompted with new policy statements
  3. GPT-4 predicts the most relevant industry
  4. Results are stored in a CSV file

🧾 Example Output

Policy Snippet Predicted Industry
FAA increases drone cybersecurity standards Transportation
DOE funds small nuclear reactor projects Energy
USDA subsidizes organic agriculture Agriculture
Labor Dept proposes warehouse worker protections Labor & Employment

Accuracy: 100% in a 10-sample test set (manually reviewed)


📁 Project Structure

llm-policy-classifier/
├── data/
│   └── classified_policies.csv     # Output predictions
├── src/
│   └── classify_with_gpt.py        # Main classification script
├── reports/
│   └── final_report.pdf            # Full analysis and results
├── README.md                       # Project overview (this file)

📈 Possible Extensions

This project can be extended in several ways:

  • 🧠 Fine-tune BERT or LLaMA for more robust domain-specific classification
  • 📚 Use RAG (Retrieval-Augmented Generation) to enhance prediction quality on ambiguous or long policy texts
  • 📊 Build a Streamlit UI for interactive input/output and result visualization
  • 🧮 Scale to real datasets with 1,000+ policies from public government or regulatory sources

Releases

No releases published

Packages

No packages published