This project demonstrates how to use GPT-4 with few-shot prompting to classify U.S. policy statements by their relevant industry sectors. It is designed to be a lightweight, no-training-required prototype for consulting, research, and regulatory analysis tasks.
Goal:
Automatically assign industry labels (e.g., Energy, Finance, Education) to policy texts using GPT-4.
Method:
LLM Few-shot learning (in-context classification using prompt examples)
Tools:
Python, OpenAI API, Pandas
Dataset:
10 synthetic U.S. policy statements covering multiple federal agencies and industries
Policy analysts, consultants, and business strategists often need to monitor large volumes of policy updates. Manual classification is time-consuming.
This project shows how LLMs can be used out-of-the-box to:
- Identify regulatory risks and opportunities by industry
- Route policy updates to relevant internal teams (e.g., Energy, Labor)
- Quickly prototype NLP pipelines without labeled datasets or ML training
For example, a consulting team serving clients in the energy and manufacturing sectors could deploy this LLM system to classify hundreds of federal and state policy updates weekly. The model could auto-route energy-related content to sustainability teams, or flag manufacturing incentives to business development units. With few-shot LLMs, this capability is deployable without training data, making it ideal for time-sensitive or resource-constrained use cases.
- A small number of labeled examples are written in the prompt
- The model is prompted with new policy statements
- GPT-4 predicts the most relevant industry
- Results are stored in a CSV file
Policy Snippet | Predicted Industry |
---|---|
FAA increases drone cybersecurity standards | Transportation |
DOE funds small nuclear reactor projects | Energy |
USDA subsidizes organic agriculture | Agriculture |
Labor Dept proposes warehouse worker protections | Labor & Employment |
✅ Accuracy: 100% in a 10-sample test set (manually reviewed)
llm-policy-classifier/
├── data/
│ └── classified_policies.csv # Output predictions
├── src/
│ └── classify_with_gpt.py # Main classification script
├── reports/
│ └── final_report.pdf # Full analysis and results
├── README.md # Project overview (this file)
This project can be extended in several ways:
- 🧠 Fine-tune BERT or LLaMA for more robust domain-specific classification
- 📚 Use RAG (Retrieval-Augmented Generation) to enhance prediction quality on ambiguous or long policy texts
- 📊 Build a Streamlit UI for interactive input/output and result visualization
- 🧮 Scale to real datasets with 1,000+ policies from public government or regulatory sources