A web application for exploring UN Programme Budget Implications (PBIs) with structured data extraction.
This application extracts structured data from UN PBI documents and presents it in an interactive web interface. Users can browse resolutions by year, view workflow stages, and explore financial breakdowns.
See PBI_REPORT.md for detailed background on PBIs and the data schema.
Features:

- Resolutions grouped by year (derived from the GA session)
- Sortable by resolution symbol or final amount
- Stage badges showing workflow progress (stages 1-4)
- Detail sidebar with:
  - Final and draft resolution links
  - Mandate summary and operative paragraphs
  - SG follow-up reports
  - Budget sections table
  - Workflow stages with per-stage costs
  - Cost evolution between stages
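The year grouping and amount sorting listed above can be sketched in Python (the frontend itself is TypeScript). This is a minimal sketch under stated assumptions: records carry the `session` and `final_approved_cost` fields from the `PBIResolution` interface, a regular GA session N is taken to open in year 1945 + N, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Any

Resolution = dict[str, Any]  # one record from public/data/pbi_extractions.json


def session_to_year(session: int) -> int:
    # Assumption: regular GA session N opens in September of year 1945 + N
    # (the 79th session opened in 2024, the 80th in 2025).
    return 1945 + session


def group_by_year(resolutions: list[Resolution]) -> dict[int, list[Resolution]]:
    """Bucket resolutions by the year their GA session opened."""
    groups: dict[int, list[Resolution]] = defaultdict(list)
    for res in resolutions:
        groups[session_to_year(res["session"])].append(res)
    # Newest year first in this sketch; dicts keep insertion order.
    return dict(sorted(groups.items(), reverse=True))


def sort_by_final_amount(
    resolutions: list[Resolution], descending: bool = True
) -> list[Resolution]:
    """Sort by final_approved_cost; entries without a final amount go last."""
    priced = [r for r in resolutions if r.get("final_approved_cost") is not None]
    unpriced = [r for r in resolutions if r.get("final_approved_cost") is None]
    priced.sort(key=lambda r: r["final_approved_cost"], reverse=descending)
    return priced + unpriced
```

Keeping unpriced resolutions at the end in both directions avoids comparing `None` with numbers and keeps incomplete records out of the way.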
Tech stack:

- Frontend: Next.js, React, Tailwind CSS, shadcn/ui
- Extraction: Python, OpenAI API (gpt-5-mini), Pydantic
- Data: PostgreSQL (metadata), JSON (extractions)
Setup:

```bash
npm install
cp .env.template .env.local
```

Required variables:

- `DATABASE_URL` - PostgreSQL connection string
- `AWS_API_URL` - UN Library API endpoint
- `OPENAI_API_KEY` - Used by the extraction script

Start the development server:

```bash
npm run dev
```

Data pipeline:

```
data/pbis.json (raw metadata from UN Library)
    ↓
python/extract_pbi_combined.py (LLM extraction)
    ↓
data/pbi_combined_extractions.json (structured data)
    ↓
Enriched with final resolution symbols + follow-up reports
    ↓
public/data/pbi_extractions.json (frontend data)
```
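The enrichment step between the two JSON files can be sketched as follows. This is a hedged illustration, not the project's code: it assumes `data/pbi_combined_extractions.json` holds a JSON array of `PBIResolution`-shaped objects, and the helper names and the lookup-table shapes (`final_symbols`, `followups`) are hypothetical.

```python
import json
from pathlib import Path
from typing import Any

Record = dict[str, Any]


def enrich(
    extractions: list[Record],
    final_symbols: dict[str, str],
    followups: dict[str, list[Record]],
) -> list[Record]:
    """Attach final resolution symbols and follow-up reports to each extraction.

    final_symbols maps a draft symbol (e.g. "A/79/L.1") to its A/RES/ symbol;
    followups maps an A/RES/ symbol to its SG report records. Both shapes are
    assumptions made for this sketch.
    """
    enriched = []
    for item in extractions:
        out = dict(item)  # shallow copy: leave the input records untouched
        draft = out["draft_resolution_symbol"]
        # Prefer the looked-up symbol; fall back to whatever the LLM extracted.
        final = final_symbols.get(draft) or out.get("final_resolution_symbol")
        out["final_resolution_symbol"] = final
        out["followup_reports"] = followups.get(final, []) if final else []
        enriched.append(out)
    return enriched


def build_frontend_data(
    src: Path,
    dst: Path,
    final_symbols: dict[str, str],
    followups: dict[str, list[Record]],
) -> None:
    """Read the combined extractions, enrich them, and write the frontend JSON."""
    extractions = json.loads(src.read_text(encoding="utf-8"))
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(
        json.dumps(enrich(extractions, final_symbols, followups), indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
```

A call such as `build_frontend_data(Path("data/pbi_combined_extractions.json"), Path("public/data/pbi_extractions.json"), final_symbols, followups)` would then produce the frontend file, with the two lookup tables built from the UN Library searches.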
Standard PBIs (extracted): documents whose title contains a draft resolution symbol matching:

```
A/(?:C\.\d+/)?\d+/L\.\d+(?:/Rev\.\d+)?
```

Excluded:

- Consolidated statements (A/80/7/Add.27)
- ICSC/Pension reports (A/80/7/Add.19)
- ECOSOC documents (E/2024/L.33)
- Fifth Committee decisions (A/C.5/79/L.22)
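The pattern above already rejects the consolidated-statement, ICSC/Pension and ECOSOC examples, but it does match Fifth Committee symbols such as A/C.5/79/L.22. A minimal classification sketch therefore adds an explicit check for them; the function name and the `A/C.5/` prefix test are assumptions, and the real script may filter these documents differently.

```python
import re
from typing import Optional

# Pattern from the criteria above: General Assembly draft resolutions, optionally
# issued by a Main Committee (A/C.n/), with an optional revision suffix.
DRAFT_RE = re.compile(r"A/(?:C\.\d+/)?\d+/L\.\d+(?:/Rev\.\d+)?")


def standard_draft_symbol(title: str) -> Optional[str]:
    """Return the draft resolution symbol if this title marks a standard PBI."""
    match = DRAFT_RE.search(title)
    if match is None:
        return None  # e.g. A/80/7/Add.27 or E/2024/L.33: no draft reference
    symbol = match.group(0)
    # The regex alone also matches Fifth Committee drafts such as A/C.5/79/L.22,
    # so this sketch excludes them explicitly (an assumed filtering rule).
    if symbol.startswith("A/C.5/"):
        return None
    return symbol
```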
Documents are grouped by draft resolution symbol before extraction:
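```python
import re
from collections import defaultdict

def extract_draft_resolution(doc):
    """Return the first draft resolution symbol in the document title, or None."""
    title = doc.get('proper_title', '')
    match = re.search(r'A/(?:C\.\d+/)?\d+/L\.\d+(?:/Rev\.\d+)?', title)
    return match.group(0) if match else None

# pbis: the list of PBI metadata records (data/pbis.json)
groups = defaultdict(list)
for doc in pbis:
    draft_res = extract_draft_resolution(doc)
    if draft_res:
        groups[draft_res].append(doc)
```

Extraction uses OpenAI's structured output with a Pydantic schema: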
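```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# EXTRACTION_PROMPT, combined_document_text and the PBIResolution
# Pydantic model are defined elsewhere in the extraction script
response = client.responses.parse(
    model="gpt-5-mini",
    input=[
        {"role": "system", "content": EXTRACTION_PROMPT},
        {"role": "user", "content": combined_document_text},
    ],
    text_format=PBIResolution,  # Pydantic model
)
result = response.output_parsed  # parsed PBIResolution instance
```

Run the extraction:

```bash
cd python
uv run extract_pbi_combined.py
```

UN Library API:

```
GET {AWS_API_URL}/dev/list
```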
Parameters:

- `tag`: MARC field to search (e.g. `191`, `993__a`, `500__a`)
- `query`: Search term
- `limit`: Maximum number of results
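A minimal client for this endpoint might look like the sketch below. It assumes the three parameters are sent as a query string and that the response body is a JSON array of records keyed by MARC field; `build_search_url`, the default `limit` of 50, and this particular `search` signature are hypothetical, chosen to match how `search(query, tag=...)` is called in the lookup examples.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Any, Optional


def build_search_url(base_url: str, query: str, tag: str, limit: int = 50) -> str:
    """Build a GET URL for {AWS_API_URL}/dev/list with the documented parameters."""
    params = urllib.parse.urlencode({"tag": tag, "query": query, "limit": limit})
    return f"{base_url.rstrip('/')}/dev/list?{params}"


def search(
    query: str, tag: str, limit: int = 50, base_url: Optional[str] = None
) -> list[dict[str, Any]]:
    """Hypothetical wrapper: fetch matching records from the UN Library API.

    Assumes the response is a JSON array of records keyed by MARC field
    (e.g. "191__a"), each holding a list of values.
    """
    url = build_search_url(base_url or os.environ["AWS_API_URL"], query, tag, limit)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

URL-encoding matters here because symbols such as `A/79/L.1` contain slashes that must not be read as path segments.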
| Field | Content |
|---|---|
| `191__a` | Document symbol |
| `245__a` | Title |
| `269__a` | Publication date |
| `500__a` | Notes (including "pursuant to" references) |
| `993__a` | Related documents |
| `991__b` | Agenda item number |
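Records returned by the API appear to map each MARC field to a list of values (the lookup code indexes `[0]` into them). A small helper can flatten a record into the fields listed in the table above; the friendly names and the choice to keep every `993__a` value (since a record may list several related documents) are assumptions of this sketch.

```python
from typing import Any

# MARC fields from the table above, mapped to friendlier names for this sketch.
FIELDS = {
    "191__a": "symbol",
    "245__a": "title",
    "269__a": "date",
    "500__a": "notes",
    "993__a": "related",
    "991__b": "agenda_item",
}


def first_value(record: dict[str, Any], field: str, default: str = "") -> str:
    """Return the first value of a MARC field, or default if absent or empty."""
    values = record.get(field) or []
    return values[0] if values else default


def summarize(record: dict[str, Any]) -> dict[str, Any]:
    """Flatten a record to one value per field, keeping all related documents."""
    out = {name: first_value(record, field) for field, name in FIELDS.items()}
    out["related"] = list(record.get("993__a") or [])
    return out
```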
Finding the final resolution:

```python
# search() is a thin wrapper around GET {AWS_API_URL}/dev/list;
# each record maps MARC fields to lists of values
# Search for documents where 993__a contains the draft resolution
results = search(draft_symbol, tag='993__a')
# Filter for A/RES/ symbols (missing or empty fields are treated as '')
final = next(
    (r for r in results if (r.get('191__a') or [''])[0].startswith('A/RES/')),
    None,
)
```

Finding SG follow-up reports:

```python
# Search notes field for "pursuant to resolution X"
results = search(res_number, tag='500__a')
reports = [r for r in results if 'pursuant' in (r.get('500__a') or [''])[0].lower()]
```

Project structure:

```
src/
├── app/
│   └── page.tsx              # Main PBI page
├── components/
│   ├── ui/sheet.tsx          # Sidebar component
│   ├── Header.tsx            # Site header
│   └── Footer.tsx            # Site footer
├── types/
│   └── pbi.ts                # TypeScript interfaces
└── middleware.ts             # Route protection
python/
├── data_prep.py              # Fetch metadata from UN Library
├── extract_pbi_combined.py   # LLM extraction script
└── util/
    └── generate_embeddings.py
data/
├── pbis.json                 # Raw PBI metadata
└── pbi_combined_extractions.json
public/data/
└── pbi_extractions.json      # Frontend data
```
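The follow-up report lookup shown earlier keeps any note containing "pursuant", and a search for a number like `79/12` may also return notes citing `79/123`. A stricter filter is sketched below; it assumes notes cite the number in the `79/123` form and that `res_number` is derived from the final symbol. Both helpers are hypothetical.

```python
import re
from typing import Optional

RES_RE = re.compile(r"^A/RES/(\d+/\d+)")


def resolution_number(final_symbol: str) -> Optional[str]:
    """'A/RES/79/123' -> '79/123', the form assumed to appear in notes."""
    match = RES_RE.match(final_symbol)
    return match.group(1) if match else None


def is_followup_report(notes: str, res_number: str) -> bool:
    """Require 'pursuant' and this exact resolution number in the note."""
    if "pursuant" not in notes.lower():
        return False
    # Digit boundaries stop 79/12 from matching inside 79/123.
    pattern = r"(?<!\d)" + re.escape(res_number) + r"(?!\d)"
    return re.search(pattern, notes) is not None
```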
Core TypeScript interfaces (`src/types/pbi.ts`):

```typescript
interface PBIResolution {
  draft_resolution_symbol: string;
  final_resolution_symbol: string | null;
  title: string;
  session: number;
  originating_committee: string;
  mandate_summary: string;
  operative_paragraphs: OperativeParagraph[];
  affected_programmes: AffectedProgramme[];
  stages: StageExtraction[];
  cost_changes: CostChange[];
  recurrence: RecurrenceInfo | null;
  final_approved_cost: number | null;
  final_approved_posts: number;
  followup_reports: FollowupReport[];
}

interface StageExtraction {
  stage_type: string; // main_committee_pbi, fifth_committee_pbi, acabq_report, fifth_committee_report
  document_symbol: string;
  total_cost: number | null;
  costs_by_section: SectionCost[];
  posts: PostRequirement[];
  recommendation: string | null;
}

interface SectionCost {
  section_number: string;
  section_name: string;
  costs_by_year: { year: number; amount: number }[];
}
```

Maintenance checks:

```bash
npm audit          # Security vulnerabilities
npm outdated       # Outdated packages
npm run lint       # ESLint errors
npx tsc --noEmit   # TypeScript errors
```
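The extraction already returns `cost_changes`, but the "cost evolution between stages" view can also be recomputed from the `stages` array in `PBIResolution`. The Python sketch below assumes the four `stage_type` values listed in `StageExtraction` run in that order (and that the 1-4 stage badges follow it); `STAGE_ORDER` and `cost_evolution` are hypothetical names.

```python
from typing import Any, Optional

# Assumed workflow order, following the stage_type values listed in the
# StageExtraction interface; badge numbers 1-4 follow this order.
STAGE_ORDER = {
    "main_committee_pbi": 1,
    "fifth_committee_pbi": 2,
    "acabq_report": 3,
    "fifth_committee_report": 4,
}


def cost_evolution(stages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Change in total_cost between consecutive stages that report a cost."""
    known = [
        s for s in stages
        if s.get("total_cost") is not None and s.get("stage_type") in STAGE_ORDER
    ]
    known.sort(key=lambda s: STAGE_ORDER[s["stage_type"]])
    changes = []
    for prev, curr in zip(known, known[1:]):
        delta = curr["total_cost"] - prev["total_cost"]
        # Percent change is undefined when the earlier stage reported zero.
        pct: Optional[float] = delta / prev["total_cost"] * 100 if prev["total_cost"] else None
        changes.append({
            "from": prev["stage_type"],
            "to": curr["stage_type"],
            "delta": delta,
            "percent": pct,
        })
    return changes
```

Stages without a `total_cost` are skipped, so a missing Fifth Committee statement does not break the comparison between the remaining stages.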