
UN-EOSG-Analytics/programme-budget-implications


Programme Budget Implications Explorer

A web application for exploring UN Programme Budget Implications (PBIs) with structured data extraction.

Overview

This application extracts structured data from UN PBI documents and presents it in an interactive web interface. Users can browse resolutions by year, view workflow stages, and explore financial breakdowns.

See PBI_REPORT.md for detailed background on PBIs and the data schema.

Features

  • Resolutions grouped by year (derived from GA session)
  • Sortable by resolution symbol or final amount
  • Stage badges showing workflow progress (1-4)
  • Detail sidebar with:
    • Final and draft resolution links
    • Mandate summary and operative paragraphs
    • SG follow-up reports
    • Budget sections table
    • Workflow stages with per-stage costs
    • Cost evolution between stages
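The year grouping and amount sorting listed above can be illustrated with a short Python sketch over the frontend records. The real app does this in TypeScript; this is not its code. It assumes each record carries the `session` and `final_approved_cost` fields from the TypeScript schema, and that a GA session maps to the calendar year in which it opens (1945 + session, e.g. the 79th session opened in 2024).

```python
from collections import defaultdict
from typing import Any

Resolution = dict[str, Any]


def session_to_year(session: int) -> int:
    # Assumption: the Nth regular session of the General Assembly opens in
    # September of year 1945 + N. The app's actual derivation may differ.
    return 1945 + session


def group_by_year(resolutions: list[Resolution]) -> dict[int, list[Resolution]]:
    """Bucket resolutions by the calendar year their session opened."""
    groups: dict[int, list[Resolution]] = defaultdict(list)
    for res in resolutions:
        groups[session_to_year(res["session"])].append(res)
    # Newest year first, as a listing page would typically show it.
    return dict(sorted(groups.items(), reverse=True))


def sort_by_final_amount(resolutions: list[Resolution], descending: bool = True) -> list[Resolution]:
    """Sort by final_approved_cost; entries without a final amount go last."""
    known = [r for r in resolutions if r.get("final_approved_cost") is not None]
    unknown = [r for r in resolutions if r.get("final_approved_cost") is None]
    known.sort(key=lambda r: r["final_approved_cost"], reverse=descending)
    return known + unknown
```

Keeping resolutions without a final amount at the end in both directions avoids mixing "unknown" with "zero".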

Tech Stack

  • Frontend: Next.js, React, Tailwind CSS, shadcn/ui
  • Extraction: Python, OpenAI API (gpt-5-mini), Pydantic
  • Data: PostgreSQL (metadata), JSON (extractions)

Setup

1. Install dependencies

npm install

2. Configure environment

cp .env.template .env.local

Required variables:

  • DATABASE_URL - PostgreSQL connection string
  • AWS_API_URL - UN Library API endpoint
  • OPENAI_API_KEY - For extraction
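A quick pre-flight check for these variables can save a failed run. This sketch inspects the process environment only: Next.js loads .env.local by itself, but a Python script would see the values only if they are exported or loaded by something like a dotenv loader (an assumption about how the scripts are run). The function names are hypothetical.

```python
import os
from typing import Mapping, Optional

# Variables listed in this README; which ones each script needs is an assumption.
REQUIRED_VARS = ("DATABASE_URL", "AWS_API_URL", "OPENAI_API_KEY")


def missing_env_vars(required=REQUIRED_VARS, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def require_env(required=REQUIRED_VARS) -> None:
    """Exit with a readable message if any required variable is missing."""
    missing = missing_env_vars(required)
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

A script could call `require_env(("OPENAI_API_KEY",))` at startup to check only what it uses.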

3. Run the app

npm run dev

Data Pipeline

data/pbis.json (raw metadata from UN Library)
    ↓
python/extract_pbi_combined.py (LLM extraction)
    ↓
data/pbi_combined_extractions.json (structured data)
    ↓
Enriched with final resolution symbols + follow-up reports
    ↓
public/data/pbi_extractions.json (frontend data)
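The enrichment step between the two JSON files could look roughly like this. The two lookup callables stand in for the UN Library searches described under Metadata Linking; `enrich`, `run`, and the assumption that both files hold a JSON array of PBIResolution-shaped objects are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Callable, Optional

Extraction = dict[str, Any]


def enrich(
    extractions: list[Extraction],
    find_final: Callable[[str], Optional[str]],
    find_followups: Callable[[str], list[dict[str, Any]]],
) -> list[Extraction]:
    """Attach final resolution symbols and follow-up reports to each extraction."""
    enriched = []
    for ext in extractions:
        ext = dict(ext)  # shallow copy so the input list is left untouched
        final = find_final(ext["draft_resolution_symbol"])
        ext["final_resolution_symbol"] = final
        # Follow-up reports can only be looked up once a final resolution exists.
        ext["followup_reports"] = find_followups(final) if final else []
        enriched.append(ext)
    return enriched


def run(src: Path, dest: Path, find_final, find_followups) -> None:
    # Assumption: both files hold a JSON array of PBIResolution-shaped objects.
    extractions = json.loads(src.read_text(encoding="utf-8"))
    result = enrich(extractions, find_final, find_followups)
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8")
```

Usage would be along the lines of `run(Path("data/pbi_combined_extractions.json"), Path("public/data/pbi_extractions.json"), find_final, find_followups)`. Passing the lookups in keeps the network calls out of the transformation, so it can be tested offline.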

Extraction

Document Categories

Standard PBIs (extracted): documents whose title contains a draft resolution symbol matching:

A/(?:C\.\d+/)?\d+/L\.\d+(?:/Rev\.\d+)?

Excluded:

  • Consolidated statements (A/80/7/Add.27)
  • ICSC/Pension reports (A/80/7/Add.19)
  • ECOSOC documents (E/2024/L.33)
  • Fifth Committee decisions (A/C.5/79/L.22)
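One caveat when applying the title pattern: a Fifth Committee symbol such as A/C.5/79/L.22 satisfies the same regex, so pattern matching alone does not exclude it. The sketch below adds a hypothetical A/C.5/ prefix check; how the actual script separates those documents is not described in this README.

```python
import re
from typing import Optional

# The README's pattern: optional committee segment (C.<n>/), session, L.<n>,
# and an optional /Rev.<n> suffix.
DRAFT_RES_PATTERN = re.compile(r"A/(?:C\.\d+/)?\d+/L\.\d+(?:/Rev\.\d+)?")

# Hypothetical extra rule: Fifth Committee symbols also satisfy the pattern,
# so they are skipped explicitly.
FIFTH_COMMITTEE_PREFIX = "A/C.5/"


def draft_symbol_in_title(title: str) -> Optional[str]:
    """Return the first non-Fifth-Committee draft resolution symbol in a title."""
    for match in DRAFT_RES_PATTERN.finditer(title):
        symbol = match.group(0)
        if not symbol.startswith(FIFTH_COMMITTEE_PREFIX):
            return symbol
    return None
```

Using `finditer` rather than a single `search` means a title that mentions an A/C.5/ symbol before the real draft reference still yields the draft reference.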

Pre-grouping Documents

Documents are grouped by draft resolution symbol before extraction:

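import re
from collections import defaultdict

# Optional committee segment (C.<n>/), session, L.<n>, optional /Rev.<n>
DRAFT_RES_PATTERN = re.compile(r'A/(?:C\.\d+/)?\d+/L\.\d+(?:/Rev\.\d+)?')

def extract_draft_resolution(doc):
    """Return the first draft resolution symbol in the document title, or None."""
    title = doc.get('proper_title') or ''  # tolerate missing or null titles
    match = DRAFT_RES_PATTERN.search(title)
    return match.group(0) if match else None

# pbis: list of metadata records (e.g. loaded from data/pbis.json)
groups = defaultdict(list)
for doc in pbis:
    draft_res = extract_draft_resolution(doc)
    if draft_res:  # documents without a draft reference are skipped
        groups[draft_res].append(doc)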
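Each group then has to become the single `combined_document_text` passed to the model. A minimal sketch, assuming hypothetical `symbol` and `text` keys for each document's symbol and full text (the real field names in data/pbis.json may differ):

```python
from typing import Any


def build_combined_text(
    docs: list[dict[str, Any]], symbol_key: str = "symbol", text_key: str = "text"
) -> str:
    """Concatenate a group's documents into one prompt, each under a header line."""
    # Sorting by symbol keeps the prompt deterministic across runs.
    ordered = sorted(docs, key=lambda d: d.get(symbol_key) or "")
    parts = [
        f"=== {d.get(symbol_key) or 'unknown symbol'} ===\n{d.get(text_key) or ''}".rstrip()
        for d in ordered
    ]
    return "\n\n".join(parts)
```

Labelling each document with its symbol gives the model a way to attribute costs to a specific stage document.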

LLM Extraction

Extraction uses OpenAI's structured output with a Pydantic schema:

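from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# EXTRACTION_PROMPT, combined_document_text and PBIResolution are defined elsewhere in the script
response = client.responses.parse(
    model="gpt-5-mini",
    input=[
        {"role": "system", "content": EXTRACTION_PROMPT},
        {"role": "user", "content": combined_document_text}
    ],
    text_format=PBIResolution  # Pydantic model
)
resolution = response.output_parsed  # parsed PBIResolution instance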
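Running that call across all groups needs a loop that survives individual failures. The sketch takes the per-group extraction as a callable (hypothetical `extract_fn`), so it can wrap the `responses.parse` call, for example by returning `response.output_parsed.model_dump()` with Pydantic v2, or be replaced by a stub in tests. The function name and failure handling are illustrative, not the script's actual behaviour.

```python
import json
import logging
from pathlib import Path
from typing import Any, Callable

logger = logging.getLogger(__name__)


def extract_all(
    groups: dict[str, list[dict[str, Any]]],
    extract_fn: Callable[[str, list[dict[str, Any]]], dict[str, Any]],
    out_path: Path,
) -> list[str]:
    """Run extract_fn on every group, save successes as JSON, return failed symbols."""
    results, failed = [], []
    for draft_symbol in sorted(groups):
        try:
            results.append(extract_fn(draft_symbol, groups[draft_symbol]))
        except Exception:  # one bad group should not abort the whole run
            logger.exception("Extraction failed for %s", draft_symbol)
            failed.append(draft_symbol)
    out_path.write_text(json.dumps(results, ensure_ascii=False, indent=2), encoding="utf-8")
    return failed
```

Returning the failed symbols makes it easy to retry just those groups on a second pass.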

Run Extraction

cd python
uv run extract_pbi_combined.py

Metadata Linking

UN Library API

GET {AWS_API_URL}/dev/list
Parameters:
  - tag: MARC field to search (e.g., 191, 993__a, 500__a)
  - query: Search term
  - limit: Max results
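The snippets below call a `search` helper whose definition isn't shown. A hypothetical standard-library version might look like this; it assumes the endpoint returns a JSON array of records whose MARC fields map to lists of strings, which is what expressions like `r['191__a'][0]` imply. The default `limit` of 100 is an arbitrary choice.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Any, Optional


def build_list_url(base_url: str, tag: str, query: str, limit: int = 100) -> str:
    """Compose GET {AWS_API_URL}/dev/list?tag=...&query=...&limit=..."""
    params = urllib.parse.urlencode({"tag": tag, "query": query, "limit": limit})
    return f"{base_url.rstrip('/')}/dev/list?{params}"


def search(query: str, tag: str, limit: int = 100, base_url: Optional[str] = None) -> list[dict[str, Any]]:
    # Assumption: the response is a JSON array of records keyed by MARC field,
    # each value a list of strings (e.g. {"191__a": ["A/RES/79/1"]}).
    url = build_list_url(base_url or os.environ["AWS_API_URL"], tag, query, limit)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

`urlencode` percent-encodes the slashes in document symbols, so a query like A/79/L.1 is sent as a single parameter value.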

Key MARC Fields

Field     Content
191__a    Document symbol
245__a    Title
269__a    Publication date
500__a    Notes (including "pursuant to" references)
993__a    Related documents
991__b    Agenda item number
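Because each MARC field arrives as a list, a small helper that pulls the first value, plus a mapping from the tags above to readable names, keeps lookups tidy. The labels and the choice of which fields are treated as multi-valued are assumptions for illustration.

```python
from typing import Any, Optional

# Readable labels for the MARC tags in the table above (names are hypothetical).
MARC_FIELDS = {
    "191__a": "symbol",
    "245__a": "title",
    "269__a": "publication_date",
    "500__a": "notes",
    "993__a": "related_documents",
    "991__b": "agenda_item",
}
MULTI_VALUED = {"500__a", "993__a"}  # assumption: these can repeat


def first(record: dict[str, Any], tag: str) -> Optional[str]:
    """First value of a MARC field, or None if the field is missing or empty."""
    values = record.get(tag) or []
    return values[0] if values else None


def summarize(record: dict[str, Any]) -> dict[str, Any]:
    """Flatten a record: single-valued fields keep their first value, others stay lists."""
    return {
        label: (record.get(tag) or []) if tag in MULTI_VALUED else first(record, tag)
        for tag, label in MARC_FIELDS.items()
    }
```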

Finding Final Resolutions

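# Search for documents where 993__a (related documents) contains the draft resolution
results = search(draft_symbol, tag='993__a')
# Keep the first hit whose symbol (191__a) is a final A/RES/ resolution;
# records with a missing or empty 191__a are skipped instead of raising
final = next(
    (r for r in results if (r.get('191__a') or [''])[0].startswith('A/RES/')),
    None,
)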

Finding Follow-up Reports

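# Search the notes field (500__a) for "pursuant to resolution X"
results = search(res_number, tag='500__a')
# A record may carry several notes, so check all of them, not only the first
reports = [
    r for r in results
    if any('pursuant' in note.lower() for note in r.get('500__a') or [])
]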
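A plain "pursuant" substring test also accepts notes that mention "pursuant" in connection with some other resolution. A stricter, hypothetical variant derives the resolution number from the final symbol (assuming notes cite it in the form 79/123) and requires that exact number, not a longer one such as 79/1234, in the same note:

```python
import re
from typing import Any, Optional

FINAL_RES_PATTERN = re.compile(r"^A/RES/(\d+/\d+)")


def resolution_number(final_symbol: str) -> Optional[str]:
    """'A/RES/79/123' -> '79/123' (the form notes are assumed to cite)."""
    match = FINAL_RES_PATTERN.match(final_symbol)
    return match.group(1) if match else None


def is_followup(record: dict[str, Any], res_number: str) -> bool:
    """True if some note says 'pursuant' and cites exactly res_number."""
    # Digit lookarounds stop 79/123 from matching inside 79/1234 or 179/123.
    number = re.compile(rf"(?<!\d){re.escape(res_number)}(?!\d)")
    return any(
        "pursuant" in note.lower() and number.search(note)
        for note in record.get("500__a") or []
    )
```

Because the search already queries by the number, the extra check mainly filters out near-miss numbers and notes that are not "pursuant to" references.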

File Structure

src/
├── app/
│   └── page.tsx              # Main PBI page
├── components/
│   ├── ui/sheet.tsx          # Sidebar component
│   ├── Header.tsx            # Site header
│   └── Footer.tsx            # Site footer
├── types/
│   └── pbi.ts                # TypeScript interfaces
└── middleware.ts             # Route protection

python/
├── data_prep.py              # Fetch metadata from UN Library
├── extract_pbi_combined.py   # LLM extraction script
└── util/
    └── generate_embeddings.py

data/
├── pbis.json                 # Raw PBI metadata
└── pbi_combined_extractions.json

public/data/
└── pbi_extractions.json      # Frontend data

TypeScript Schema

interface PBIResolution {
  draft_resolution_symbol: string;
  final_resolution_symbol: string | null;
  title: string;
  session: number;
  originating_committee: string;
  mandate_summary: string;
  operative_paragraphs: OperativeParagraph[];
  affected_programmes: AffectedProgramme[];
  stages: StageExtraction[];
  cost_changes: CostChange[];
  recurrence: RecurrenceInfo | null;
  final_approved_cost: number | null;
  final_approved_posts: number;
  followup_reports: FollowupReport[];
}

interface StageExtraction {
  stage_type: string;  // main_committee_pbi, fifth_committee_pbi, acabq_report, fifth_committee_report
  document_symbol: string;
  total_cost: number | null;
  costs_by_section: SectionCost[];
  posts: PostRequirement[];
  recommendation: string | null;
}

interface SectionCost {
  section_number: string;
  section_name: string;
  costs_by_year: { year: number; amount: number }[];
}
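The per-stage costs above are enough to derive the cost evolution shown in the sidebar. This sketch assumes stages progress in the order the `stage_type` comment lists them and skips stages without a `total_cost`; the output keys are hypothetical and not the actual `CostChange` shape, which isn't shown here.

```python
from typing import Any, Optional

# Assumption: workflow order follows the stage_type comment in StageExtraction.
STAGE_ORDER = ["main_committee_pbi", "fifth_committee_pbi", "acabq_report", "fifth_committee_report"]


def cost_evolution(stages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Change in total_cost between consecutive stages that report a cost."""
    ordered = sorted(
        (s for s in stages if s.get("total_cost") is not None and s.get("stage_type") in STAGE_ORDER),
        key=lambda s: STAGE_ORDER.index(s["stage_type"]),
    )
    changes = []
    for prev, curr in zip(ordered, ordered[1:]):
        delta = curr["total_cost"] - prev["total_cost"]
        # Percentage is undefined when the earlier stage reported zero.
        pct: Optional[float] = delta * 100 / prev["total_cost"] if prev["total_cost"] else None
        changes.append({"from_stage": prev["stage_type"], "to_stage": curr["stage_type"],
                        "delta": delta, "percent": pct})
    return changes


def section_total(section: dict[str, Any]) -> float:
    """Sum the amounts in a SectionCost's costs_by_year."""
    return sum(entry["amount"] for entry in section["costs_by_year"])
```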

Maintenance

npm audit          # Security vulnerabilities
npm outdated       # Outdated packages
npm run lint       # ESLint errors
npx tsc --noEmit   # TypeScript errors
