Graph Problem Classification Pipeline

Classifies Codeforces problems into a hierarchical graph reasoning taxonomy (7 capabilities > 19 families > 90 variants) using the Claude API.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Input: CF Problems (JSON)                 │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
        ┌─────────────────────────┐
        │  Stage 0: Pre-filter    │  Rule-based tag filtering
        │  (No API call)          │  → skip / maybe_graph / graph
        └──────┬──────────┬───────┘
           skip│          │graph / maybe_graph
               ▼          ▼
            discard   ┌───────────────────────────┐
                      │ Stage 1: Family Classify   │  Sonnet → 19 families
                      │ (1 API call per problem)   │  multi-label + confidence
                      └──────┬──────────┬──────────┘
                     conf<0.5│          │conf≥0.5
                             ▼          ▼
                     ┌──────────┐  ┌──────────────────────────────┐
                     │  HUMAN   │  │ Stage 2: Variant ID           │
                     │  REVIEW  │  │ (1 API call per family)       │
                     └──────────┘  │ Per-family prompt with        │
                                   │ variant profiles              │
                                   └──┬────────────┬───────────┬───┘
                                      │            │           │
                              matched variants  new variant  low confidence
                                      │         proposal       │
                                      ▼            ▼           ▼
                              ┌────────────┐ ┌──────────┐ ┌──────────┐
                              │   AUTO     │ │   NEW    │ │  HUMAN   │
                              │  ACCEPT    │ │ VARIANT  │ │  REVIEW  │
                              └─────┬──────┘ └────┬─────┘ └──────────┘
                                    │             │
                                    ▼             ▼
                            classified.jsonl  new_variants.jsonl
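
As a rough illustration of the Stage 0 box in the diagram, the sketch below shows how a rule-based tag filter could produce the three outcomes (skip / maybe_graph / graph) without an API call. The tag sets and the function name `prefilter` are assumptions made for this example; the actual rules live in stages/prefilter.py and may differ.

```python
from typing import Iterable

# Hypothetical tag sets, chosen only for illustration.
# The real pipeline's rules are not documented in this README.
GRAPH_TAGS = frozenset({"graphs", "trees", "shortest paths", "flows", "graph matchings"})
MAYBE_TAGS = frozenset({"dfs and similar", "dsu", "2-sat"})


def prefilter(tags: Iterable[str]) -> str:
    """Return 'graph', 'maybe_graph', or 'skip' based on problem tags alone."""
    tagset = {t.lower() for t in tags}
    if tagset & GRAPH_TAGS:
        return "graph"        # strong signal: send to Stage 1
    if tagset & MAYBE_TAGS:
        return "maybe_graph"  # weak signal: still send to Stage 1
    return "skip"             # no graph-related tag: discard


if __name__ == "__main__":
    for tags in (["graphs", "dp"], ["dsu"], ["math"]):
        print(tags, "->", prefilter(tags))
```

Only problems labeled graph or maybe_graph would go on to Stage 1, so the filter spends no API budget on problems with no graph-related tags.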

Setup

pip install -e ".[dev]"

Usage

Classify problems

# Full run
python -m graph_classify --input data/train.json

# Limit to first 50 problems
python -m graph_classify --input data/train.json --limit 50

# Custom config, taxonomy, and output
python -m graph_classify --input data/train.json \
    --config config.yaml \
    --taxonomy data/taxonomy_profiles.json \
    --output-dir results/

Classification is resumable: interrupt with Ctrl+C, then re-run the same command to pick up where you left off. Use --fresh-start to discard saved progress and force a full re-run.
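
One way to picture the resume behavior is a checkpoint file that records which problem IDs have already been processed. The sketch below is a minimal illustration under that assumption; `run_resumable`, `load_done`, and the JSON checkpoint layout are hypothetical and are not taken from checkpoint.py.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List, Set


def load_done(checkpoint_path: Path) -> Set[str]:
    """Read the set of already-processed problem IDs, if a checkpoint exists."""
    if not checkpoint_path.exists():
        return set()
    return set(json.loads(checkpoint_path.read_text()))


def run_resumable(
    problem_ids: Iterable[str],
    process: Callable[[str], None],
    checkpoint_path: Path,
    fresh_start: bool = False,
) -> List[str]:
    """Process each ID once, saving progress after every item.

    Returns the IDs handled in this invocation.
    """
    # fresh_start mirrors the --fresh-start flag: ignore saved progress.
    done = set() if fresh_start else load_done(checkpoint_path)
    handled = []
    for pid in problem_ids:
        if pid in done:
            continue  # finished in an earlier run
        process(pid)
        done.add(pid)
        handled.append(pid)
        # Persist after each item so Ctrl+C loses at most the in-flight problem.
        checkpoint_path.write_text(json.dumps(sorted(done)))
    return handled
```

Writing the checkpoint after every problem trades a little I/O for the guarantee that an interrupted run never repeats a completed API call.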

Review new variants

python -m graph_classify review

This opens an interactive prompt where you approve, rename, or skip each proposed variant. Approved variants are added to the taxonomy with is_auto_generated: true.

Sort review file by family

python -m graph_classify sort \
    --input-jsonl output/human_review.jsonl \
    --output-jsonl output/sorted.jsonl

Spot-check classified items

python -m graph_classify audit --input-jsonl output/classified.jsonl -n 10

Configuration

Edit config.yaml to tune models, thresholds, rate limits, and truncation lengths. CLI arguments override YAML values, which override code defaults.
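
The precedence rule (CLI over YAML over code defaults) can be sketched as a layered merge. The function `resolve_config` and the keys used here are assumptions made for illustration; the actual option names and loading logic are defined in config.py.

```python
from typing import Any, Dict, Mapping


def resolve_config(
    defaults: Mapping[str, Any],
    yaml_values: Mapping[str, Any],
    cli_args: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge three layers so that later layers win: defaults < YAML < CLI."""
    merged = dict(defaults)
    merged.update(yaml_values)
    # A CLI value of None means the flag was not passed, so it must not
    # overwrite a value coming from YAML or the defaults.
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged


if __name__ == "__main__":
    # Hypothetical keys, loosely modeled on the --limit and --output-dir flags.
    defaults = {"limit": None, "output_dir": "output"}
    yaml_values = {"output_dir": "results"}
    cli_args = {"limit": 50, "output_dir": None}
    print(resolve_config(defaults, yaml_values, cli_args))
```

In this example the limit comes from the command line, while the output directory falls back to the YAML value because the flag was omitted.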

Project Structure

grbench_analysis/
├── pyproject.toml
├── config.yaml
├── data/
│   ├── train.json
│   ├── test.json
│   ├── taxonomy_profiles.json
│   └── codeforces_selected_accepted.json  (gitignored)
├── graph_classify/
│   ├── __init__.py
│   ├── __main__.py          # CLI entry point
│   ├── config.py            # Config dataclasses + YAML loading
│   ├── models.py            # ClassificationResult, CheckpointState, Route
│   ├── taxonomy.py          # TaxonomyManager
│   ├── api.py               # APIClient with rate limiting + retry
│   ├── pipeline.py          # Pipeline + OutputWriter
│   ├── checkpoint.py        # Resumability via checkpoints
│   ├── review.py            # Interactive new-variant review
│   ├── utils.py             # load_problems, sort_by_family, spot_check
│   └── stages/
│       ├── prefilter.py         # Stage 0
│       ├── family_classify.py   # Stage 1
│       ├── variant_identify.py  # Stage 2
│       └── routing.py          # Stage 3
├── tests/
│   ├── conftest.py
│   ├── test_prefilter.py
│   ├── test_routing.py
│   ├── test_taxonomy.py
│   ├── test_config.py
│   └── test_models.py
└── output/                  (gitignored, generated at runtime)

Confidence Routing

| Overall Confidence | Route | Action |
|---|---|---|
| >= 0.80 | `auto` | Accept, write to classified.jsonl |
| >= 0.50 and < 0.80 | `spot_check` | Accept but flag for random review |
| < 0.50 | `human_review` | Send to human_review.jsonl |
| any + new variant | `new_variant` | Send to new_variants.jsonl |

Overall confidence = 0.4 * Stage1_confidence + 0.6 * min(Stage2_confidences)
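
The formula and thresholds above can be sketched as a small routing function. The names `overall_confidence` and `route` are hypothetical and may not match routing.py. Sending a problem with no Stage 2 results to human review is an assumption, based on the diagram, where low-confidence Stage 1 results skip Stage 2.

```python
from typing import Sequence


def overall_confidence(stage1_conf: float, stage2_confs: Sequence[float]) -> float:
    """0.4 * Stage 1 confidence + 0.6 * the weakest Stage 2 confidence."""
    return 0.4 * stage1_conf + 0.6 * min(stage2_confs)


def route(
    stage1_conf: float,
    stage2_confs: Sequence[float],
    has_new_variant: bool = False,
) -> str:
    """Map confidences to one of the four routes in the table."""
    if has_new_variant:
        return "new_variant"   # applies at any confidence level
    if not stage2_confs:
        return "human_review"  # assumption: no Stage 2 output to score
    score = overall_confidence(stage1_conf, stage2_confs)
    if score >= 0.80:
        return "auto"
    if score >= 0.50:
        return "spot_check"
    return "human_review"


if __name__ == "__main__":
    print(route(0.9, [0.9, 0.8]))  # 0.4*0.9 + 0.6*0.8 = 0.84
    print(route(0.9, [0.5]))       # 0.4*0.9 + 0.6*0.5 = 0.66
    print(route(0.5, [0.3]))       # 0.4*0.5 + 0.6*0.3 = 0.38
```

Because the formula takes the minimum of the Stage 2 confidences, a single uncertain family lowers the overall score even when the other families are confident.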
