hyworrywart/codeforces_analysis

Graph Problem Classification Pipeline

Classifies Codeforces problems into a hierarchical graph reasoning taxonomy (7 capabilities > 19 families > 90 variants) using the Claude API.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Input: CF Problems (JSON)                 │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
        ┌─────────────────────────┐
        │  Stage 0: Pre-filter    │  Rule-based tag filtering
        │  (No API call)          │  → skip / maybe_graph / graph
        └──────┬──────────┬───────┘
           skip│          │graph / maybe_graph
               ▼          ▼
            discard   ┌───────────────────────────┐
                      │ Stage 1: Family Classify   │  Sonnet → 19 families
                      │ (1 API call per problem)   │  multi-label + confidence
                      └──────┬──────────┬──────────┘
                     conf<0.5│          │conf≥0.5
                             ▼          ▼
                     ┌──────────┐  ┌──────────────────────────────┐
                     │  HUMAN   │  │ Stage 2: Variant ID           │
                     │  REVIEW  │  │ (1 API call per family)       │
                     └──────────┘  │ Per-family prompt with        │
                                   │ variant profiles              │
                                   └──┬────────────┬───────────┬───┘
                                      │            │           │
                              matched variants  new variant  low confidence
                                      │         proposal       │
                                      ▼            ▼           ▼
                              ┌────────────┐ ┌──────────┐ ┌──────────┐
                              │   AUTO     │ │   NEW    │ │  HUMAN   │
                              │  ACCEPT    │ │ VARIANT  │ │  REVIEW  │
                              └─────┬──────┘ └────┬─────┘ └──────────┘
                                    │             │
                                    ▼             ▼
                            classified.jsonl  new_variants.jsonl
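
Stage 0 in the diagram is a rule-based tag filter that needs no API call. The sketch below shows one way such a filter could look. The tag sets and the `prefilter` function are illustrative assumptions, not the rules implemented in stages/prefilter.py.

```python
from typing import Iterable

# Hypothetical tag sets, chosen for illustration only. The real rules
# live in stages/prefilter.py and may differ.
GRAPH_TAGS = {"graphs", "trees", "shortest paths", "dfs and similar", "dsu", "flows"}
MAYBE_TAGS = {"dp", "greedy", "constructive algorithms", "data structures"}


def prefilter(tags: Iterable[str]) -> str:
    """Return 'graph', 'maybe_graph', or 'skip' based only on problem tags."""
    tag_set = {t.strip().lower() for t in tags}
    if tag_set & GRAPH_TAGS:
        return "graph"        # clearly graph-related, goes on to Stage 1
    if tag_set & MAYBE_TAGS:
        return "maybe_graph"  # ambiguous, still goes on to Stage 1
    return "skip"             # discarded without an API call


print(prefilter(["graphs", "dp"]))
print(prefilter(["math"]))
```

Only problems labelled skip are discarded; both graph and maybe_graph continue to Stage 1, as in the diagram.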

Setup

pip install -e ".[dev]"

Usage

Classify problems

# Full run
python -m graph_classify --input data/train.json

# Limit to first 50 problems
python -m graph_classify --input data/train.json --limit 50

# Custom config, taxonomy, and output
python -m graph_classify --input data/train.json \
    --config config.yaml \
    --taxonomy data/taxonomy_profiles.json \
    --output-dir results/

Classification is resumable: interrupt with Ctrl+C and re-run the same command to pick up where you left off. Use --fresh-start to force a full re-run.
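
As a rough illustration of how this kind of resumability can work, the sketch below records finished problem IDs in a JSON file and skips them on the next run. The `run_resumable` function, the checkpoint file layout, and the `fresh_start` parameter are assumptions made for this example. The actual logic in checkpoint.py may be organised differently.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List


def run_resumable(
    problem_ids: Iterable[str],
    process: Callable[[str], None],
    checkpoint_path: Path,
    fresh_start: bool = False,
) -> List[str]:
    """Process each ID once across runs; return the IDs handled in this run."""
    done = set()
    if checkpoint_path.exists() and not fresh_start:
        # Resume: load IDs completed by an earlier (possibly interrupted) run.
        done = set(json.loads(checkpoint_path.read_text()))

    processed_now = []
    for pid in problem_ids:
        if pid in done:
            continue
        process(pid)
        done.add(pid)
        processed_now.append(pid)
        # Persist after every problem so an interrupt loses at most one item.
        checkpoint_path.write_text(json.dumps(sorted(done)))
    return processed_now
```

Writing the checkpoint after every problem trades some disk I/O for losing at most the in-flight problem on Ctrl+C.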

Review new variants

python -m graph_classify review

The command opens an interactive prompt where you approve, rename, or skip each proposed variant. Approved variants are added to the taxonomy with is_auto_generated: true.

Sort review file by family

python -m graph_classify sort \
    --input-jsonl output/human_review.jsonl \
    --output-jsonl output/sorted.jsonl

Spot-check classified items

python -m graph_classify audit --input-jsonl output/classified.jsonl -n 10

Configuration

Edit config.yaml to tune models, thresholds, rate limits, and truncation lengths. CLI arguments override YAML values, which override code defaults.
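
The precedence order (CLI over YAML over code defaults) can be pictured as a layered merge. This is a minimal sketch: the `resolve_config` helper and the example keys and values are hypothetical, and config.py may implement the merge with dataclasses instead.

```python
from typing import Any, Dict, Mapping


def resolve_config(
    defaults: Mapping[str, Any],
    yaml_values: Mapping[str, Any],
    cli_args: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge three layers: code defaults < YAML values < CLI arguments."""
    merged = dict(defaults)
    merged.update(yaml_values)
    # A CLI argument that was not passed is assumed to arrive as None
    # and must not overwrite a value from a lower layer.
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged


# Hypothetical keys and values, for illustration only.
defaults = {"output_dir": "output/", "limit": None}
yaml_values = {"output_dir": "results/"}
cli_args = {"output_dir": None, "limit": 50}

print(resolve_config(defaults, yaml_values, cli_args))
```

Here output_dir comes from YAML because the CLI did not set it, and limit comes from the CLI.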

Project Structure

grbench_analysis/
├── pyproject.toml
├── config.yaml
├── data/
│   ├── train.json
│   ├── test.json
│   ├── taxonomy_profiles.json
│   └── codeforces_selected_accepted.json  (gitignored)
├── graph_classify/
│   ├── __init__.py
│   ├── __main__.py          # CLI entry point
│   ├── config.py            # Config dataclasses + YAML loading
│   ├── models.py            # ClassificationResult, CheckpointState, Route
│   ├── taxonomy.py          # TaxonomyManager
│   ├── api.py               # APIClient with rate limiting + retry
│   ├── pipeline.py          # Pipeline + OutputWriter
│   ├── checkpoint.py        # Resumability via checkpoints
│   ├── review.py            # Interactive new-variant review
│   ├── utils.py             # load_problems, sort_by_family, spot_check
│   └── stages/
│       ├── prefilter.py         # Stage 0
│       ├── family_classify.py   # Stage 1
│       ├── variant_identify.py  # Stage 2
│       └── routing.py          # Stage 3
├── tests/
│   ├── conftest.py
│   ├── test_prefilter.py
│   ├── test_routing.py
│   ├── test_taxonomy.py
│   ├── test_config.py
│   └── test_models.py
└── output/                  (gitignored, generated at runtime)

Confidence Routing

| Overall Confidence | Route | Action |
|---|---|---|
| >= 0.80 | auto | Accept, write to classified.jsonl |
| 0.50 - 0.79 | spot_check | Accept but flag for random review |
| < 0.50 | human_review | Send to human_review.jsonl |
| any + new variant | new_variant | Send to new_variants.jsonl |

Overall confidence = 0.4 * Stage1_confidence + 0.6 * min(Stage2_confidences)
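
The formula and the routing thresholds can be combined as in the sketch below. The function names are illustrative and need not match stages/routing.py. The sketch also assumes that any value from 0.50 up to but not including 0.80 counts as spot_check, and that a new-variant proposal takes priority over the score.

```python
from typing import Sequence


def overall_confidence(stage1_conf: float, stage2_confs: Sequence[float]) -> float:
    """0.4 * Stage 1 confidence + 0.6 * the weakest Stage 2 confidence."""
    # min() raises ValueError on an empty sequence; callers are assumed
    # to pass at least one Stage 2 confidence.
    return 0.4 * stage1_conf + 0.6 * min(stage2_confs)


def route(
    stage1_conf: float,
    stage2_confs: Sequence[float],
    has_new_variant: bool = False,
) -> str:
    """Map a classification to one of the four routes."""
    if has_new_variant:
        return "new_variant"      # any confidence, goes to new_variants.jsonl
    conf = overall_confidence(stage1_conf, stage2_confs)
    if conf >= 0.80:
        return "auto"             # accept, goes to classified.jsonl
    if conf >= 0.50:
        return "spot_check"       # accept but flag for random review
    return "human_review"         # goes to human_review.jsonl


# Worked example: 0.4 * 0.9 + 0.6 * min(0.8, 0.7) = 0.36 + 0.42 = 0.78
print(route(0.9, [0.8, 0.7]))
```

Because the Stage 2 term uses the minimum, a single weak family match lowers the overall score even when the other families score highly.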
