Add ARC solver evaluation pipeline and improve utilities #5

tylerbessire · 2025-09-10T21:12:01Z

Summary

- Export core solver components for easier access
- Tighten grid and heuristic utilities with explicit exports and logging
- Replace silent failures in test-time training with warnings
- Add `tools/colab_eval.py` to train guidance models and evaluate tasks in Kaggle/Colab

Testing

pytest -q

https://chatgpt.com/codex/tasks/task_e_68c1e879804c8322b034486b0c896827

Summary by CodeRabbit

New Features
- Exposed core solver and utilities at the package root for simpler imports.
- Added test-time training components, including adaptive scoring, a trainer, and data augmentation.
- Introduced a Colab/Kaggle-friendly script to train guidance models, evaluate the solver, and generate submissions.
Bug Fixes
- Replaced silent failures with clear warning logs across heuristics and test-time training flows.
Documentation
- Expanded package documentation to clarify the public API.
Refactor
- Standardized explicit public API exports across modules for consistent tooling and wildcard imports.
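
For readers unfamiliar with how `__all__` affects wildcard imports, the following is a minimal sketch rather than the PR's actual code. The module name `demo_grid` and the functions inside it are hypothetical stand-ins built in memory; only the `__all__` mechanism itself is standard Python behaviour.

```python
import sys
import types

# Build a throwaway module in memory. "demo_grid" and its functions are
# hypothetical; they only stand in for a module such as arc_solver/grid.py.
demo = types.ModuleType("demo_grid")
exec(
    "__all__ = ['to_array']\n"
    "def to_array(rows):\n"
    "    return [list(r) for r in rows]\n"
    "def helper():\n"
    "    return 'not exported'\n",
    demo.__dict__,
)
sys.modules["demo_grid"] = demo

# A wildcard import copies only the names listed in __all__.
namespace = {}
exec("from demo_grid import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Names left out of `__all__` (here `helper`) remain reachable through an explicit `from demo_grid import helper`; the list only controls what `import *` and documentation tools treat as public.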

coderabbitai · 2025-09-10T21:12:10Z

Caution

Review failed

The pull request is closed.

Walkthrough

Adds explicit public API exports across core modules, introduces logging and improved exception handling, implements a new test-time training subsystem (AdaptiveScorer, TestTimeTrainer, DataAugmentation), updates package init to re-export key symbols, and adds a Colab/Kaggle evaluation script integrating training, inference, accuracy computation, and submission writing.

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| Public API surface: `arc_solver/__init__.py`, `arc_solver/grid.py`, `arc_solver/heuristics.py`, `arc_solver/io_utils.py` | Added module-level `__all__` declarations; `__init__` now re-exports `ARCSolver`, `load_rerun_json`, `save_submission`, `Array`. No functional changes to existing logic. |
| Logging and error handling: `arc_solver/heuristics.py` | Introduced `logging` with module-level logger; replaced bare `except` in `score_candidate` with `except Exception as exc` and `logger.warning(...)`. |
| Test-time training subsystem: `arc_solver/ttt.py` | Added `AdaptiveScorer`, `TestTimeTrainer`, `DataAugmentation`; implemented adaptation workflow (feature extraction, scoring, weight updates, augmentation); added logging and replaced silent exception handling; exported via `__all__`. |
| Colab/Kaggle evaluation workflow: `tools/colab_eval.py` | New script with `train_guidance_model`, `evaluate_solver`, `main`; loads data, trains classifier, runs solver with optional ground truth, computes per-task diffs/accuracy, writes Kaggle submission; adjusts `sys.path` for imports. |
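
The logging change listed above can be illustrated with a short sketch. This is not the code from `arc_solver/heuristics.py`: the signature and body of `score_candidate` below are assumptions made for illustration, and only the pattern (a module-level logger, `except Exception as exc`, and `logger.warning`) comes from the change summary.

```python
import logging

# Module-level logger; the logger name here is a hypothetical choice.
logger = logging.getLogger("arc_solver.heuristics_demo")


def score_candidate(program, pairs):
    """Return the fraction of (input, output) pairs the program reproduces.

    Hypothetical signature: `program` is any callable taking a grid, and
    `pairs` is a list of (input_grid, expected_grid) tuples.
    """
    try:
        hits = sum(1 for inp, out in pairs if program(inp) == out)
        return hits / len(pairs)
    except Exception as exc:
        # Previously a bare `except` would have hidden this; now it is logged.
        logger.warning("Candidate scoring failed: %s", exc)
        return 0.0


def broken_program(grid):
    # Deliberately failing candidate used to exercise the warning path.
    raise ValueError("unsupported grid")


pairs = [([[1]], [[1]]), ([[2]], [[2]])]
ok_score = score_candidate(lambda g: g, pairs)
bad_score = score_candidate(broken_program, pairs)
```

Catching `Exception` instead of using a bare `except` also lets `KeyboardInterrupt` and `SystemExit` propagate, so a long evaluation run can still be interrupted.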

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant ColabEval as tools/colab_eval.py
  participant TrainGuidance as train_guidance
  participant Solver as ARCSolver
  participant Grid as arc_solver.grid
  participant IO as arc_solver.io_utils

  User->>ColabEval: main(args)
  alt Training requested
    ColabEval->>TrainGuidance: load_training_data()
    TrainGuidance-->>ColabEval: features, labels
    ColabEval->>TrainGuidance: train_classifier(epochs)
    TrainGuidance-->>ColabEval: model_path
  end
  ColabEval->>Solver: instantiate(model_path)
  loop For each task
    ColabEval->>Solver: predict(task)
    Solver-->>ColabEval: predictions
    opt Solutions provided
      ColabEval->>Grid: to_array(pred), to_array(target), eq(...)
      Grid-->>ColabEval: per-grid diffs
    end
  end
  ColabEval->>IO: save_submission(predictions, out_path)
  IO-->>ColabEval: path
  ColabEval-->>User: accuracy, submission path
  note over ColabEval,Solver: New end-to-end evaluation and submission flow
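
The comparison step in the diagram above (predictions checked against optional solutions) could look roughly like the sketch below. It uses plain nested lists instead of the `to_array` and `eq` helpers from `arc_solver.grid`, and the function name `task_accuracy` and the dictionary layout are assumptions, not the script's real interface.

```python
from typing import Dict, List

Grid = List[List[int]]


def task_accuracy(
    predictions: Dict[str, List[Grid]],
    solutions: Dict[str, List[Grid]],
) -> float:
    """Fraction of tasks whose predicted grids all match the targets.

    Hypothetical layout: both arguments map a task id to a list of grids,
    one grid per test input. Tasks without a known solution are skipped.
    """
    scored = [task_id for task_id in predictions if task_id in solutions]
    if not scored:
        return 0.0
    solved = 0
    for task_id in scored:
        preds, targets = predictions[task_id], solutions[task_id]
        # A task counts as solved only if every grid matches exactly.
        if len(preds) == len(targets) and all(
            p == t for p, t in zip(preds, targets)
        ):
            solved += 1
    return solved / len(scored)


predictions = {"a": [[[1, 2]]], "b": [[[0]]], "c": [[[5]]]}
solutions = {"a": [[[1, 2]]], "b": [[[1]]]}
accuracy = task_accuracy(predictions, solutions)
```

Task `c` has no ground truth, so it is left out of the denominator, which mirrors the "Solutions provided" branch being optional in the diagram.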

sequenceDiagram
  autonumber
  participant TTT as TestTimeTrainer
  participant Scorer as AdaptiveScorer
  participant Aug as DataAugmentation
  participant Prog as CandidatePrograms

  TTT->>Aug: augment_training_pairs(train_pairs)
  Aug-->>TTT: augmented_pairs
  TTT->>Scorer: new(feature_dim)
  loop iterations
    TTT->>Prog: iterate candidates
    Prog-->>TTT: program
    TTT->>Scorer: score_program(program, pairs)
    Scorer-->>TTT: score
    TTT->>TTT: _evaluate_program(program, pairs)
    alt success
      TTT->>Scorer: update_weights(positives, negatives, pairs)
    else failure
      TTT->>Scorer: update_weights(positives, negatives, pairs)
    end
  end
  TTT-->>Scorer: adapted weights
  note over TTT,Scorer: Iterative adaptation with logging on exceptions
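
To make the `update_weights` step in the diagram above more concrete, here is a minimal sketch of one way an adaptive scorer could work. It uses a simple additive, perceptron-style update chosen for illustration, which may differ from what `arc_solver/ttt.py` does. The class name `AdaptiveScorerSketch`, the `learning_rate` parameter, and the feature vectors are all hypothetical.

```python
from typing import List, Sequence


class AdaptiveScorerSketch:
    """Linear scorer over feature vectors (illustrative only)."""

    def __init__(self, feature_dim: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * feature_dim
        self.learning_rate = learning_rate

    def score(self, features: Sequence[float]) -> float:
        # Dot product of the current weights and the feature vector.
        return sum(w * f for w, f in zip(self.weights, features))

    def update_weights(
        self,
        positives: Sequence[Sequence[float]],
        negatives: Sequence[Sequence[float]],
    ) -> None:
        # Move weights toward features of programs that solved the pairs
        # and away from features of programs that failed.
        for feats in positives:
            for i, f in enumerate(feats):
                self.weights[i] += self.learning_rate * f
        for feats in negatives:
            for i, f in enumerate(feats):
                self.weights[i] -= self.learning_rate * f


scorer = AdaptiveScorerSketch(feature_dim=2)
good_features = [1.0, 0.0]  # hypothetical features of a successful program
bad_features = [0.0, 1.0]   # hypothetical features of a failing program
scorer.update_weights([good_features], [bad_features])
good_score = scorer.score(good_features)
bad_score = scorer.score(bad_features)
```

After a single update the successful program's features score higher than the failing one's, which is the ranking effect the adaptation loop relies on.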

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I thump out logs with careful might,
New weights adapt by learning’s light.
Exports aligned, our APIs neat—
Colab calls, submissions fleet.
Grids now whisper where they err—
A rabbit nods: “Proceed. Prefer.”
Hop, train, predict—then sign and purr. 🐇✨

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a232bb and 8c98177.

📒 Files selected for processing (6)

arc_solver/__init__.py (1 hunks)
arc_solver/grid.py (1 hunks)
arc_solver/heuristics.py (2 hunks)
arc_solver/io_utils.py (1 hunks)
arc_solver/ttt.py (5 hunks)
tools/colab_eval.py (1 hunks)

Refine solver utilities and add evaluation script

8c98177

tylerbessire added the codex label Sep 10, 2025 — with ChatGPT Codex Connector

tylerbessire merged commit ea0ba32 into main Sep 10, 2025
2 of 6 checks passed

tylerbessire deleted the codex/update-arc-solver-files-for-sota branch September 10, 2025 21:12
