fix: ensure solver returns predictions #7
Walkthrough
Refactors ARCSolver to compute predictions per test input via a new _get_predictions helper with an enhanced → validation → baseline fallback. Updates solve_task to assemble attempt_1/attempt_2 per input and changes the no-training behavior to return identity predictions. Adds diagnostic prints. AGENTS.md marks steps 1.1 and 1.2 as completed, with results and notes.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant Solver as ARCSolver
participant Enh as EnhancedSearch
participant Val as Validator
participant Base as BaselineSearch
User->>Solver: solve_task(train_pairs, test_inputs)
loop For each test_input
Solver->>Enh: synthesize(train_pairs)
Enh-->>Solver: enhanced_solution?
alt Enhanced available
Solver->>Solver: predict(enhanced_solution, test_input)
Solver->>Val: _validate_solution(predictions)
alt Valid
Solver-->>Solver: use enhanced predictions
else Invalid
note over Solver,Base: Fallback path
Solver->>Base: synth_baseline(train_pairs)
Base-->>Solver: baseline_model
Solver->>Solver: predict_two_baseline(baseline_model, test_input)
Solver-->>Solver: use baseline predictions
end
else No enhanced
note over Solver,Base: Direct baseline
Solver->>Base: synth_baseline(train_pairs)
Base-->>Solver: baseline_model
Solver->>Solver: predict_two_baseline(baseline_model, test_input)
Solver-->>Solver: use baseline predictions
end
end
Solver-->>User: attempt_1[], attempt_2[] (per input)
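The fallback order in the diagram can be sketched in a few lines. This is a minimal illustration, not the PR's code: predict_with_fallback, flip_rows, identity_pair, and non_empty are hypothetical stand-ins, and grids are plain nested lists rather than the solver's arrays.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Attempts = Tuple[List[Grid], List[Grid]]  # (attempt_1 grids, attempt_2 grids)


def predict_with_fallback(
    test_input: Grid,
    enhanced: Optional[Callable[[Grid], Attempts]],
    baseline: Callable[[Grid], Attempts],
    validate: Callable[[Attempts], bool],
) -> Tuple[Attempts, str]:
    """Return (attempts, source), where source is "enhanced" or "baseline"."""
    if enhanced is not None:
        try:
            attempts = enhanced(test_input)
            if validate(attempts):
                return attempts, "enhanced"
        except (ValueError, RuntimeError):
            pass  # enhanced path failed; fall through to baseline
    return baseline(test_input), "baseline"


# Hypothetical stand-ins for the enhanced search, baseline search, and validator.
def flip_rows(grid: Grid) -> Attempts:
    return [[row[::-1] for row in grid]], [grid]


def identity_pair(grid: Grid) -> Attempts:
    return [grid], [grid]


def non_empty(attempts: Attempts) -> bool:
    return all(len(a) > 0 for a in attempts)


def always_fails(grid: Grid) -> Attempts:
    raise ValueError("no program found")
```

Passing enhanced=None models the "No enhanced" branch, and a validator that returns False models the "Invalid" branch; both end at the baseline call.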
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~55 minutes
Pre-merge checks: ✅ 3 passed
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
arc_solver/solver.py (1)
138-156: _second_pass_diversified: unguarded to_array conversions can raise on malformed tasks
Unlike solve_task, this path doesn’t guard against bad grids. A malformed task will raise before the try/except, skipping both the enhanced and the baseline alternative attempts.
Proposed safe version:
    def _second_pass_diversified(self, task: Dict[str, List[Dict[str, List[List[int]]]]]) -> Optional[List[List[List[int]]]]:
        try:
            train_pairs: List[Tuple[Array, Array]] = []
            for p in task.get("train", []):
                try:
                    train_pairs.append((to_array(p["input"]), to_array(p["output"])))
                except Exception:
                    continue
            test_inputs: List[Array] = []
            for p in task.get("test", []):
                try:
                    test_inputs.append(to_array(p["input"]))
                except Exception:
                    test_inputs.append(np.zeros((1, 1), dtype=np.int16))
            programs = synthesize_with_enhancements(train_pairs, force_alt=True)
            attempts = predict_two_enhanced(programs, test_inputs, prefer_diverse=True)
            return [to_list(x) for x in attempts[0]]
        except Exception:
            try:
                programs = synth_baseline(train_pairs)
                attempts = predict_two_baseline(programs, test_inputs, prefer_diverse=True)
                return [to_list(x) for x in attempts[0]]
            except Exception:
                return None
🧹 Nitpick comments (2)
arc_solver/solver.py (2)
84-107: Replace print + bare Exception with logging and narrower exceptions
Use the logging module and avoid blind Exception catches (Ruff BLE001). This keeps outputs clean and preserves tracebacks in logs.
-            if self.use_enhancements:
-                print("Using enhanced search for prediction")
+            if self.use_enhancements:
+                logger.info("Using enhanced search for prediction")
                 progs = synthesize_with_enhancements(train_pairs)
                 attempts = predict_two_enhanced(progs, [test_input])
                 if self._validate_solution(attempts, [test_input]):
                     return attempts
                 else:
-                    print("Enhanced prediction failed validation")
+                    logger.debug("Enhanced prediction failed validation")
             else:
-                print("Enhancements disabled, using baseline search")
-        except Exception as e:
-            print(f"Enhanced prediction error: {e}")
+                logger.info("Enhancements disabled; using baseline search")
+        except (ValueError, RuntimeError) as e:
+            logger.exception("Enhanced prediction error: %s", e)

-        # Fall back to baseline search
-        self.stats['fallback_used'] += 1
-        print("Falling back to baseline search")
+        # Fall back to baseline search
+        self.stats['fallback_used'] += 1
+        logger.info("Falling back to baseline search")
         progs = synth_baseline(train_pairs)
         return predict_two_baseline(progs, [test_input])
Add once near the imports:
import logging
logger = logging.getLogger(__name__)
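To illustrate why logger.exception is preferable to print here, the sketch below captures the log output of a failing call and shows that the traceback is recorded. The names risky_predict, predict_or_none, capture_log, and the logger name are hypothetical and exist only for this example.

```python
import io
import logging

logger = logging.getLogger("arc_solver_demo")  # hypothetical logger name


def risky_predict(grid):
    """Stand-in for an enhanced prediction step that rejects empty grids."""
    if not grid:
        raise ValueError("empty grid")
    return grid


def predict_or_none(grid):
    try:
        return risky_predict(grid)
    except (ValueError, RuntimeError):
        # logger.exception logs at ERROR level and appends the active traceback.
        logger.exception("Enhanced prediction error")
        return None


def capture_log(grid):
    """Run predict_or_none and return (result, text written to the log)."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    try:
        result = predict_or_none(grid)
    finally:
        logger.removeHandler(handler)
    return result, stream.getvalue()
```

Calling capture_log([]) returns None together with log text that includes the message, the traceback header, and the ValueError itself, none of which a print(f"... {e}") call would keep beyond the message string.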
25-31: Nit: Optional type hint for guidance_model_path
Use Optional[str] to reflect the None default.
def __init__(self, use_enhancements: bool = True, guidance_model_path: Optional[str] = None, episode_db_path: str = "episodes.json"):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
AGENTS.md (2 hunks)
arc_solver/solver.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
arc_solver/solver.py
📄 CodeRabbit inference engine (AGENTS.md)
arc_solver/solver.py: Ensure solve_task returns proper non-empty test results: collect first prediction per test input and return {'test': test_predictions}
Fix prediction collection pipeline: make _get_predictions return predictions; add debug logging to trace drops and verify test input processing
Integrate HypothesisEngine into solver as primary reasoning layer and connect to retrieval/neural guidance
Files:
arc_solver/solver.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
PR: tylerbessire/PUMA#0
File: AGENTS.md:0-0
Timestamp: 2025-09-11T04:57:18.055Z
Learning: Applies to arc_solver/solver.py : Fix prediction collection pipeline: make _get_predictions return predictions; add debug logging to trace drops and verify test input processing
Learnt from: CR
PR: tylerbessire/PUMA#0
File: AGENTS.md:0-0
Timestamp: 2025-09-11T04:57:18.055Z
Learning: Applies to arc_solver/solver.py : Ensure solve_task returns proper non-empty test results: collect first prediction per test input and return {'test': test_predictions}
Learnt from: CR
PR: tylerbessire/PUMA#0
File: AGENTS.md:0-0
Timestamp: 2025-09-11T04:57:18.055Z
Learning: Applies to arc_solver/solver.py : Integrate HypothesisEngine into solver as primary reasoning layer and connect to retrieval/neural guidance
📚 Learning: 2025-09-11T04:57:18.055Z
Learnt from: CR
PR: tylerbessire/PUMA#0
File: AGENTS.md:0-0
Timestamp: 2025-09-11T04:57:18.055Z
Learning: Applies to arc_solver/solver.py : Fix prediction collection pipeline: make _get_predictions return predictions; add debug logging to trace drops and verify test input processing
Applied to files:
AGENTS.md
arc_solver/solver.py
📚 Learning: 2025-09-11T04:57:18.055Z
Learnt from: CR
PR: tylerbessire/PUMA#0
File: AGENTS.md:0-0
Timestamp: 2025-09-11T04:57:18.055Z
Learning: Applies to arc_solver/solver.py : Ensure solve_task returns proper non-empty test results: collect first prediction per test input and return {'test': test_predictions}
Applied to files:
arc_solver/solver.py
🧬 Code graph analysis (1)
arc_solver/solver.py (3)
arc_solver/dsl.py (1)
identity(131-133)
arc_solver/grid.py (1)
to_list(42-44)
arc_solver/enhanced_search.py (2)
synthesize_with_enhancements(267-282)
predict_two_enhanced(239-263)
🪛 Ruff (0.12.2)
arc_solver/solver.py
99-99: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (6)
AGENTS.md (2)
76-79: Step 1.1 progress marker: LGTM
Accurate date and concise notes.
121-125: Confirm schema keys: repository tests expect "attempt_1" and "attempt_2"
tests/test_solver_end2end.py and tests/test_submission_schema.py assert that results contain only "attempt_1" and "attempt_2" (and validate their lengths against task["test"]). Update AGENTS.md or the solver docs/return shape if they still reference a single "test" key.
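As a rough illustration of the shape those tests are described as checking, a standalone check might look like the sketch below. check_submission_shape is a hypothetical helper, not the repository's test code; it assumes only what the comment states: exactly two keys, each with one entry per test input.

```python
from typing import Any, Dict, List

Grid = List[List[int]]


def check_submission_shape(result: Dict[str, List[Grid]], task: Dict[str, Any]) -> bool:
    """Return True if result has exactly attempt_1/attempt_2, one grid per test input."""
    expected_keys = {"attempt_1", "attempt_2"}
    if set(result) != expected_keys:
        return False
    n_tests = len(task.get("test", []))
    return all(len(result[key]) == n_tests for key in expected_keys)


example_task = {"train": [], "test": [{"input": [[0, 1], [1, 0]]}]}
example_result = {"attempt_1": [[[0, 1], [1, 0]]], "attempt_2": [[[0, 1], [1, 0]]]}
```

Under this assumed check, a result that carries any extra key, such as "test", would be rejected, so any change to the return shape should be verified against the real tests.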
arc_solver/solver.py (4)
42-42: No-op change
Nothing to review for this line.
61-64: Graceful no-train fallback: LGTM
Identity per test input is a safe, deterministic fallback.
65-83: Per-input prediction assembly: indexing and fallbacks look correct
Correctly extracts the single output per attempt (index [0]) and falls back to identity.
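The assembly pattern described here can be sketched as follows. This is an assumption-laden illustration, not the code under review: assemble_attempts and the two stand-in predictors are hypothetical, and grids are plain nested lists instead of arrays converted with to_list.

```python
from typing import Callable, Dict, List, Tuple

Grid = List[List[int]]
Attempts = Tuple[List[Grid], List[Grid]]  # (attempt_1 grids, attempt_2 grids)


def assemble_attempts(
    test_inputs: List[Grid],
    get_predictions: Callable[[Grid], Attempts],
) -> Dict[str, List[Grid]]:
    """Collect one grid per attempt for each test input, with identity fallback."""
    attempt1: List[Grid] = []
    attempt2: List[Grid] = []
    for grid in test_inputs:
        try:
            first, second = get_predictions(grid)
            out1, out2 = first[0], second[0]  # single output per attempt
        except (IndexError, ValueError, TypeError):
            # Identity fallback: return a copy of the input for both attempts.
            out1 = [row[:] for row in grid]
            out2 = [row[:] for row in grid]
        attempt1.append(out1)
        attempt2.append(out2)
    return {"attempt_1": attempt1, "attempt_2": attempt2}


# Hypothetical predictors: one reverses each row, one returns no predictions.
def reverse_rows(grid: Grid) -> Attempts:
    return [[row[::-1] for row in grid]], [grid]


def no_predictions(grid: Grid) -> Attempts:
    return [], []
```

Because every test input appends exactly one grid to each list, both attempts always have the same length as the list of test inputs, even when a predictor returns nothing.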
82-83: Return structure: consider adding 'test' alias for backward compatibility
If any callers still expect {'test': [...]}, add a 'test' alias pointing to attempt_1 while keeping attempt_1/attempt_2. Verify against tests before merging.
Apply if needed:
-        return {"attempt_1": attempt1, "attempt_2": attempt2}
+        # Include 'test' alias for backward compatibility if required by older callers.
+        return {"attempt_1": attempt1, "attempt_2": attempt2, "test": attempt1}
Summary
Testing
from arc_solver.solver import solve_task
task = {
"train": [
{"input": [[1,0,0],[1,1,0],[0,0,0]], "output": [[0,1,1],[0,1,0],[0,0,0]]}
],
"test": [
{"input": [[0,1,0],[1,1,0],[0,0,0]]}
]
}
res = solve_task(task)
print(res)
pytest tests/test_solver_end2end.py::TestSolverEndToEnd::test_rotation_task_solving -q
pytest tests/test_submission_schema.py::TestSubmissionSchema::test_output_structure -q
pytest tests/test_solver_end2end.py::TestSolverEndToEnd::test_multiple_test_inputs -q
https://chatgpt.com/codex/tasks/task_e_68c2569ef3a883229cb8767af2449391
Summary by CodeRabbit
Refactor
Documentation