| 1 | +# Enhanced ARC-AGI-2 Solver |
| 2 | + |
| 3 | +This repository contains a solver for the **ARC Prize 2025** competition (ARC‑AGI‑2) that implements a blueprint drawn from neuroscience-inspired research. It combines symbolic reasoning with neural guidance, episodic retrieval, program sketches, and test-time training to tackle abstract reasoning tasks. |
| 4 | + |
| 5 | +## Key Features |
| 6 | + |
| 7 | +### 🧠 Neuroscience-Inspired Architecture |
| 8 | +- **Neural guidance**: Predicts relevant DSL operations using task features |
| 9 | +- **Episodic retrieval**: Maintains database of solved tasks for analogical reasoning |
| 10 | +- **Program sketches**: Mines common operation sequences as macro-operators |
| 11 | +- **Test-time training**: Adapts scoring functions to each specific task |
| 12 | +- **Multi-demand network analog**: Prioritizes candidate programs using learned heuristics |
| 13 | + |
| 14 | +### 🔧 Enhanced Capabilities |
| 15 | +- **Object-centric parsing** with connected component analysis |
| 16 | +- **Compact DSL** with composable primitives (rotate, flip, translate, recolor, etc.) |
| 17 | +- **Two-attempt diversity** as required by ARC Prize 2025 rules |
| 18 | +- **Fallback resilience** with graceful degradation to baseline methods |
| 19 | +- **Performance monitoring** with detailed statistics and benchmarking |
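To make the object-centric parsing bullet concrete, here is a minimal sketch of connected-component extraction on a grid. It is not the code in `objects.py`; the function name `connected_components`, the 4-connectivity rule, and the choice of `0` as background are assumptions made for illustration.

```python
from collections import deque

def connected_components(grid, background=0):
    """Group same-colored, 4-connected cells into objects.

    Returns a list of (color, cells) tuples, where cells is a list of
    (row, col) coordinates. Background cells are skipped.
    """
    height, width = len(grid), len(grid[0])
    seen = [[False] * width for _ in range(height)]
    objects = []
    for r in range(height):
        for c in range(width):
            if seen[r][c] or grid[r][c] == background:
                continue
            color = grid[r][c]
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            # Breadth-first flood fill over neighbors of the same color.
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and grid[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append((color, cells))
    return objects

grid = [[1, 1, 0],
        [0, 0, 2],
        [3, 0, 2]]
objects = connected_components(grid)
```

Each returned object can then be described by features such as color, size, or bounding box before any DSL operation is applied.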
| 20 | + |
| 21 | +## Directory Structure |
| 22 | + |
| 23 | +``` |
| 24 | +arc_solver_project/ |
| 25 | +│ |
| 26 | +├── arc_solver/ # Core solver package |
| 27 | +│ ├── grid.py # Grid operations and utilities |
| 28 | +│ ├── objects.py # Connected component extraction |
| 29 | +│ ├── dsl.py # Domain-specific language primitives |
| 30 | +│ ├── heuristics.py # Heuristic rule inference |
| 31 | +│ ├── search.py # Basic brute-force search |
| 32 | +│ ├── solver.py # Main solver interface (enhanced) |
| 33 | +│ ├── enhanced_solver.py # Enhanced solver with neural components |
| 34 | +│ ├── enhanced_search.py # Neural-guided program synthesis |
| 35 | +│ ├── io_utils.py # JSON loading and submission helpers |
| 36 | +│ └── neural/ # Neural guidance components |
| 37 | +│ ├── features.py # Task feature extraction |
| 38 | +│ ├── guidance.py # Neural operation prediction |
| 39 | +│ ├── sketches.py # Program sketch mining |
| 40 | +│ ├── episodic.py # Episodic retrieval system |
| 41 | +│ └── ttt.py # Test-time training |
| 42 | +│ |
| 43 | +├── arc_submit.py # Command-line submission script |
| 44 | +├── train_neural_guidance.py # Training script for neural components |
| 45 | +├── benchmark.py # Benchmarking and evaluation tools |
| 46 | +└── README.md # This file |
| 47 | +``` |
| 48 | + |
| 49 | +## Quick Start |
| 50 | + |
| 51 | +### 1. Basic Usage (Kaggle-ready) |
| 52 | + |
| 53 | +```bash |
| 54 | +# Generate submission file (uses enhanced solver by default) |
| 55 | +python arc_submit.py |
| 56 | + |
| 57 | +# Use baseline solver only (if needed) |
| 58 | +ARC_USE_BASELINE=1 python arc_submit.py |
| 59 | +``` |
| 60 | + |
| 61 | +### 2. Training Neural Components |
| 62 | + |
| 63 | +```bash |
| 64 | +# Train neural guidance (requires training data) |
| 65 | +python train_neural_guidance.py |
| 66 | + |
| 67 | +# Or set up the environment with defaults |
| 68 | +python benchmark.py |
| 69 | +``` |
| 70 | + |
| 71 | +### 3. Python API |
| 72 | + |
| 73 | +```python |
| 74 | +from arc_solver.enhanced_solver import ARCSolver, solve_task_enhanced |
| 75 | + |
| 76 | +# Minimal task in the standard ARC JSON layout (assumed to be the expected input) |
| 77 | +task = {"train": [{"input": [[1, 0]], "output": [[0, 1]]}], "test": [{"input": [[2, 0]]}]} |
| 78 | + |
| 79 | +# Solve a single task with full enhancements |
| 80 | +result = solve_task_enhanced(task) |
| 81 | +# Or configure solver behavior explicitly |
| 82 | +result = ARCSolver(use_enhancements=True).solve_task(task) |
| 83 | +``` |
| 84 | + |
| 85 | +## How It Works |
| 86 | + |
| 87 | +### Enhanced Pipeline |
| 88 | + |
| 89 | +1. **Feature Extraction**: Extract task-level features (colors, objects, transformations) |
| 90 | +2. **Neural Guidance**: Predict which DSL operations are likely relevant |
| 91 | +3. **Episodic Retrieval**: Query database for similar previously solved tasks |
| 92 | +4. **Sketch-Based Search**: Use mined program templates with parameter filling |
| 93 | +5. **Test-Time Adaptation**: Fine-tune scoring function using task demonstrations |
| 94 | +6. **Program Selection**: Rank and select top 2 diverse candidate programs |
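The last pipeline step, picking two diverse candidates, could look roughly like the sketch below. The candidate tuple layout and the name `select_two_diverse` are hypothetical; the idea shown is simply to keep the top-ranked program and pair it with the best-scoring program that predicts a different output.

```python
def select_two_diverse(candidates):
    """Pick up to two candidates, preferring different predicted outputs.

    candidates: list of (score, program_name, predicted_grid) tuples.
    """
    ranked = sorted(candidates, key=lambda cand: cand[0], reverse=True)
    if not ranked:
        return []
    picks = [ranked[0]]
    # Look for the best-scoring candidate whose prediction differs.
    for cand in ranked[1:]:
        if cand[2] != ranked[0][2]:
            picks.append(cand)
            break
    # If every candidate predicts the same grid, fall back to the runner-up.
    if len(picks) < 2 and len(ranked) > 1:
        picks.append(ranked[1])
    return picks

candidates = [
    (0.9, "rotate90", [[1, 2]]),
    (0.8, "transpose", [[1, 2]]),
    (0.5, "flip", [[2, 1]]),
]
picks = select_two_diverse(candidates)
```

Here the second attempt goes to `flip` rather than `transpose`, because `transpose` would only repeat the first prediction.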
| 95 | + |
| 96 | +### Fallback Strategy |
| 97 | + |
| 98 | +If enhanced components fail, the solver gracefully falls back to: |
| 99 | +- Heuristic single-step transformations |
| 100 | +- Brute-force enumeration of 2-step programs |
| 101 | +- Identity transformation as last resort |
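The fallback order above can be sketched as a small enumeration loop. This is an illustration only: the four primitives, the `PRIMITIVES` table, and `solve_with_fallback` are assumed names, and the package's own heuristics in `heuristics.py` and `search.py` are likely richer.

```python
from itertools import product

def rotate90(grid):
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def flip_h(grid):
    """Mirror the grid left to right."""
    return [list(row[::-1]) for row in grid]

def flip_v(grid):
    """Mirror the grid top to bottom."""
    return [list(row) for row in grid[::-1]]

def transpose(grid):
    """Swap rows and columns."""
    return [list(row) for row in zip(*grid)]

PRIMITIVES = {"rotate90": rotate90, "flip_h": flip_h,
              "flip_v": flip_v, "transpose": transpose}

def run(program, grid):
    """Apply a sequence of primitive names to a grid."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid

def solve_with_fallback(train_pairs):
    """Try 1-step programs, then 2-step programs, then give up with identity."""
    for length in (1, 2):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, x) == y for x, y in train_pairs):
                return program
    return ()  # empty program: identity transformation as last resort

pairs = [([[1, 2], [3, 4]], [[4, 3], [2, 1]])]  # a 180-degree rotation
program = solve_with_fallback(pairs)
```

An empty tuple means nothing fit the demonstrations, so `run((), grid)` returns the input unchanged.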
| 102 | + |
| 103 | +## Configuration |
| 104 | + |
| 105 | +The solver is configured through environment variables and a configuration file: |
| 106 | + |
| 107 | +### Environment Variables |
| 108 | +- `ARC_USE_BASELINE=1`: Force baseline solver only |
| 109 | +- `ARC_DISABLE_ENHANCEMENTS=1`: Disable enhanced features |
| 110 | + |
| 111 | +### Configuration File |
| 112 | +```json |
| 113 | +{ |
| 114 | + "use_neural_guidance": true, |
| 115 | + "use_episodic_retrieval": true, |
| 116 | + "use_program_sketches": true, |
| 117 | + "use_test_time_training": true, |
| 118 | + "max_programs": 256, |
| 119 | + "timeout_per_task": 30.0 |
| 120 | +} |
| 121 | +``` |
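One plausible way to combine the file and the environment variables is sketched below. The function `load_config`, its `path` argument, and the rule that either environment variable switches off every `use_*` flag are assumptions; the README does not specify the precedence, so check the actual solver before relying on this.

```python
import json
import os

DEFAULTS = {
    "use_neural_guidance": True,
    "use_episodic_retrieval": True,
    "use_program_sketches": True,
    "use_test_time_training": True,
    "max_programs": 256,
    "timeout_per_task": 30.0,
}

def load_config(path=None, env=None):
    """Merge defaults, an optional JSON file, and environment overrides."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    if path is not None and os.path.exists(path):
        with open(path) as handle:
            config.update(json.load(handle))
    # Assumed behavior: either variable disables all enhanced components.
    if env.get("ARC_USE_BASELINE") == "1" or env.get("ARC_DISABLE_ENHANCEMENTS") == "1":
        for key in config:
            if key.startswith("use_"):
                config[key] = False
    return config

baseline = load_config(env={"ARC_USE_BASELINE": "1"})
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.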
| 122 | + |
| 123 | +## Neural Components |
| 124 | + |
| 125 | +### Neural Guidance |
| 126 | +- **Purpose**: Predict which DSL operations are relevant for a given task |
| 127 | +- **Architecture**: Simple MLP with task-level features |
| 128 | +- **Training**: Uses extracted features from training demonstrations |
| 129 | +- **Output**: Operation relevance scores to guide search |
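As a rough picture of what "simple MLP with task-level features" might mean, the sketch below runs a one-hidden-layer network over a feature vector and returns one relevance score per operation. The weights are random and untrained, and the feature vector, layer sizes, and names (`init_mlp`, `forward`, `OPERATIONS`) are all illustrative assumptions rather than the contents of `guidance.py`.

```python
import math
import random

OPERATIONS = ["rotate", "flip", "translate", "recolor"]

def init_mlp(n_in, n_hidden, n_out, seed=0):
    """Create small random weights for a one-hidden-layer MLP."""
    rng = random.Random(seed)
    def layer(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"w1": layer(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "w2": layer(n_out, n_hidden), "b2": [0.0] * n_out}

def forward(params, features):
    """Return one sigmoid relevance score per operation."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)  # ReLU
              for row, b in zip(params["w1"], params["b1"])]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(params["w2"], params["b2"])]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Hypothetical features: [num_colors, num_objects, same_shape, size_ratio]
features = [3.0, 2.0, 1.0, 1.0]
params = init_mlp(n_in=len(features), n_hidden=8, n_out=len(OPERATIONS))
scores = forward(params, features)
ranked_ops = [op for _, op in sorted(zip(scores, OPERATIONS), reverse=True)]
```

The search would then try operations in `ranked_ops` order, so a well-trained model prunes the search space without ruling any operation out.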
| 130 | + |
| 131 | +### Episodic Retrieval |
| 132 | +- **Purpose**: Reuse solutions from similar previously solved tasks |
| 133 | +- **Method**: Task signature matching with feature-based similarity |
| 134 | +- **Storage**: JSON-based database of solved programs with metadata |
| 135 | +- **Retrieval**: Cosine similarity on numerical features + boolean feature matching |
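The retrieval rule described above, cosine similarity on numerical features plus boolean feature matching, might be combined as in this sketch. The equal weighting, the record layout, and the names `similarity` and `retrieve` are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def bool_match(a, b):
    """Fraction of shared boolean features on which both tasks agree."""
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    return sum(a[k] == b[k] for k in keys) / len(keys)

def similarity(query, episode, numeric_weight=0.5):
    """Weighted mix of numeric and boolean similarity (weights are assumed)."""
    return (numeric_weight * cosine(query["numeric"], episode["numeric"])
            + (1.0 - numeric_weight) * bool_match(query["flags"], episode["flags"]))

def retrieve(query, database, top_k=1):
    """Return the top_k most similar stored episodes."""
    ranked = sorted(database, key=lambda ep: similarity(query, ep), reverse=True)
    return ranked[:top_k]

database = [
    {"numeric": [3, 2, 1], "flags": {"same_shape": True, "recolor": False},
     "program": ["rotate90"]},
    {"numeric": [1, 0, 5], "flags": {"same_shape": False, "recolor": True},
     "program": ["recolor"]},
]
query = {"numeric": [3, 2, 1], "flags": {"same_shape": True, "recolor": False}}
best = retrieve(query, database)[0]
```

The stored `program` of the best match can be replayed directly on the new task or used to seed the search.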
| 136 | + |
| 137 | +### Program Sketches |
| 138 | +- **Purpose**: Capture common operation sequences as reusable templates |
| 139 | +- **Mining**: Extract frequent 1-step and 2-step operation patterns |
| 140 | +- **Usage**: Instantiate sketches with different parameter combinations |
| 141 | +- **Adaptation**: Learn from successful programs during solving |
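Mining frequent 1-step and 2-step patterns can be approximated with a simple counter, as sketched here. The function name `mine_sketches`, the `min_support` threshold, and counting each pattern once per program are assumptions, not a description of `sketches.py`.

```python
from collections import Counter

def mine_sketches(programs, min_support=2):
    """Count 1-step and 2-step operation patterns across solved programs.

    programs: list of operation-name sequences.
    Returns (pattern, count) pairs with count >= min_support.
    """
    counts = Counter()
    for program in programs:
        ops = tuple(program)
        patterns = set()
        for i in range(len(ops)):
            patterns.add(ops[i:i + 1])          # single operation
            if i + 1 < len(ops):
                patterns.add(ops[i:i + 2])      # adjacent pair
        counts.update(patterns)                 # once per program
    return [(pattern, n) for pattern, n in counts.most_common() if n >= min_support]

solved = [
    ["rotate", "recolor"],
    ["rotate", "recolor", "flip"],
    ["flip"],
]
sketches = mine_sketches(solved)
```

Each surviving pattern, such as `("rotate", "recolor")`, becomes a template whose parameters (angle, colors) are filled in during search.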
| 142 | + |
| 143 | +### Test-Time Training |
| 144 | +- **Purpose**: Adapt scoring function to each specific task |
| 145 | +- **Method**: Fine-tune lightweight scorer on task demonstrations |
| 146 | +- **Features**: Program length, operation types, success rate, complexity |
| 147 | +- **Augmentation**: Generate synthetic training examples via transformations |
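The augmentation bullet could be realized by applying the same rotation or reflection to both sides of each demonstration pair, as in this sketch. It assumes the task rule is compatible with these symmetries, which does not hold for every ARC task; `dihedral_variants` and `augment_pairs` are hypothetical names.

```python
def rotate90(grid):
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def flip_h(grid):
    """Mirror the grid left to right."""
    return [list(row[::-1]) for row in grid]

def dihedral_variants(grid):
    """Return the 8 rotations/reflections of a grid, original first."""
    variants = []
    current = [list(row) for row in grid]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_h(current))
        current = rotate90(current)
    return variants

def augment_pairs(train_pairs):
    """Apply each symmetry to input and output together."""
    augmented = []
    for grid_in, grid_out in train_pairs:
        augmented.extend(zip(dihedral_variants(grid_in),
                             dihedral_variants(grid_out)))
    return augmented

pairs = [([[1, 2], [3, 4]], [[4, 3], [2, 1]])]
augmented = augment_pairs(pairs)
```

The enlarged set of pairs gives the lightweight scorer more examples to fit than the handful of original demonstrations.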
| 148 | + |
| 149 | +## Performance and Evaluation |
| 150 | + |
| 151 | +### Benchmarking |
| 152 | +```python |
| 153 | +from benchmark import Benchmark, SolverConfig |
| 154 | + |
| 155 | +config = SolverConfig() |
| 156 | +benchmark = Benchmark(config) |
| 157 | +results = benchmark.run_benchmark("test_data.json") |
| 158 | +print(f"Success rate: {results['performance_stats']['success_rate']:.3f}") |
| 159 | +``` |
| 160 | + |
| 161 | +### Monitoring |
| 162 | +The solver tracks detailed statistics: |
| 163 | +- Success rates for enhanced vs baseline methods |
| 164 | +- Component usage (episodic hits, neural guidance, TTT adaptation) |
| 165 | +- Timing breakdown per component |
| 166 | +- Failure mode analysis |
| 167 | + |
| 168 | +## Implementation Notes |
| 169 | + |
| 170 | +### Kaggle Compatibility |
| 171 | +- **Offline execution**: No internet access required |
| 172 | +- **Dependency-light**: Uses only NumPy for core operations |
| 173 | +- **Compute budget**: Designed to fit the ~$0.42-per-task compute limit |
| 174 | +- **Output format**: Exactly 2 attempts per test input as required |
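The two-attempt output requirement might be enforced when assembling the submission, as sketched below. The `attempt_1`/`attempt_2` layout follows the Kaggle submission format as I understand it from recent ARC Prize competitions, so treat it as an assumption and verify it against the current rules; `build_submission` and the task IDs are hypothetical.

```python
import json

def build_submission(predictions):
    """Ensure exactly two attempts per test input.

    predictions: {task_id: [attempts_for_test_input_0, ...]}, where each
    attempts list holds one or more predicted grids.
    """
    submission = {}
    for task_id, per_input in predictions.items():
        entries = []
        for attempts in per_input:
            first = attempts[0]
            # Repeat the first attempt if no second candidate exists.
            second = attempts[1] if len(attempts) > 1 else first
            entries.append({"attempt_1": first, "attempt_2": second})
        submission[task_id] = entries
    return submission

predictions = {
    "task_a": [[[[1, 0]], [[0, 1]]]],  # one test input, two attempts
    "task_b": [[[[2]]]],               # one test input, one attempt
}
submission = build_submission(predictions)
submission_json = json.dumps(submission)
```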
| 175 | + |
| 176 | +### Code Quality |
| 177 | +- **Type hints**: Full typing support for better maintainability |
| 178 | +- **Documentation**: Comprehensive docstrings and comments |
| 179 | +- **Error handling**: Robust fallback mechanisms |
| 180 | +- **Testing**: Validation and benchmarking utilities |
| 181 | + |
| 182 | +## Extending the Solver |
| 183 | + |
| 184 | +### Adding New DSL Operations |
| 185 | +1. Define operation function in `dsl.py` |
| 186 | +2. Add parameter generation in `sketches.py` |
| 187 | +3. Update feature extraction in `features.py` |
| 188 | +4. Retrain neural guidance if needed |
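Step 1 above could follow a registry pattern like the one sketched here, so that new operations become available to the search by name. The `DSL` dictionary, the `register` decorator, and the operation signatures are assumptions; the real `dsl.py` may organize primitives differently.

```python
DSL = {}

def register(name):
    """Decorator that adds a function to the DSL under the given name."""
    def decorator(fn):
        DSL[name] = fn
        return fn
    return decorator

@register("recolor")
def recolor(grid, src, dst):
    """Replace every cell of color src with color dst."""
    return [[dst if cell == src else cell for cell in row] for row in grid]

@register("translate")
def translate(grid, dy, dx, fill=0):
    """Shift the grid contents by (dy, dx) without wrapping."""
    height, width = len(grid), len(grid[0])
    out = [[fill] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            nr, nc = r + dy, c + dx
            if 0 <= nr < height and 0 <= nc < width:
                out[nr][nc] = grid[r][c]
    return out

def apply_program(grid, program):
    """Run a list of (operation_name, params) steps in order."""
    for name, params in program:
        grid = DSL[name](grid, **params)
    return grid

result = apply_program(
    [[1, 0], [0, 0]],
    [("translate", {"dy": 1, "dx": 1}), ("recolor", {"src": 1, "dst": 5})],
)
```

Keeping parameters in a dictionary per step makes it straightforward for sketch instantiation to enumerate parameter combinations for a newly registered operation.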
| 189 | + |
| 190 | +### Improving Neural Components |
| 191 | +1. **Better features**: Add domain-specific feature extractors |
| 192 | +2. **Advanced models**: Replace MLP with transformer/GNN |
| 193 | +3. **Meta-learning**: Implement few-shot adaptation algorithms |
| 194 | +4. **Hybrid methods**: Combine symbolic and neural reasoning |
| 195 | + |
| 196 | +### Advanced Techniques |
| 197 | +- **Probabilistic programming**: Sample programs from learned distributions |
| 198 | +- **Curriculum learning**: Train on tasks of increasing difficulty |
| 199 | +- **Multi-agent reasoning**: Ensemble of specialized solvers |
| 200 | +- **Causal reasoning**: Incorporate causal structure learning |
| 201 | + |
| 202 | +## Research Foundation |
| 203 | + |
| 204 | +This implementation is based on the research blueprint "ARC Prize 2025 & Human Fluid Intelligence", which draws on cognitive neuroscience findings about: |
| 205 | + |
| 206 | +- **Multiple-demand (MD) network**: Neural guidance mimics executive control |
| 207 | +- **Basal ganglia gating**: Operation selection and working memory control |
| 208 | +- **Hippocampal-mPFC loop**: Episodic retrieval and schema integration |
| 209 | +- **Test-time adaptation**: Rapid task-specific learning from few examples |
| 210 | + |
| 211 | +The solver architecture directly maps these biological systems to computational components. |
| 212 | + |
| 213 | +## Competition Strategy |
| 214 | + |
| 215 | +### Short-term (Immediate) |
| 216 | +- ✅ Strong symbolic baseline with neural enhancements |
| 217 | +- ✅ Episodic retrieval for common patterns |
| 218 | +- ✅ Test-time adaptation for task specialization |
| 219 | +- ✅ Kaggle-ready submission format |
| 220 | + |
| 221 | +### Medium-term (During Contest) |
| 222 | +- Train neural guidance on public training data |
| 223 | +- Mine program sketches from successful solutions |
| 224 | +- Analyze semi-private feedback for failure modes |
| 225 | +- Expand DSL based on discovered patterns |
| 226 | + |
| 227 | +### Long-term (Advanced Research) |
| 228 | +- Probabilistic program synthesis |
| 229 | +- Hybrid symbolic-neural architecture |
| 230 | +- Broader cognitive priors and meta-learning |
| 231 | +- Integration with large language models |
| 232 | + |
| 233 | +## License |
| 234 | + |
| 235 | +This code is intended to be released under an open-source license, as required by the ARC Prize 2025 rules. |
| 236 | + |
| 237 | +## Citation |
| 238 | + |
| 239 | +If you use this solver or build upon its ideas, please cite the research blueprint and this implementation. |
| 240 | + |
| 241 | +## Contributing |
| 242 | + |
| 243 | +Contributions are welcome! Focus areas: |
| 244 | +- Neural architecture improvements |
| 245 | +- New DSL operations based on failure analysis |
| 246 | +- Advanced meta-learning techniques |
| 247 | +- Performance optimizations for Kaggle constraints |
| 248 | + |
| 249 | +--- |
| 250 | + |
| 251 | +**Ready to win ARC Prize 2025!** 🏆 |