# Detailed plan for migrating rlm-claude-code from pure Python to rlm-core Python bindings
Historical planning artifact:

- Checklist lines marked `[historical target]` are archival snapshots, not an active backlog.
- Authoritative live status is tracked in Beads (`bd status`) and the execution-plan trackers:
  - docs/execution-plan/STATUS.md
  - docs/execution-plan/TASK-REGISTRY.md
  - docs/execution-plan/WORKBOARD.md
- Project: rlm-claude-code (Claude Code plugin for RLM)
- Target: Replace the Python implementation with rlm-core Python bindings
- Scope: ~60 Python modules, ~1.2 MB of source code
- Strategy: Phased migration with feature flags for gradual rollout
## Migration Reality

Actual execution of this migration revealed important constraints:
| Type | Available | Notes |
|---|---|---|
| `PatternClassifier` | ✅ | `should_activate()`, `classify()` |
| `MemoryStore` | ✅ | `open()`, `add_node()`, `query()` |
| `TrajectoryEvent` | ✅ | Factory methods, `to_json()`, `with_metadata()` |
| `TrajectoryEventType` | ✅ | All event type variants |
| `ClaimExtractor` | ✅ | `extract()` - pattern-based, not LLM |
| `SmartRouter` | ✅ | `route()` with `RoutingContext` |
| `CostTracker` | ✅ | `record()`, `merge()`, cost tracking |
| `quick_hallucination_check` | ✅ | Returns risk score (0.0-1.0) |
| `ReplPool` / `ReplHandle` | ❌ | Not exposed - Python execution must stay in Python |
| `ClaudeCodeAdapter` | ❌ | Not exposed - orchestration stays in Python |
| `Orchestrator` | ❌ | Not exposed - use component delegation instead |
Instead of full replacement, the migration uses component-level delegation:
- Python orchestration layer remains
- Individual components delegate to rlm_core when available
- Feature flag `RLM_USE_CORE=true/false` toggles delegation
- Legacy code preserved for backward compatibility
| Phase | Component | Status | Commit |
|---|---|---|---|
| 1 | Infrastructure + USE_RLM_CORE flag | ✅ | 913628b |
| 2 | `complexity_classifier.py` → `PatternClassifier` | ✅ | 788575c |
| 3 | `memory_store.py` → `MemoryStore` | ✅ | c2543ee |
| 4 | `trajectory.py` → `TrajectoryEvent` | ✅ | e695b39 |
| 5 | `repl_environment.py` → `ReplPool` | ⏭️ N/A | - |
| 6 | `epistemic/claim_extractor.py` → `ClaimExtractor` | ✅ | 1b1ccc7 |
| 7 | `smart_router.py` → `SmartRouter` | ✅ | 58e2bcf |
| 8 | Cleanup legacy code | ⏭️ Deferred | - |
Authoritative live status for this repository is the closed Beads task loop-cyl:
- Component delegation is complete and supported.
- Full Python replacement remains intentionally out of scope until the Python bindings expose orchestration/REPL surfaces (`Orchestrator`, `ClaudeCodeAdapter`, `ReplPool`/`ReplHandle`).
This table is archival target-state planning from early migration design. For live truth, use the "Migration Reality" section above.
| rlm-claude-code Module | rlm-core Replacement | Notes |
|---|---|---|
| `orchestrator.py` | `rlm_core.Orchestrator` | Target-state only; binding not currently exposed |
| `intelligent_orchestrator.py` | `rlm_core.ClaudeCodeAdapter` | Target-state only; binding not currently exposed |
| `local_orchestrator.py` | `rlm_core.Orchestrator` | Target-state only; binding not currently exposed |
| `complexity_classifier.py` | `rlm_core.PatternClassifier` | Uses `ActivationDecision` |
| `auto_activation.py` | `rlm_core.PatternClassifier.should_activate()` | Built into classifier |
| `repl_environment.py` | `rlm_core.ReplHandle`, `ReplPool` | Target-state only; bindings not currently exposed |
| `memory_store.py` | `rlm_core.SqliteMemoryStore` | Hypergraph memory |
| `memory_backend.py` | `rlm_core.SqliteMemoryStore` | Unified backend |
| `memory_evolution.py` | `rlm_core.SqliteMemoryStore` (tier operations) | Consolidate/promote/decay |
| `trajectory.py` | `rlm_core.TrajectoryEvent` | Event streaming |
| `trajectory_analysis.py` | `rlm_core.TrajectoryEvent` | Analysis via events |
| `cost_tracker.py` | `rlm_core.CostTracker` (via LLM module) | Per-component tracking |
| `smart_router.py` | `rlm_core.SmartRouter` | Query-aware model selection |
| `reasoning_traces.py` | `rlm_core.ReasoningTraceStore` | Deciduous-style traces |
| `epistemic/` | `rlm_core.epistemic` module | Full replacement |
These modules integrate with rlm-core but retain some logic:
| Module | rlm-core Integration | Retained Logic |
|---|---|---|
| `api_client.py` | `rlm_core.AnthropicClient`, `LLMClient` | API-specific wrappers |
| `context_manager.py` | `rlm_core.SessionContext` | Plugin-specific context handling |
| `config.py` | `rlm_core.AdapterConfig` | Environment variable handling |
| `types.py` | `rlm_core.context`, trajectory types | Plugin-specific types |
| `rich_output.py` | `rlm_core.TrajectoryEvent` | Terminal formatting retained |
These modules become obsolete:
| Module | Reason |
|---|---|
| `learning.py` | Replaced by rlm-core memory evolution |
| `continuous_learning.py` | Replaced by rlm-core memory evolution |
| `strategy_cache.py` | Integrated into rlm-core memory |
| `state_persistence.py` | Handled by rlm-core SQLite |
| `embedding_retrieval.py` | Integrated into memory store |
| `context_index.py` | Integrated into memory store |
| `prompt_optimizer.py` | Handled by rlm-core routing |
| `learned_routing.py` | Replaced by `SmartRouter` |
| `setfit_classifier.py` | Replaced by `PatternClassifier` |
| `gliner_extractor.py` | Replaced by rlm-core extractors |
These modules remain in rlm-claude-code:
| Module | Reason |
|---|---|
| `__init__.py` | Plugin entry point |
| `repl_plugin.py` | Claude Code plugin interface |
| `tool_bridge.py` | MCP tool exposure |
| `response_parser.py` | Claude-specific response handling |
| `prompts.py` | Plugin-specific prompt templates |
| `visualization.py` | Claude Code output formatting |
| `progress.py` | UI progress indicators |
This phased checklist is historical planning context.
Unchecked "Exit Criteria" items below are not the live backlog; they are preserved for traceability.
Live scope/status is tracked in Beads (loop-cyl) and docs/execution-plan/STATUS.md.
### Phase 1: Infrastructure + USE_RLM_CORE Flag

Duration: 1-2 days. Risk: Low.

- Add rlm-core Python bindings to `pyproject.toml`:

  ```toml
  [project.optional-dependencies]
  rlm-core = ["rlm-core>=0.1.0"]
  ```

- Create a feature flag in config:

  ```python
  # config.py
  import os

  USE_RLM_CORE = os.getenv("RLM_USE_CORE", "false").lower() == "true"
  ```

- Create an adapter layer:

  ```python
  # adapters/core_adapter.py
  if USE_RLM_CORE:
      import rlm_core  # use rlm-core implementations
  else:
      rlm_core = None  # fall back to legacy implementations
  ```
Exit Criteria:
- [historical target] rlm-core imports successfully
- [historical target] Feature flag toggles between implementations
- [historical target] Existing tests pass with flag off
### Phase 2: Complexity Classifier

Duration: 2-3 days. Risk: Low.

- Replace `complexity_classifier.py`:

  ```python
  # Before
  from .complexity_classifier import ComplexityClassifier
  classifier = ComplexityClassifier()
  decision = classifier.classify(query, context)

  # After
  from rlm_core import PatternClassifier, SessionContext
  classifier = PatternClassifier()
  ctx = SessionContext(messages=context.messages)
  decision = classifier.should_activate(query, ctx)
  ```

- Update `auto_activation.py` to use `ActivationDecision`
- Update tests:
  - `tests/unit/test_auto_activation.py`
  - `tests/unit/test_complexity_classifier.py` (can be removed)
Exit Criteria:
- [historical target] PatternClassifier produces equivalent results
- [historical target] Auto-activation tests pass
- [historical target] Performance within 10% of original
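The "equivalent results" criterion can be checked with a small parity harness that runs the same queries through both implementations. A sketch under stated assumptions: `legacy_classify` and `core_classify` below are hypothetical stand-ins, not the real APIs; in the actual test the core side would call `PatternClassifier.should_activate()`.

```python
def assert_parity(legacy_fn, core_fn, queries):
    """Fail loudly on the first query where the two classifiers disagree."""
    for q in queries:
        legacy, core = legacy_fn(q), core_fn(q)
        assert legacy == core, f"divergence on {q!r}: {legacy} != {core}"

# Hypothetical stand-ins for illustration only.
def legacy_classify(query: str) -> str:
    return "complex" if len(query.split()) > 12 else "simple"

def core_classify(query: str) -> str:
    # The real test would delegate to rlm_core's PatternClassifier here.
    return "complex" if len(query.split()) > 12 else "simple"

assert_parity(legacy_classify, core_classify,
              ["list files", "refactor the auth module and add tests " * 3])
```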
### Phase 3: Memory Store

Duration: 1 week. Risk: Medium (data migration).

- Replace `memory_store.py` with `SqliteMemoryStore`:

  ```python
  # Before
  from .memory_store import MemoryStore
  store = MemoryStore(db_path)

  # After
  from rlm_core import SqliteMemoryStore
  store = SqliteMemoryStore(db_path)
  ```

- Migrate the data schema:
  - Create a migration script for existing SQLite databases
  - Map old node types to the rlm-core `NodeType` enum
  - Map old tiers to the rlm-core `Tier` enum
- Update dependent modules:
  - `memory_backend.py` → remove (use `SqliteMemoryStore` directly)
  - `memory_evolution.py` → remove (use `store.consolidate/promote/decay`)
  - `cross_session_promotion.py` → update to use rlm-core promotion
- Update tests:
  - `tests/unit/test_memory_backend.py`
  - `tests/unit/test_memory_store.py`
  - `tests/integration/test_memory_evolution.py`
Migration Script:

```python
# scripts/migrate_memory.py
def migrate_database(old_path: str, new_path: str):
    """Migrate rlm-claude-code memory to rlm-core format."""
    # 1. Read old schema
    # 2. Transform node types
    # 3. Transform tier values
    # 4. Write to new schema
```

Exit Criteria:
- [historical target] Existing memories migrate without data loss
- [historical target] Tier evolution works correctly
- [historical target] Semantic search produces equivalent results
- [historical target] Memory tests pass
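Under assumed old and new column layouts (the real schemas and the `NodeType`/`Tier` enums live in rlm-core and are not shown here), the skeleton's tier transform could be fleshed out along these lines:

```python
import sqlite3

# Assumed tier mapping: the real Tier enum is defined in rlm-core.
TIER_MAP = {"short": "Working", "mid": "Episodic", "long": "Semantic"}

def migrate_database(old_path: str, new_path: str) -> int:
    """Copy nodes from an assumed old schema into an assumed new one."""
    old = sqlite3.connect(old_path)
    new = sqlite3.connect(new_path)
    new.execute(
        "CREATE TABLE IF NOT EXISTS nodes (id INTEGER PRIMARY KEY, "
        "content TEXT, node_type TEXT, tier TEXT)"
    )
    moved = 0
    for content, node_type, tier in old.execute(
        "SELECT content, node_type, tier FROM nodes"
    ):
        new.execute(
            "INSERT INTO nodes (content, node_type, tier) VALUES (?, ?, ?)",
            (content, node_type, TIER_MAP.get(tier, "Working")),
        )
        moved += 1
    new.commit()
    old.close()
    new.close()
    return moved
```

Returning the migrated row count lets the caller compare it against the source row count as a cheap no-data-loss check.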
### Phase 4: Trajectory Events

Duration: 2-3 days. Risk: Low.

- Replace `trajectory.py`:

  ```python
  # Before
  from .trajectory import TrajectoryEmitter, TrajectoryEvent
  emitter = TrajectoryEmitter()
  emitter.emit(TrajectoryEvent.RLM_START, {"query": query})

  # After
  from rlm_core import TrajectoryEvent, TrajectoryEventType
  event = TrajectoryEvent(
      event_type=TrajectoryEventType.RlmStart,
      content=query,
      depth=0,
  )
  ```

- Update `trajectory_analysis.py` to consume rlm-core events
- Update `rich_output.py` to render rlm-core events
Exit Criteria:
- [historical target] All event types map correctly
- [historical target] Trajectory streaming works
- [historical target] Export/replay functions work
### Phase 5: REPL Environment

Status: ⏭️ NOT APPLICABLE

Reason: `ReplPool` and `ReplHandle` are not exposed in the rlm-core Python bindings.
Technical Justification:

The REPL environment executes arbitrary Python code in a sandbox built on RestrictedPython:

```python
# repl_environment.py uses Python-specific sandboxing
from RestrictedPython import compile_restricted, safe_builtins
from RestrictedPython.Guards import guarded_iter_unpack_sequence, safer_getattr
```

This is fundamentally Python-specific:

- Code compilation: `compile_restricted()` compiles Python AST with security checks
- Execution namespace: Python globals/locals with RLM helper functions injected
- Output capture: Python stdout/stderr interception
Rust has no native way to execute arbitrary Python code. Even if ReplPool were exposed, it would just be a thin wrapper calling back into Python, adding overhead without benefit.
Resolution: Keep repl_environment.py as pure Python. No migration needed or possible.
### Phase 6: Epistemic Verification

Duration: 3-4 days. Risk: Low (new feature, parallel implementation).

- Replace the `epistemic/` module:

  ```python
  # Before
  from .epistemic import ClaimExtractor, HallucinationDetector

  # After
  from rlm_core import (
      ClaimExtractor,
      EpistemicVerifier,
      MemoryGate,
      verify_claim,
      quick_hallucination_check,
  )
  ```

- Wire up the memory gate:

  ```python
  gate = MemoryGate(MemoryGateConfig(threshold=2.0))
  decision = gate.check(claim, evidence)
  if decision == GateDecision.Reject:
      ...  # don't store in memory
  ```
Exit Criteria:
- [historical target] Hallucination detection rate maintained
- [historical target] Memory gate rejects ungrounded facts
- [historical target] Epistemic tests pass
### Phase 7: Orchestrator

Status: ⏭️ COMPONENT DELEGATION ONLY

Reason: `ClaudeCodeAdapter` and `Orchestrator` are not exposed in the rlm-core Python bindings.
What Was Done Instead:

Added SmartRouter delegation for routing decisions:

```python
# smart_router.py now has optional rlm_core delegation
class SmartRouter:
    @property
    def uses_rlm_core(self) -> bool:
        return self._core_router is not None

    def route_core(self, query: str, depth: int = 0, ...) -> dict | None:
        """Fast routing via rlm_core.SmartRouter."""
        if self._core_router is None:
            return None
        ctx = _rlm_core.RoutingContext().with_depth(depth)
        decision = self._core_router.route(query, ctx)
        return {"model": decision.model.id, "tier": str(decision.tier), ...}
```

Technical Justification: The orchestration layer coordinates async operations across:
- Python asyncio event loop
- LLM API calls (aiohttp/httpx)
- REPL execution (subprocess)
- Memory operations (SQLite)
Cross-language async orchestration (Python asyncio ↔ Rust tokio) is complex and error-prone. The rlm-core bindings expose component-level APIs instead:
| Component | Delegation |
|---|---|
| Model routing | ✅ SmartRouter.route_core() |
| Cost tracking | ✅ CostTracker.record() |
| Trajectory events | ✅ TrajectoryEvent factory methods |
| Memory operations | ✅ MemoryStore |
| Classification | ✅ PatternClassifier.should_activate() |
Resolution: Keep Python orchestrator. Components delegate to rlm_core individually.
Future: If rlm-core exposes ClaudeCodeAdapter Python bindings with proper async support, full orchestrator migration becomes possible.
### Phase 8: Legacy Cleanup

Status: ⏭️ DEFERRED

Reason: Legacy code must remain for backward compatibility.
Why Cleanup Is Premature:

- The feature flag pattern requires both paths:

  ```python
  if USE_RLM_CORE and _rlm_core is not None:
      ...  # use rlm_core
  else:
      ...  # use the Python implementation (legacy)
  ```

- Users without rlm_core installed need the Python fallback
- Testing requires both modes:
  - `RLM_USE_CORE=false` tests pure Python behavior
  - `RLM_USE_CORE=true` tests rlm_core delegation
  - Comparing both validates equivalence
- CI/CD pipelines may not have rlm_core in all environments
When Cleanup Becomes Appropriate:
| Condition | Action |
|---|---|
| rlm-core becomes required dependency | Remove feature flags |
| All consumers migrated to rlm-core | Remove legacy code |
| Python bindings expose all needed types | Full replacement possible |
| 6+ months stable with rlm_core | Safe to remove fallbacks |
Current State:
- Legacy code preserved in rlm-claude-code-rlmcore fork
- Feature flag `USE_RLM_CORE` defaults to `false`
- Both paths tested and working
| Category | Approach |
|---|---|
| Unit tests | Run with both implementations during migration |
| Integration tests | Test rlm-core integration points |
| Regression tests | Compare outputs between old and new |
| Performance tests | Benchmark critical paths |
Create output comparison tests:

```python
def test_regression(query, context):
    """Compare old vs new implementation."""
    old_result = old_orchestrator.run(query, context)
    new_result = new_adapter.execute(query, context)
    assert_equivalent(old_result, new_result)
```

Performance targets:

| Metric | Target |
|---|---|
| REPL execution | < 100ms (simple operations) |
| Memory query | < 200ms (semantic search) |
| Trajectory event | < 10ms |
| Cold start | < 2s |
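These budgets can be enforced with a lightweight timing helper rather than a full benchmark suite. A sketch, using the 10 ms trajectory-event budget as the example; the helper name and best-of-N policy are choices made here, not part of the plan:

```python
import time

def assert_under_budget(fn, budget_s: float, repeats: int = 5) -> float:
    """Call fn `repeats` times; assert the fastest run fits within budget_s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    assert best < budget_s, f"best run {best:.4f}s exceeds {budget_s}s budget"
    return best

# Example: a trivial trajectory-event stand-in against the 10 ms budget.
elapsed = assert_under_budget(lambda: {"event": "probe"}, budget_s=0.010)
```

Taking the best of several runs reduces noise from scheduler jitter, which matters when budgets are in the millisecond range.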
Each phase includes rollback capability:
- Feature flag: Set `RLM_USE_CORE=false` to revert
- Version pinning: Keep old code until the phase is complete
- Database backup: Back up memory before migration
- Git tags: Tag before each phase for easy revert
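The database-backup step can use sqlite3's built-in online backup API, which copies a live database safely. A minimal sketch (the function name is illustrative):

```python
import sqlite3

def backup_memory(db_path: str, backup_path: str) -> None:
    """Copy the memory database using SQLite's online backup API."""
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(backup_path)
    with dst:
        src.backup(dst)  # safe even while the source is in use
    src.close()
    dst.close()
```

Run something like this before `scripts/migrate_memory.py`; restoring is just pointing the store back at the copy.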
| Risk | Impact | Probability | Mitigation |
|---|---|---|---|
| Memory data loss | High | Low | Backup + migration script |
| Performance regression | Medium | Medium | Benchmarks per phase |
| API incompatibility | Medium | Low | Adapter pattern |
| Subprocess issues | Medium | Medium | Thorough REPL testing |
| Integration failures | High | Medium | Gradual rollout with flag |
| Phase | Original | Actual | Notes |
|---|---|---|---|
| Phase 1: Add dependency | 1-2 days | ✅ 1 day | As expected |
| Phase 2: Complexity | 2-3 days | ✅ 1 day | Simpler than expected |
| Phase 3: Memory | 5-7 days | ✅ 1 day | Schema handled by rlm_core |
| Phase 4: Trajectory | 2-3 days | ✅ 1 day | Factory methods simplified |
| Phase 5: REPL | 3-4 days | ⏭️ N/A | Not possible - Python-specific |
| Phase 6: Epistemic | 3-4 days | ✅ 1 day | Pattern-based only |
| Phase 7: Orchestrator | 5-7 days | Component delegation only | |
| Phase 8: Cleanup | 2-3 days | ⏭️ Deferred | Backward compat needed |
| Phase | Duration | Status |
|---|---|---|
| Phase 1-4, 6-7 | ~1 week | ✅ Complete |
| Phase 5 | N/A | ⏭️ Skipped |
| Phase 8 | TBD | ⏭️ Deferred |
| Validation & Testing | 1-2 weeks | 🔄 Pending |
| PR & Merge | 1 week | 🔄 Pending |
Total for component delegation: ~1 week (actual). Total including validation: ~3-4 weeks.
```shell
# Install rlm-core in development
pip install -e /path/to/rlm-core[python]

# Run with rlm-core enabled
RLM_USE_CORE=true python -m rlm_claude_code

# Run tests with both implementations
pytest tests/ --rlm-core-enabled
pytest tests/ --rlm-core-disabled

# Run memory migration
python scripts/migrate_memory.py --old-db ~/.rlm/memory.db --new-db ~/.rlm/memory_v2.db

# Benchmark
pytest tests/benchmarks/ --benchmark-compare
```

Migration is complete when:
- All rlm-claude-code tests pass with `RLM_USE_CORE=false`
- All tests pass with `RLM_USE_CORE=true` (rlm_core available)
- [historical target] Performance within 10% of original
- [historical target] Memory operations work with both backends
- Feature flag controls delegation
- Graceful fallback when rlm_core unavailable
- Documentation updated with migration reality
- [historical target] Fork merged via PR after validation
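The `--rlm-core-enabled`/`--rlm-core-disabled` switches shown in the developer commands are not built-in pytest options; they would need to be registered in a `conftest.py`. A hypothetical sketch of that wiring, not the repository's actual conftest:

```python
# conftest.py: hypothetical wiring for the custom pytest flags above.
import os

def pytest_addoption(parser):
    parser.addoption("--rlm-core-enabled", action="store_true",
                     help="run the suite with RLM_USE_CORE=true")
    parser.addoption("--rlm-core-disabled", action="store_true",
                     help="run the suite with RLM_USE_CORE=false")

def pytest_configure(config):
    # Force the env var so code under test picks the intended path.
    if config.getoption("--rlm-core-enabled"):
        os.environ["RLM_USE_CORE"] = "true"
    elif config.getoption("--rlm-core-disabled"):
        os.environ["RLM_USE_CORE"] = "false"
```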
Full migration requires rlm-core to expose:
- [historical target] `ClaudeCodeAdapter` with async Python support
- [historical target] `ReplPool`/`ReplHandle` (or accept a Python-specific impl)
- [historical target] Full `Orchestrator` interface
Until then, component delegation provides:
- Unified trajectory format across consumers
- Shared memory schema
- Consistent routing decisions
- Common epistemic verification primitives