
Common Issues

For response sequencing and closure criteria, start with incident-playbook.md and use this file for pattern-specific fixes.

1. `make check` fails with type/test errors

Symptoms:

Nonzero exit
Compiler or assertion failures

Actions:

Re-run `make check` to confirm the failure is reproducible.
Run the failing test target directly for focused logs.
Fix the root cause.
Re-run `make check` and the governance gates.

2. `./scripts/dp ...` returns `ok: false`

Symptoms:

JSON output contains failing step(s)

Actions:

Identify first failed step in JSON.
Run that command directly.
Resolve failure.
Re-run full dp chain.
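
The "identify first failed step" action can be sketched in a few lines of Python. The JSON schema used here (a top-level `ok` flag and a `steps` list whose entries carry `name`, `command`, and `ok`) is an assumption for illustration; adjust the field names to whatever `./scripts/dp` actually emits.

```python
import json
from typing import Optional

# Hypothetical sample report; the field names and step names are assumptions,
# not the confirmed dp output format.
SAMPLE_REPORT = json.dumps({
    "ok": False,
    "steps": [
        {"name": "lint", "command": "example-lint", "ok": True},
        {"name": "check", "command": "make check", "ok": False},
        {"name": "later", "command": "example-later", "ok": False},
    ],
})


def first_failed_step(report_text: str) -> Optional[dict]:
    """Return the first step whose "ok" field is not true, or None if all passed."""
    report = json.loads(report_text)
    for step in report.get("steps", []):
        # A missing "ok" field is treated as a failure so it gets looked at.
        if not step.get("ok", False):
            return step
    return None


step = first_failed_step(SAMPLE_REPORT)
if step is not None:
    print(f"first failure: {step['name']} (run directly: {step['command']})")
```

Starting from the first failure matters because later steps often fail only as a consequence of the earlier one.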

3. Python binding build failures

Symptoms:

`maturin develop` errors

Actions:

Verify Python and uv versions.
Re-run uv sync.
Ensure Rust build succeeds first.
Re-run binding install.

4. Go binding test failures

Symptoms:

`go test ./...` fails in `rlm-core/go/rlmcore`

Actions:

Confirm the static Rust library exists at the expected path.
Rebuild Rust library.
Re-run Go tests.

5. Drift detection output appears unstable

Symptoms:

Different drift ordering/results on repeated runs

Actions:

Re-run targeted sync/drift tests.
Verify input ordering assumptions.
Check symbol interning / temporary allocation code paths.
Capture deterministic repro fixture before patching.
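
One way to check the input-ordering assumption and pin down a repro fixture is to feed the same inputs in several orders and compare the results. The sketch below is generic Python, not the project's drift code: `unstable_detect` and `stable_detect` are hypothetical stand-ins for whatever function produces the drift list.

```python
import random
from typing import Callable, List, Sequence


def is_order_stable(
    detect: Callable[[Sequence[str]], List[str]],
    inputs: Sequence[str],
    runs: int = 20,
    seed: int = 0,
) -> bool:
    """Return True if detect() gives the same output for every input order tried."""
    items = list(inputs)
    baseline = detect(items)
    # Try the reversed order first, then seeded random shuffles so that any
    # failing permutation can be reproduced exactly.
    if detect(items[::-1]) != baseline:
        return False
    rng = random.Random(seed)
    for _ in range(runs):
        shuffled = items[:]
        rng.shuffle(shuffled)
        if detect(shuffled) != baseline:
            return False
    return True


def unstable_detect(symbols: Sequence[str]) -> List[str]:
    # Hypothetical detector: reports drifted symbols in input order.
    return [s for s in symbols if s.endswith("_v2")]


def stable_detect(symbols: Sequence[str]) -> List[str]:
    # Same detector with an explicit sort, so output no longer depends on input order.
    return sorted(unstable_detect(symbols))


FIXTURE = ["alpha_v2", "beta", "gamma_v2", "delta_v2"]
print(is_order_stable(unstable_detect, FIXTURE))
print(is_order_stable(stable_detect, FIXTURE))
```

Keeping the seed and the fixture list in the repro makes the instability checkable before and after a patch.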

6. Spec-agent output includes placeholders unexpectedly

Symptoms:

Generated artifacts include `draft:` annotations when strict baseline output was expected

Actions:

Verify the `CompletenessMode` configuration.
Use `Baseline` for placeholder-free stubs.
Re-run formalization and tests.

7. "It works locally" but gates fail elsewhere

Symptoms:

Local ad hoc run passes, policy gates fail

Actions:

Trust gate outputs over ad hoc confidence.
Capture exact environment details.
Align local commands with policy commands.
Re-run from clean state.
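
"Capture exact environment details" is easier to compare across machines when it is scripted. The sketch below records the OS, the Python version, and the first version line of a few tools. The tool list (`rustc`, `cargo`, `uv`, `go`) is an assumption based on the toolchains mentioned in this file; extend it to match the policy environment.

```python
import platform
import shutil
import subprocess
import sys
from typing import Dict, Sequence


def tool_version(tool: str, flag: str = "--version") -> str:
    """Return the first line of the tool's version output, or a short status string."""
    path = shutil.which(tool)
    if path is None:
        return "not found"
    try:
        out = subprocess.run([path, flag], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"error: {exc}"
    lines = (out.stdout or out.stderr).strip().splitlines()
    return lines[0] if lines else "unknown"


def capture_environment(tools: Sequence[str] = ("rustc", "cargo", "uv", "go")) -> Dict[str, str]:
    """Collect OS, Python, and tool versions into a flat dict for the incident record."""
    env = {"os": platform.platform(), "python": sys.version.split()[0]}
    for tool in tools:
        # Go prints its version via a subcommand rather than a --version flag.
        flag = "version" if tool == "go" else "--version"
        env[tool] = tool_version(tool, flag)
    return env


for key, value in capture_environment().items():
    print(f"{key}: {value}")
```

Running the same script locally and in the gate environment gives two records that can be diffed line by line.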

8. Ignored Lean/REPL integration tests hang or leave subprocesses behind

Symptoms:

`cargo test ... -- --ignored` appears stuck
Later runs show odd REPL startup errors
Background `rlm_repl` or Lean `repl` processes linger

Actions:

Run the ignored subprocess tests serially: `cd /Users/rand/src/loop/rlm-core && cargo test --no-default-features --features gemini test_repl_spawn -- --ignored --test-threads=1 && cargo test --no-default-features --features gemini test_lean_repl_spawn -- --ignored --test-threads=1`
Treat a runtime over 120 s as a failure signal and capture the logs in evidence.
Scan for leftovers: `ps -axo pid=,command=,rss= -ww | rg -n "rlm_repl|lake env repl|\\brepl\\b" -S`
If environment prerequisites are missing (Python package path, Lean toolchain), record the actionable stderr and classify the run as an environment blocker, not a runtime success.
Re-run after fixing prerequisites; unattended runs should complete without manual process cleanup.
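
The 120 s budget and the leftover scan can be combined in a small wrapper. This is a minimal sketch rather than project tooling: `run_with_budget` and `find_leftovers` are hypothetical helper names, and the regular expression copies the alternatives from the scan pattern above. Note that `subprocess.run` kills only the direct child on timeout, so grandchild processes can still linger and the scan remains necessary.

```python
import re
import subprocess
from typing import List, Tuple

# Same alternatives as the rg pattern used in the leftover scan.
LEFTOVER_PATTERN = re.compile(r"rlm_repl|lake env repl|\brepl\b")


def _as_text(data) -> str:
    """Normalize captured output, which may be None or bytes after a timeout."""
    if data is None:
        return ""
    return data.decode(errors="replace") if isinstance(data, bytes) else data


def run_with_budget(cmd: List[str], budget_s: float = 120.0) -> Tuple[str, str]:
    """Run cmd and return (status, output); status is ok, failed, timeout, or blocker."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=budget_s)
    except subprocess.TimeoutExpired as exc:
        # Exceeding the budget is a failure signal, not a slow success.
        return "timeout", _as_text(exc.stdout) + _as_text(exc.stderr)
    except OSError as exc:
        # A missing executable is an environment blocker.
        return "blocker", str(exc)
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout + proc.stderr


def find_leftovers(ps_output: str) -> List[str]:
    """Return the ps lines that look like lingering REPL processes."""
    return [line.strip() for line in ps_output.splitlines() if LEFTOVER_PATTERN.search(line)]


# Made-up ps output for illustration only.
SAMPLE_PS = (
    "  101 /usr/bin/python -m rlm_repl 2048\n"
    "  202 lake env repl 4096\n"
    "  303 /bin/zsh 512\n"
    "  404 replicate-service 10\n"
)
for line in find_leftovers(SAMPLE_PS):
    print("leftover:", line)
```

In practice `find_leftovers` would be fed the text output of the `ps` command after each serial test run, and any non-empty result would be recorded alongside the captured logs.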

9. Proof automation reports "Missing proof state"

Symptoms:

Tactic execution fails immediately with a deterministic missing proof-state error
AI tactic validation returns failure before trying any tactic

Actions:

Ensure you are targeting a `sorry` location with a real `proof_state_id` (protocol session target selection prefers these).
Run the protocol path (`execute_tactic_with_feedback` / `execute_tactic_with_repl`) instead of ad hoc tactic calls without state.
Confirm operation IDs are stateful (`sorry:<proof_state>:<idx>`) rather than `sorry:missing:<idx>`.
If the Lean response omitted the proof state, treat it as a runtime blocker and capture the raw response in evidence.
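
The operation-ID check can be automated when triaging a batch of IDs. The parser below is a sketch under two assumptions that the source does not confirm: the `<proof_state>` token contains no colons, and `<idx>` is a decimal integer. `proof_state_of` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumed shape: sorry:<proof_state>:<idx>, with a colon-free state and a numeric index.
OPERATION_ID = re.compile(r"^sorry:(?P<state>[^:]+):(?P<idx>\d+)$")


def proof_state_of(operation_id: str) -> Optional[str]:
    """Return the proof-state token of a stateful ID, or None if missing or malformed."""
    match = OPERATION_ID.match(operation_id)
    if match is None or match.group("state") == "missing":
        return None
    return match.group("state")


for op_id in ("sorry:42:0", "sorry:missing:0", "not-an-operation-id"):
    state = proof_state_of(op_id)
    print(op_id, "->", "stateful" if state is not None else "no proof state")
```

IDs that come back without a proof state are the ones to trace to the raw Lean response before retrying any tactic.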

Because gravity is also optional until it is not.