Fixed various IQ/IR issues by dominic-mulligan-aws · Pull Request #208 · awslabs/AutoCorrode

dominic-mulligan-aws · 2026-04-17T19:14:40Z

Fixes six bugs reported during use of the I/Q and I/R MCP servers:

get_diagnostics now reports processing status — responses include is_fully_processed, unprocessed, running, and finished fields, so an empty diagnostics list can be distinguished from
"file still being processed". Also fixes capability-backend InvalidParams errors to surface as proper JSON-RPC protocol errors instead of being silently wrapped as tool results.
explore output is size-capped — new optional max_chars_per_result (default 5000) and max_total_chars (default 100000) MCP parameters prevent find_theorems from returning 400KB+
responses when called at deep-record contexts. Set either to 0 to disable.
str_replace no longer collapses newlines across match boundaries — whitespace normalization could collapse \n runs into a single space, causing a pattern with one newline to match a
span containing two. Adds a post-match newline-count invariant so mismatches return "not found" instead of silently eating blank lines.
get_processing_status no longer blocks on the Swing EDT — the handler was wrapped in GUI_Thread.now for a pure counter read, causing up to 60s stalls when PIDE was rendering large
theories. Uses a new getDocumentModel helper that avoids touching jEdit buffers, keeping the entire operation off-EDT.
I/R MCP bridge has a recv timeout — send() now sets a configurable socket timeout (IR_REPL_RECV_TIMEOUT env var, default 600s) with clear error messages distinguishing "auth handshake
timeout" from "ML server unresponsive".
I/R exposes ML-process epoch for restart detection — generates a millisecond-resolution timestamp at ML startup, appended to show/repls output. The Python bridge compares against the
last-seen epoch and prepends a [WARNING] I/R ML process restarted message on mismatch. Epoch is seeded on connect so restarts are detected on the next show/repls call.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

…ol errors as JSON-RPC errors Bug 4: `get_diagnostics` now returns `is_fully_processed`, `unprocessed`, `running`, and `finished` alongside the diagnostic list. An empty list can now be distinguished from "file still being processed". Also fixes a latent issue in `handleToolCall` where capability-backend `InvalidParams` errors were wrapped as tool-call results with `isError: true` instead of being surfaced as JSON-RPC protocol errors. The backend already carries a JSON-RPC error code; propagate it.

Bug 8: `explore` / `find_theorems` can return hundreds of KB when called at positions deep inside multi-record contexts (e.g. hundreds of `locality_lemma` variants per field), which drowns MCP clients. Add two optional MCP parameters to `explore`: - `max_bytes_per_result` (default 5000, 0 = unlimited) - `max_total_bytes` (default 100000, 0 = unlimited) Thread both through the regular and batch find_theorems paths. Each result is truncated with a trailing `… (result truncated: N more bytes)` marker; the final joined body gets the total cap with its own marker. Both caps are applied post-join across all patterns in the batch path, so the total cap is a ceiling on the final response, not per-pattern.

Bug 5: whitespace normalization in findUniqueMatch collapses runs of \n \t \r into a single space, so a pattern with one newline can match an original-text span containing two (a blank-line boundary). The replacement then eats the extra newline, collapsing the comment and code onto the same line. Add a post-match invariant: when the search pattern contains \n, the original-text slice must have the same newline count. On mismatch, return SubstringNotFound instead of silently widening the match. Single-line patterns are unaffected.

Bug 7: handleGetProcessingStatus wrapped a pure counter read in GUI_Thread.now, which serializes behind the Swing EDT. When PIDE is rendering a large theory the EDT stays busy for seconds, causing get_processing_status to block for up to 60s. Document_Model.snapshot and Document_Status.Node_Status.make read immutable PIDE state and are safe off-EDT. Remove the GUI_Thread.now wrapper so status polls return immediately regardless of UI load.

Bug 2 residual: after 28464ba moved I/O out of the Future pool, the primary cause of MCP disconnects (worker-pool starvation) is fixed. However the recv loop in send() had no timeout, so a stalled ML process (GC, long reprocessing, eval hang) was indistinguishable from a dead socket — the client would block indefinitely then get a bare EOFError. Set a configurable recv timeout (IR_REPL_RECV_TIMEOUT env var, default 600s) on the TCP socket. Auth handshake and command reads now raise a clear RuntimeError identifying the timeout, so callers can distinguish "bridge dropped" from "ML hung".

Bug 3: after an ML process crash and restart, repl_show returns "No REPL" which is indistinguishable from "never existed". The MCP client has no way to know the ML process changed underneath. Generate a per-process epoch timestamp (with millisecond resolution) at ML startup. Add print_epoch for explicit epoch queries. Append a SERVER_EPOCH=<ts> line to show and repls output. In the Python MCP bridge, extract the epoch and compare against the last-seen value. On mismatch, prepend a clear warning that the ML process restarted and all previous REPL state is gone. Seed the epoch on connect so that subsequent show/repls calls can detect restarts.

Extended_Query_Operation.activate() registers a Session.Consumer that fires on every PIDE command change, and apply_query_at_command() inserts a PIDE overlay that tells the ML prover to evaluate the query print function on every document version. If deactivate() is never called, both persist indefinitely — the consumer creates snapshots and materializes XML on every command change, while the overlay forces the ML process to re-evaluate the query on every edit. Over a session with many explore/sledgehammer queries, leaked operations accumulate and cause escalating Poly/ML GC pressure, leading to minutes-long GC pauses that crash the I/Q MCP server via TCP timeout. Fix: wrap the operation lifecycle in try/finally so deactivate() runs on all paths (timeout, exception, normal completion). Track the operation via a var set immediately after activate() so the finally block can clean up even if apply_query_at_command throws. Cancel timed-out queries via cancel_query() before deactivating, so the ML process stops working on abandoned queries. Also make deactivate() idempotent (safe to double-call) and protect waitForTheoryCompletion's node_required state with try/finally so PIDE version retention is always restored even on exception.

The scalac wrapper's macOS Java discovery falls back to /System/Library/Frameworks/JavaVM.framework and /usr/bin/java, neither of which work on modern macOS. Export JAVA_HOME pointing at Isabelle's contrib JDK so the wrapper finds a working runtime.

Dominic Mulligan added 6 commits April 17, 2026 19:58

dominic-mulligan-aws self-assigned this Apr 17, 2026

dominic-mulligan-aws added the bug Something isn't working label Apr 17, 2026

Dominic Mulligan added 2 commits April 18, 2026 20:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed various IQ/IR issues#208

Fixed various IQ/IR issues#208
dominic-mulligan-aws wants to merge 8 commits into
awslabs:mainfrom
dominic-mulligan-aws:iq-ir-fixes

dominic-mulligan-aws commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dominic-mulligan-aws commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant