Skip to content

Fixed various IQ/IR issues#208

Open
dominic-mulligan-aws wants to merge 8 commits into
awslabs:mainfrom
dominic-mulligan-aws:iq-ir-fixes
Open

Fixed various IQ/IR issues#208
dominic-mulligan-aws wants to merge 8 commits into
awslabs:mainfrom
dominic-mulligan-aws:iq-ir-fixes

Conversation

@dominic-mulligan-aws
Copy link
Copy Markdown
Contributor

Fixes six bugs reported during use of the I/Q and I/R MCP servers:

  • get_diagnostics now reports processing status — responses include is_fully_processed, unprocessed, running, and finished fields, so an empty diagnostics list can be distinguished from
    "file still being processed". Also fixes capability-backend InvalidParams errors to surface as proper JSON-RPC protocol errors instead of being silently wrapped as tool results.
  • explore output is size-capped — new optional max_chars_per_result (default 5000) and max_total_chars (default 100000) MCP parameters prevent find_theorems from returning 400KB+
    responses when called at deep-record contexts. Set either to 0 to disable.
  • str_replace no longer collapses newlines across match boundaries — whitespace normalization could collapse \n runs into a single space, causing a pattern with one newline to match a
    span containing two. Adds a post-match newline-count invariant so mismatches return "not found" instead of silently eating blank lines.
  • get_processing_status no longer blocks on the Swing EDT — the handler was wrapped in GUI_Thread.now for a pure counter read, causing up to 60s stalls when PIDE was rendering large
    theories. Uses a new getDocumentModel helper that avoids touching jEdit buffers, keeping the entire operation off-EDT.
  • I/R MCP bridge has a recv timeout — send() now sets a configurable socket timeout (IR_REPL_RECV_TIMEOUT env var, default 600s) with clear error messages distinguishing "auth handshake
    timeout" from "ML server unresponsive".
  • I/R exposes ML-process epoch for restart detection — generates a millisecond-resolution timestamp at ML startup, appended to show/repls output. The Python bridge compares against the
    last-seen epoch and prepends a [WARNING] I/R ML process restarted message on mismatch. Epoch is seeded on connect so restarts are detected on the next show/repls call.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Dominic Mulligan added 6 commits April 17, 2026 19:58
…ol errors as JSON-RPC errors

Bug 4: `get_diagnostics` now returns `is_fully_processed`, `unprocessed`,
`running`, and `finished` alongside the diagnostic list. An empty list
can now be distinguished from "file still being processed".

Also fixes a latent issue in `handleToolCall` where capability-backend
`InvalidParams` errors were wrapped as tool-call results with
`isError: true` instead of being surfaced as JSON-RPC protocol errors.
The backend already carries a JSON-RPC error code; propagate it.
Bug 8: `explore` / `find_theorems` can return hundreds of KB when called
at positions deep inside multi-record contexts (e.g. hundreds of
`locality_lemma` variants per field), which drowns MCP clients.

Add two optional MCP parameters to `explore`:
  - `max_bytes_per_result` (default 5000, 0 = unlimited)
  - `max_total_bytes` (default 100000, 0 = unlimited)

Thread both through the regular and batch find_theorems paths. Each
result is truncated with a trailing `… (result truncated: N more bytes)`
marker; the final joined body gets the total cap with its own marker.
Both caps are applied post-join across all patterns in the batch path,
so the total cap is a ceiling on the final response, not per-pattern.
Bug 5: whitespace normalization in findUniqueMatch collapses runs of
\n \t \r into a single space, so a pattern with one newline can match
an original-text span containing two (a blank-line boundary). The
replacement then eats the extra newline, collapsing the comment and
code onto the same line.

Add a post-match invariant: when the search pattern contains \n, the
original-text slice must have the same newline count. On mismatch,
return SubstringNotFound instead of silently widening the match.
Single-line patterns are unaffected.
Bug 7: handleGetProcessingStatus wrapped a pure counter read in
GUI_Thread.now, which serializes behind the Swing EDT. When PIDE is
rendering a large theory the EDT stays busy for seconds, causing
get_processing_status to block for up to 60s.

Document_Model.snapshot and Document_Status.Node_Status.make read
immutable PIDE state and are safe off-EDT. Remove the GUI_Thread.now
wrapper so status polls return immediately regardless of UI load.
Bug 2 residual: after 28464ba moved I/O out of the Future pool, the
primary cause of MCP disconnects (worker-pool starvation) is fixed.
However the recv loop in send() had no timeout, so a stalled ML
process (GC, long reprocessing, eval hang) was indistinguishable from
a dead socket — the client would block indefinitely then get a bare
EOFError.

Set a configurable recv timeout (IR_REPL_RECV_TIMEOUT env var,
default 600s) on the TCP socket. Auth handshake and command reads
now raise a clear RuntimeError identifying the timeout, so callers
can distinguish "bridge dropped" from "ML hung".
Bug 3: after an ML process crash and restart, repl_show returns
"No REPL" which is indistinguishable from "never existed". The MCP
client has no way to know the ML process changed underneath.

Generate a per-process epoch timestamp (with millisecond resolution)
at ML startup. Add print_epoch for explicit epoch queries. Append a
SERVER_EPOCH=<ts> line to show and repls output. In the Python MCP
bridge, extract the epoch and compare against the last-seen value. On
mismatch, prepend a clear warning that the ML process restarted and
all previous REPL state is gone. Seed the epoch on connect so that
subsequent show/repls calls can detect restarts.
@dominic-mulligan-aws dominic-mulligan-aws self-assigned this Apr 17, 2026
@dominic-mulligan-aws dominic-mulligan-aws added the bug Something isn't working label Apr 17, 2026
Dominic Mulligan added 2 commits April 18, 2026 20:32
Extended_Query_Operation.activate() registers a Session.Consumer that
fires on every PIDE command change, and apply_query_at_command()
inserts a PIDE overlay that tells the ML prover to evaluate the query
print function on every document version. If deactivate() is never
called, both persist indefinitely — the consumer creates snapshots and
materializes XML on every command change, while the overlay forces the
ML process to re-evaluate the query on every edit. Over a session with
many explore/sledgehammer queries, leaked operations accumulate and
cause escalating Poly/ML GC pressure, leading to minutes-long GC
pauses that crash the I/Q MCP server via TCP timeout.

Fix: wrap the operation lifecycle in try/finally so deactivate() runs
on all paths (timeout, exception, normal completion). Track the
operation via a var set immediately after activate() so the finally
block can clean up even if apply_query_at_command throws. Cancel
timed-out queries via cancel_query() before deactivating, so the ML
process stops working on abandoned queries.

Also make deactivate() idempotent (safe to double-call) and protect
waitForTheoryCompletion's node_required state with try/finally so
PIDE version retention is always restored even on exception.
The scalac wrapper's macOS Java discovery falls back to
/System/Library/Frameworks/JavaVM.framework and /usr/bin/java,
neither of which work on modern macOS.  Export JAVA_HOME pointing
at Isabelle's contrib JDK so the wrapper finds a working runtime.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant