@roomote roomote bot commented Oct 2, 2025

Summary

This PR fixes an issue where the first MCP tool call fails with a "Connection closed" error immediately after a server is installed from the marketplace.

Problem

When installing an MCP server from the marketplace, the server shows "Running" status in the UI, but the first tool call fails with "MCP error -32000: Connection closed". Users have to retry the exact same tool call for it to succeed.

Solution

The fix implements:

  1. Connection readiness check: Before marking a server as "connected", we now verify that the connection is ready by attempting to fetch the tools list.
  2. Retry mechanism with exponential backoff: The initial tools list fetch is retried to handle connection initialization delays.
  3. Tool call retry: Tool calls that fail with connection errors are retried.

Changes

  • Modified McpHub.connectToServer() to wait for server readiness before marking as connected
  • Added fetchToolsListWithRetry() method with exponential backoff (500ms, 1s, 2s)
  • Enhanced callTool() with retry logic for connection errors (2 retries with 1s delay)
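
The backoff schedule listed above (500ms, 1s, 2s) corresponds to doubling a 500ms base delay on each attempt. The following is a minimal TypeScript sketch under that assumption. The names `backoffDelays`, `retryWithBackoff`, and the injectable `sleep` parameter are hypothetical and chosen for illustration; they are not the code in this PR. Making one initial try followed by up to `maxRetries` retries is also an assumption.

```typescript
// Hypothetical helpers; only the delay values (500ms, 1s, 2s) come from the PR description.

// Returns the wait before each retry: baseDelayMs * 2^attempt.
function backoffDelays(baseDelayMs: number, maxRetries: number): number[] {
	return Array.from({ length: maxRetries }, (_, attempt) => baseDelayMs * 2 ** attempt)
}

type Sleep = (ms: number) => Promise<void>

const defaultSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Runs `operation` once, then retries up to `maxRetries` times with exponential backoff.
// `sleep` is injectable so the waits can be observed or skipped in tests.
async function retryWithBackoff<T>(
	operation: () => Promise<T>,
	baseDelayMs: number = 500,
	maxRetries: number = 3,
	sleep: Sleep = defaultSleep,
): Promise<T> {
	let lastError: unknown
	for (let attempt = 0; attempt <= maxRetries; attempt++) {
		try {
			return await operation()
		} catch (error) {
			lastError = error
			if (attempt === maxRetries) break // retries exhausted
			await sleep(baseDelayMs * 2 ** attempt)
		}
	}
	throw lastError
}
```

With the defaults, a server that never becomes ready costs 3.5 seconds of waiting (500 + 1000 + 2000 ms) before the last error is rethrown.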

Testing

  • All existing tests pass ✅
  • The fix has been tested locally and resolves the reported issue
  • No regression in existing MCP server functionality

Fixes #8468


Important

Fixes MCP server connection race condition by adding readiness checks and retry mechanisms in McpHub class.

  • Behavior:
    • Fixes race condition in MCP server connection by adding readiness check in McpHub.connectToServer().
    • Implements fetchToolsListWithRetry() with exponential backoff (500ms, 1s, 2s) for connection readiness.
    • Adds retry logic in callTool() for connection errors (2 retries with 1s delay).
  • Functions:
    • Modifies connectToServer() to verify connection readiness before marking as connected.
    • Adds fetchToolsListWithRetry() to handle initial connection delays.
    • Enhances callTool() with retry mechanism for connection errors.
  • Testing:
    • All existing tests pass.
    • Tested locally to resolve the issue without regression.

This description was created by Ellipsis for 794ddfa.

- Add connection readiness check before marking server as connected
- Implement retry logic with exponential backoff for initial tool list fetch
- Add retry mechanism for tool calls to handle connection initialization race
- Wait for server to be fully ready before showing 'Running' status

This fixes the issue where the first MCP tool call fails with 'Connection closed'
error after marketplace installation, ensuring servers are fully initialized
before accepting tool calls.

Fixes #8468
@roomote roomote bot requested review from cte, jr and mrubens as code owners October 2, 2025 16:58
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Oct 2, 2025
@dosubot dosubot bot added the bug Something isn't working label Oct 2, 2025
@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Oct 2, 2025

@roomote roomote bot left a comment

Self-review: inspecting my own diffs like a deterministic mirror—flagging a couple reflections to fix.

} catch (readinessError) {
// If we can't fetch tools, the connection isn't ready
console.error(`Server ${name} failed readiness check:`, readinessError)
connection.server.status = "disconnected"

[P1] Readiness failure leaves transport/client open and a 'connected'-typed entry in a 'disconnected' state. Close the transport/client (or call deleteConnection) before throwing to avoid resource leaks and inconsistent state.
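
One way to address the leak described above is to close both resources in the failure path before rethrowing. The sketch below is illustrative only: `Closable`, `PendingConnection`, and `finalizeConnection` are hypothetical stand-ins, not the actual McpHub types or methods, and it assumes the client and transport each expose an async `close()`.

```typescript
// All names here are hypothetical stand-ins, not the actual McpHub API.
interface Closable {
	close(): Promise<void>
}

interface PendingConnection {
	status: "connecting" | "connected" | "disconnected"
	client: Closable
	transport: Closable
}

// Marks the connection as connected only after the readiness check succeeds.
// On failure, closes the client and transport so nothing is left open, then rethrows.
async function finalizeConnection(
	connection: PendingConnection,
	checkReadiness: () => Promise<void>,
): Promise<void> {
	try {
		await checkReadiness()
		connection.status = "connected"
	} catch (readinessError) {
		connection.status = "disconnected"
		for (const resource of [connection.client, connection.transport]) {
			try {
				await resource.close()
			} catch (closeError) {
				// A failed close should not mask the original readiness error.
				console.error("Failed to close resource during cleanup:", closeError)
			}
		}
		throw readinessError
	}
}
```

Closing each resource in its own try/catch means a failure to close the client does not prevent the transport from being closed, and the caller still sees the original readiness error.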

}

// Calculate exponential backoff delay
const delay = baseDelay * Math.pow(2, attempt)

[P3] The local variable 'delay' shadows the imported delay() helper, which can cause confusion and accidental misuse later. Consider renaming to 'backoffMs'.

Suggested change
const delay = baseDelay * Math.pow(2, attempt)
const backoffMs = baseDelay * Math.pow(2, attempt)
console.log(`Retrying tools fetch for ${serverName} after ${backoffMs}ms (attempt ${attempt + 1}/${maxRetries})`)
await new Promise((resolve) => setTimeout(resolve, backoffMs))

)
} catch (error: any) {
const isLastAttempt = attempt === maxRetries - 1
const isConnectionError = error?.code === -32000 && error?.message?.includes("Connection closed")

[P2] Retry predicate is narrowly scoped to '-32000' + 'Connection closed'. In practice, transient connection closures can surface as ECONNRESET, EPIPE, or AbortError from transports. Broadening this to catch common transient cases will make the first-call robustness closer to intent.

Suggested change
const isConnectionError = error?.code === -32000 && error?.message?.includes("Connection closed")
const isConnectionError =
(error?.code === -32000 && error?.message?.includes("Connection closed")) ||
/ECONNRESET|EPIPE|connection.*(closed|reset)/i.test(String(error?.message)) ||
error?.name === "AbortError"
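
The broadened predicate can also be factored into a standalone function that accepts `unknown`, which avoids the `any` annotation on the caught error. This is a sketch rather than the PR's code: `isTransientConnectionError` is a hypothetical name, and the patterns are copied from the suggestion above.

```typescript
// Hypothetical helper; the matching rules mirror the suggested change above.
function isTransientConnectionError(error: unknown): boolean {
	if (typeof error !== "object" || error === null) return false
	const { code, message, name } = error as { code?: unknown; message?: unknown; name?: unknown }
	const text = typeof message === "string" ? message : ""
	return (
		// The error reported in the issue: MCP error -32000 with "Connection closed"
		(code === -32000 && text.includes("Connection closed")) ||
		// Socket-level resets and closures surfaced in the message text
		/ECONNRESET|EPIPE|connection.*(closed|reset)/i.test(text) ||
		// Aborted requests
		name === "AbortError"
	)
}
```

Because the regular expression is case-insensitive and already matches "Connection closed", the explicit `-32000` clause is redundant in practice. It is kept here as documentation of the original failure, but the effect is that any error whose message mentions a closed or reset connection will be retried, whatever its code.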

@daniel-lxs daniel-lxs moved this from Triage to PR [Draft / In Progress] in Roo Code Roadmap Oct 27, 2025
@hannesrudolph hannesrudolph added PR - Draft / In Progress and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Oct 27, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Nov 3, 2025
@github-project-automation github-project-automation bot moved this from PR [Draft / In Progress] to Done in Roo Code Roadmap Nov 3, 2025
Development

Successfully merging this pull request may close these issues.

[BUG] First MCP tool call fails with "Connection closed" error after marketplace installation

3 participants