toolhive proxy scuppered by rancher restart #2117

@therealnb

Bug description

I installed the time and github MCP servers, and everything worked. Restarting meta-mcp does not break anything. After restarting Rancher Desktop (the same happens with Docker Desktop), the containers come back and look healthy, but they no longer work: the MCP handshake fails.

2025-10-08 09:02:10 [warning  ] Failed to process workload     [meta_mcp.ingestion] error='Operation timed out after 30 seconds' workload_name=time workload_type=mcp
2025-10-08 09:02:10 [warning  ] Failed to process workload     [meta_mcp.ingestion] error='Operation timed out after 30 seconds' workload_name=github workload_type=mcp

Restarting the time and github containers does not help, and neither does thv restart. Removing and recreating them does.
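The remove-and-recreate workaround can be sketched as a small helper. This is only a sketch under assumptions: `recreate_workloads` and the `DRY_RUN` variable are hypothetical names, and it assumes each server can be recreated with a bare `thv run <name>`; any extra arguments or secrets used when the workloads were first created (for example for github) would have to be added by hand.

```shell
#!/bin/bash
# Hypothetical helper: remove and recreate each named ToolHive workload.
# Assumption: a bare `thv run <name>` is enough to recreate the server.
# With DRY_RUN set, the commands are printed instead of executed.
recreate_workloads() {
  local name
  for name in "$@"; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "thv rm $name"
      echo "thv run $name"
    else
      # Only recreate if the removal succeeded.
      thv rm "$name" && thv run "$name"
    fi
  done
}
```

For example, `DRY_RUN=1 recreate_workloads time github` prints the four commands without touching the workloads.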

The logs look fine on time and github.

The Exact Problem

After Docker Desktop or Rancher Desktop restarts, ToolHive's SSE proxy enters a broken state where:

  • ✅ It accepts SSE connections
  • ✅ It accepts POST messages to the /messages endpoint (returns "Accepted")
  • ❌ It NEVER forwards responses from the MCP server back to the SSE client

The containers themselves work perfectly (verified by stdio logs showing proper MCP responses), but the SSE-to-stdio bridge is broken in one direction: requests go in, responses never come out.

Evidence:

  • The test script shows the message is accepted but no response comes back on the SSE stream
  • Container logs show the servers responding correctly on stdio
  • Proxy processes are running with valid file descriptors
  • Restarting the containers doesn't fix it

Root Cause: The SSE read loop in ToolHive's HTTP proxy stops forwarding stdio output from the container back to SSE clients after a Docker restart. The proxy needs to be restarted or fixed to re-establish the bidirectional bridge.

Steps to reproduce

Here are the key commands I ran to demonstrate the bug:

1. Test Script (Most Important)

# Created and ran a test script that demonstrates the broken SSE flow
/Users/nigel/code/meta-mcp/test_sse.sh

test_sse.sh

This script shows:

  • SSE connection opens ✅
  • session_id is received ✅
  • Initialize message is sent and accepted ✅
  • NO response comes back on the SSE stream ❌

2. Manual SSE Endpoint Test

# Test if SSE endpoint responds (it does, but hangs after first message)
curl -s --max-time 3 http://127.0.0.1:40962/sse

Result: Returns event: endpoint with session_id, then times out
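To make the `endpoint` event easier to inspect, the session_id can be pulled out of a saved SSE capture. This is a minimal sketch, assuming the usual MCP HTTP+SSE framing (an `event: endpoint` line followed by a `data:` line carrying the POST URI); `extract_session_id` is a hypothetical helper name, not part of ToolHive.

```shell
#!/bin/bash
# Hypothetical helper: print the session_id from the first `endpoint` event
# in a captured SSE stream (for example the output of `curl -s -N .../sse`).
# Assumes the data line looks like: data: /messages?session_id=<id>
extract_session_id() {
  local capture_file="$1"
  awk '
    /^event:[ \t]*endpoint/ { in_endpoint = 1; next }
    in_endpoint && /^data:/ {
      # Take everything after "session_id=" up to "&", whitespace, or CR.
      if (match($0, /session_id=[^& \t\r]+/)) {
        print substr($0, RSTART + 11, RLENGTH - 11)
        exit
      }
    }
    # A blank line ends the current SSE event.
    /^[ \t\r]*$/ { in_endpoint = 0 }
  ' "$capture_file"
}
```

Usage, assuming the stream was saved first: `curl -s --max-time 3 http://127.0.0.1:40962/sse > /tmp/sse_output.txt; extract_session_id /tmp/sse_output.txt`.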

3. Messages Endpoint Test

# Test if messages endpoint accepts requests (it does!)
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test-client","version":"1.0.0"}}}' \
  "http://127.0.0.1:40962/messages?session_id=test-123"

Result: Returns "Accepted" (but response never appears on SSE stream)
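Whether a response ever reached the SSE stream can be checked mechanically against a saved capture. A minimal sketch, assuming responses arrive as `data:` lines containing a JSON-RPC object with a numeric `id`; `sse_has_response` is a hypothetical helper name, and the match is a plain regex rather than real JSON parsing.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if the captured SSE stream contains
# a data line with a JSON-RPC message whose numeric "id" equals request_id.
# This is a regex check, not a JSON parser, so treat it as a rough probe.
sse_has_response() {
  local capture_file="$1" request_id="$2"
  grep -Eq "^data:.*\"id\"[[:space:]]*:[[:space:]]*${request_id}([^0-9]|\$)" "$capture_file"
}
```

In the broken state described here, `sse_has_response /tmp/sse_output.txt 1` would be expected to fail, because the capture holds only the `endpoint` event.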

4. Verify Containers Are Working

# Check github container logs - shows it's responding correctly
cd /Users/nigel/code/toolhive && thv logs github 2>&1 | tail -20

# Check time container logs - shows proper MCP responses
cd /Users/nigel/code/toolhive && thv logs time 2>&1 | tail -20

Result: Both show full tools lists and proper MCP protocol responses

5. Check Container Status

# Verify containers are running
cd /Users/nigel/code/toolhive && thv ls

# Check when containers were created
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.CreatedAt}}"

# Check container stdio connections
docker exec time sh -c "ls -l /proc/1/fd/" | head -5

Result: Containers are healthy, stdio pipes are connected

6. Check Proxy Process State

# Find proxy processes
ps aux | grep "thv restart" | grep -E "(github|time)" | grep -v grep

# Check proxy file descriptors
lsof -p 82050 | grep -E "(PIPE|unix|tcp)"  # github proxy
lsof -p 82453 | grep -E "(PIPE|unix|tcp)"  # time proxy

Result: Proxy processes are running with PIPE file descriptors

7. Check meta-mcp Logs

# See meta-mcp timing out trying to connect
cd /Users/nigel/code/toolhive && thv logs meta-mcp 2>&1 | grep -E "(timeout|failed|tools_count)" | tail -10

Result: Shows "Operation timed out after 30 seconds" for both github and time
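To list exactly which workloads timed out, the names can be extracted from the meta-mcp log lines. A minimal sketch, assuming the log format shown in the bug description (`error='Operation timed out ...' workload_name=<name>`); `timed_out_workloads` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read meta-mcp log lines on stdin and print the unique
# workload names that hit "Operation timed out", one per line, sorted.
timed_out_workloads() {
  grep -F 'Operation timed out' \
    | sed -n 's/.*workload_name=\([^[:space:]]*\).*/\1/p' \
    | sort -u
}
```

Usage: `thv logs meta-mcp 2>&1 | timed_out_workloads`.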


The Smoking Gun Command

The test_sse.sh script is the clearest demonstration. Here's what it does:

#!/bin/bash
URL="http://127.0.0.1:40962"

# 1. Open SSE connection in the background and get session_id
curl -s -N "$URL/sse" > /tmp/sse_output.txt &
SSE_PID=$!
sleep 2
SESSION_ID=$(grep "session_id" /tmp/sse_output.txt | sed 's/.*session_id=\([^&[:space:]]*\).*/\1/')

# 2. Send initialize message
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize",...}' \
  "$URL/messages?session_id=$SESSION_ID"

# 3. Check SSE stream - NOTHING comes back!
sleep 2
cat /tmp/sse_output.txt  # Shows only 'endpoint' event, no initialize response

# 4. Stop the background SSE connection so it does not linger
kill "$SSE_PID" 2>/dev/null

The key insight: Messages go IN (accepted), but responses never come OUT on the SSE stream, even though the container is responding correctly on stdio.

Expected behavior

I expect the ToolHive proxies for the MCP servers to keep working after the container runtime (Rancher Desktop or Docker Desktop) is restarted.

Actual behavior

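They don't. After the restart, the proxies still accept requests but never return responses on the SSE stream, until the workloads are removed and recreated.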

Environment (if relevant)

  • OS/version: macOS (version not specified)
  • ToolHive version: v0.3.7-4-ge49a77f9

Metadata

Labels

bug (Something isn't working), cli (Changes that impact CLI functionality), proxy
