Description
Problem
When using the MCP Java SDK's HTTP transport with custom ThreadLocal or InheritableThreadLocal context (e.g., authentication tokens), the context is lost after the boundedElastic thread pool exhausts and starts reusing threads.
Symptoms
Tools that rely on InheritableThreadLocal context work for ~N tool calls (where N = availableProcessors() * 10), then fail because the context is null.
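For concreteness, the kind of holder involved typically looks like the following minimal sketch (the class and method names are illustrative, not SDK API):
// Illustrative context holder, not SDK code: an HTTP filter calls set(),
// tool implementations call get() on whatever thread the scheduler provides.
public final class AuthContext {
    private static final InheritableThreadLocal<String> TOKEN = new InheritableThreadLocal<>();
    public static void set(String token) { TOKEN.set(token); }
    public static String get() { return TOKEN.get(); } // null on a reused, context-less pool thread
    public static void clear() { TOKEN.remove(); }
}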
Environment
- MCP Java SDK 0.16.0
- HTTP/Streamable transport
- Reactor (Project Reactor) with boundedElastic scheduler
- Kubernetes or resource-constrained environment
Root Cause
The issue is a fundamental interaction between:
- InheritableThreadLocal - only propagates context when a thread is created, not when it is reused
- Reactor's boundedElastic scheduler - pool size = availableProcessors() * 10
- KeepAliveScheduler - creates boundedElastic-1 during startup from the main thread (no HTTP/request context)
- Tool execution - McpServerFeatures.java uses .subscribeOn(Schedulers.boundedElastic())
Timeline
STARTUP (main thread):
└─▶ KeepAliveScheduler starts
└─▶ Schedules on Schedulers.boundedElastic()
└─▶ Creates boundedElastic-1 ← NO context (parent is main)
REQUEST PHASE:
└─▶ HTTP filter sets context (e.g., auth token)
└─▶ Tool execution via .subscribeOn(boundedElastic)
├─▶ Creates boundedElastic-2 (inherits) ✅
├─▶ Creates boundedElastic-3 (inherits) ✅
└─▶ ... boundedElastic-N (inherits) ✅
AFTER N TOOL CALLS:
└─▶ Scheduler REUSES boundedElastic-1
└─▶ InheritableThreadLocal.get() returns null ❌
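The behavior can be reproduced outside the SDK with a cap-1 bounded elastic scheduler, which forces reuse of a single worker thread. This is a standalone sketch, not SDK code:
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

public class ThreadReuseDemo {
    static final InheritableThreadLocal<String> CTX = new InheritableThreadLocal<>();

    public static void main(String[] args) {
        // Cap of 1 forces every task onto the same worker, mimicking an exhausted pool
        Scheduler pool = Schedulers.newBoundedElastic(1, 100, "demo");

        // The worker thread is created here, while the parent (main) has no context yet
        Mono.fromCallable(CTX::get).subscribeOn(pool).block();

        CTX.set("auth-token"); // simulate the HTTP filter setting request context

        // The worker is reused; InheritableThreadLocal was copied at creation, so it is still null
        String seen = Mono.fromCallable(CTX::get).subscribeOn(pool).block();
        System.out.println("tool call sees: " + seen); // prints: tool call sees: null
        pool.dispose();
    }
}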
Why it works locally but fails in production
| Environment | CPUs | Pool Size | Cycles to failure |
|---|---|---|---|
| Local (MacBook) | 14 | 140 | ~140 calls |
| Kubernetes | 2 | 20 | ~20 calls |
Most local development never reaches 140+ tool calls before restart.
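To check the effective cap in a given environment, the default can be read back from Reactor directly (a quick diagnostic snippet, not SDK code):
// Prints the boundedElastic thread cap: 10 * availableProcessors unless overridden
// via -Dreactor.schedulers.defaultBoundedElasticSize
System.out.println("available processors = " + Runtime.getRuntime().availableProcessors());
System.out.println("boundedElastic cap   = " + reactor.core.scheduler.Schedulers.DEFAULT_BOUNDED_ELASTIC_SIZE);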
Affected Code
McpServerFeatures.java
BiFunction<...> callHandler = (exchange, req) -> {
    var toolResult = Mono.fromCallable(() ->
        syncToolSpec.callHandler().apply(...));
    // THIS schedules on boundedElastic, potentially reusing context-less threads
    return immediate ? toolResult : toolResult.subscribeOn(Schedulers.boundedElastic());
};
KeepAliveScheduler.java
// Creates first boundedElastic thread BEFORE any HTTP requests
private Scheduler scheduler = Schedulers.boundedElastic();
this.currentSubscription = Flux.interval(this.initialDelay, this.interval, this.scheduler)
    .doOnNext(tick -> { /* keep-alive pings */ })
    .subscribe();
Proposed Solutions
Option 1: Use Micrometer Context Propagation (Recommended)
The SDK should integrate with Micrometer Context Propagation to automatically propagate ThreadLocal values across thread boundaries:
// Enable automatic context propagation
Hooks.enableAutomaticContextPropagation();
Applications can then register their ThreadLocals:
ContextRegistry.getInstance().registerThreadLocalAccessor(
    new ThreadLocalAccessor<MyContext>() {
        @Override public Object key() { return MyContext.class; }
        @Override public MyContext getValue() { return MyContext.current(); }
        @Override public void setValue(MyContext value) { MyContext.set(value); }
        @Override public void reset() { MyContext.clear(); }
    }
);
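Once the hook is enabled and the accessor registered (this assumes io.micrometer:context-propagation is on the classpath and Reactor 3.5+), the value set on the request thread is restored inside the boundedElastic worker. A sketch using the hypothetical MyContext holder from the example above:
// Sketch: automatic propagation restores registered ThreadLocals across the thread hop.
// MyContext and its constructor/accessors are the hypothetical holder from the accessor example.
Hooks.enableAutomaticContextPropagation();
MyContext.set(new MyContext("auth-token"));
MyContext seen = Mono.fromCallable(MyContext::current)
    .subscribeOn(Schedulers.boundedElastic())
    .block(); // no longer null, even on a reused worker thread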
Option 2: Pass context through Reactor Context
Instead of relying on ThreadLocal, pass values through Reactor's Context:
Mono.deferContextual(ctx -> {
    // Access context values written downstream, e.g. under "key"
    return Mono.just(ctx.get("key"));
}).contextWrite(Context.of("key", value));
This requires exposing context access patterns in the SDK API.
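Unlike a ThreadLocal, the Reactor Context travels with the subscription, so the value survives the hop onto a reused boundedElastic thread. A minimal sketch with an illustrative "authToken" key:
// The value written at the bottom of the chain is visible upstream,
// regardless of which pool thread eventually runs the deferred supplier.
String result = Mono.deferContextual(ctx -> Mono.just("token=" + ctx.get("authToken")))
    .subscribeOn(Schedulers.boundedElastic())
    .contextWrite(Context.of("authToken", "abc123"))
    .block(); // "token=abc123"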
Option 3: Dedicated scheduler for KeepAlive
Use a separate scheduler for KeepAliveScheduler that doesn't share threads with tool execution:
private Scheduler keepAliveScheduler = Schedulers.newBoundedElastic(
    Schedulers.DEFAULT_BOUNDED_ELASTIC_SIZE,
    Schedulers.DEFAULT_BOUNDED_ELASTIC_QUEUESIZE,
    "mcp-keepalive"
);
This prevents the "poisoned" thread from being used for tool calls.
Workaround
Applications can increase thread pool sizes via JVM flags:
java -XX:ActiveProcessorCount=10 \
-Xss256k \
-Dreactor.schedulers.defaultBoundedElasticSize=1000 \
  -jar app.jar
This delays the problem but doesn't fix it.
Happy to submit a PR with a fix if an approach is agreed upon.