InheritableThreadLocal context loss when boundedElastic pool reuses threads created outside request context #704

@lobaorn-bitso


Problem

When using the MCP Java SDK's HTTP transport with custom ThreadLocal or InheritableThreadLocal context (e.g., authentication tokens), the context is lost once Reactor's boundedElastic pool reaches its thread cap and starts reusing threads that were created outside any request context.

Symptoms

Tools that rely on InheritableThreadLocal context work for roughly the first N tool calls (where N = availableProcessors() * 10, the boundedElastic thread cap), then start failing because the context resolves to null once the scheduler reuses a thread that was created before any request context existed.

Environment

  • MCP Java SDK 0.16.0
  • HTTP/Streamable transport
  • Reactor (Project Reactor) with boundedElastic scheduler
  • Kubernetes or resource-constrained environment

Root Cause

The issue is a fundamental interaction between:

  1. InheritableThreadLocal - only propagates context at thread creation time, never when an existing thread is reused (see the sketch after this list)
  2. Reactor's boundedElastic scheduler - pool size = availableProcessors() * 10
  3. KeepAliveScheduler - creates boundedElastic-1 during startup from the main thread (no HTTP/request context)
  4. Tool execution - McpServerFeatures.java uses .subscribeOn(Schedulers.boundedElastic())
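
A minimal standalone sketch of point 1; the executors and the CONTEXT holder are hypothetical and purely illustrative. A value set on the parent thread is copied only into threads created afterwards; a thread that already exists and is merely reused never sees it:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InheritanceDemo {

    // Hypothetical context holder, analogous to an application's auth-token ThreadLocal.
    static final InheritableThreadLocal<String> CONTEXT = new InheritableThreadLocal<>();

    public static void main(String[] args) throws Exception {
        // Worker thread created BEFORE any context exists (like boundedElastic-1 at startup).
        ExecutorService early = Executors.newSingleThreadExecutor();
        early.submit(() -> { }).get(); // force the worker thread to be created now

        CONTEXT.set("auth-token");     // "request" context set on the main thread

        // A thread created AFTER the context is set inherits the value at creation time.
        ExecutorService late = Executors.newSingleThreadExecutor();
        late.submit(() -> System.out.println("late thread:  " + CONTEXT.get())).get();  // auth-token

        // The pre-existing thread is only REUSED, so it inherited nothing.
        early.submit(() -> System.out.println("early thread: " + CONTEXT.get())).get(); // null

        early.shutdown();
        late.shutdown();
    }
}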

Timeline

STARTUP (main thread):
  └─▶ KeepAliveScheduler starts
        └─▶ Schedules on Schedulers.boundedElastic()
              └─▶ Creates boundedElastic-1 ← NO context (parent is main)

REQUEST PHASE:
  └─▶ HTTP filter sets context (e.g., auth token)
        └─▶ Tool execution via .subscribeOn(boundedElastic)
              ├─▶ Creates boundedElastic-2 (inherits) ✅
              ├─▶ Creates boundedElastic-3 (inherits) ✅
              └─▶ ... boundedElastic-N (inherits) ✅

AFTER N TOOL CALLS:
  └─▶ Scheduler REUSES boundedElastic-1
        └─▶ InheritableThreadLocal.get() returns null ❌

Why it works locally but fails in production

Environment       CPUs   Pool size   Calls to failure
Local (MacBook)   14     140         ~140
Kubernetes        2      20          ~20

Most local development runs never reach 140+ tool calls before a restart, so the failure typically shows up only in long-running, CPU-constrained deployments.
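
A quick way to confirm the effective cap in a given environment (this is plain Reactor default behavior, nothing SDK-specific; the constant also reflects the system-property override used in the workaround below):

import reactor.core.scheduler.Schedulers;

public class PoolSizeCheck {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Defaults to 10 * availableProcessors() unless
        // -Dreactor.schedulers.defaultBoundedElasticSize is set.
        System.out.println("CPUs: " + cpus);
        System.out.println("boundedElastic thread cap: " + Schedulers.DEFAULT_BOUNDED_ELASTIC_SIZE);
    }
}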

Affected Code

McpServerFeatures.java

BiFunction<...> callHandler = (exchange, req) -> {
    var toolResult = Mono.fromCallable(() ->
        syncToolSpec.callHandler().apply(...));
    // THIS schedules on boundedElastic, potentially reusing context-less threads
    return immediate ? toolResult : toolResult.subscribeOn(Schedulers.boundedElastic());
};

KeepAliveScheduler.java

// Creates first boundedElastic thread BEFORE any HTTP requests
private Scheduler scheduler = Schedulers.boundedElastic();

this.currentSubscription = Flux.interval(this.initialDelay, this.interval, this.scheduler)
    .doOnNext(tick -> { /* keep-alive pings */ })
    .subscribe();

Proposed Solutions

Option 1: Use Micrometer Context Propagation (Recommended)

The SDK should integrate with Micrometer Context Propagation so that ThreadLocal values are automatically restored across thread boundaries (this relies on the io.micrometer:context-propagation library being on the classpath):

// Enable automatic context propagation
Hooks.enableAutomaticContextPropagation();

Applications can then register their ThreadLocals:

import io.micrometer.context.ContextRegistry;
import io.micrometer.context.ThreadLocalAccessor;

ContextRegistry.getInstance().registerThreadLocalAccessor(
    new ThreadLocalAccessor<MyContext>() {
        @Override public Object key() { return MyContext.class; }
        @Override public MyContext getValue() { return MyContext.current(); }
        @Override public void setValue(MyContext value) { MyContext.set(value); }
        @Override public void reset() { MyContext.clear(); }
    }
);
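
For completeness, a hypothetical MyContext holder matching the accessor above (the class name, field, and methods are illustrative, not part of the SDK). A plain ThreadLocal is sufficient here, because cross-thread propagation is handled by the registered accessor rather than by inheritance:

public final class MyContext {

    // Plain ThreadLocal is enough: Micrometer restores the value around each Reactor task.
    private static final ThreadLocal<MyContext> HOLDER = new ThreadLocal<>();

    private final String authToken;

    public MyContext(String authToken) {
        this.authToken = authToken;
    }

    public String authToken() {
        return authToken;
    }

    public static MyContext current() { return HOLDER.get(); }
    public static void set(MyContext context) { HOLDER.set(context); }
    public static void clear() { HOLDER.remove(); }
}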

Option 2: Pass context through Reactor Context

Instead of relying on ThreadLocal, pass values through Reactor's Context:

Mono.deferContextual(ctx -> {
    Object contextValue = ctx.get("key");   // read the value written by contextWrite below
    return Mono.just(contextValue);
})
.contextWrite(Context.of("key", value));

This requires exposing context access patterns in the SDK API.
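
As an illustration of the writing side, assuming a Spring WebFlux deployment (the SDK itself does not mandate one; the class and key names are hypothetical), a filter can seed the Reactor Context once per request so the value survives any boundedElastic thread reuse:

import org.springframework.web.server.ServerWebExchange;
import org.springframework.web.server.WebFilter;
import org.springframework.web.server.WebFilterChain;
import reactor.core.publisher.Mono;

// Hypothetical WebFlux filter: seeds the Reactor Context instead of a ThreadLocal.
public class AuthContextWebFilter implements WebFilter {

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
        String token = exchange.getRequest().getHeaders().getFirst("Authorization");
        // contextWrite attaches the value to the subscription, not to a thread,
        // so reuse of boundedElastic workers no longer matters.
        return chain.filter(exchange)
                .contextWrite(ctx -> token == null ? ctx : ctx.put("authToken", token));
    }
}

Registered as a regular filter bean, this runs once per request; handler code then reads the value with Mono.deferContextual as shown above.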

Option 3: Dedicated scheduler for KeepAlive

Use a separate scheduler for KeepAliveScheduler that doesn't share threads with tool execution:

private Scheduler keepAliveScheduler = Schedulers.newBoundedElastic(
    Schedulers.DEFAULT_BOUNDED_ELASTIC_SIZE,
    Schedulers.DEFAULT_BOUNDED_ELASTIC_QUEUESIZE,
    "mcp-keepalive"
);

This prevents the "poisoned" thread from being used for tool calls.
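
Since keep-alive ticking only ever needs one thread, a leaner variant of the same idea is a dedicated single-threaded scheduler (a sketch of the design choice, not what the SDK currently does):

// One dedicated thread for keep-alive pings; it is never handed to tool execution,
// so it cannot end up as a "poisoned" context-less worker in the shared pool.
private final Scheduler keepAliveScheduler = Schedulers.newSingle("mcp-keepalive");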

Workaround

Applications can increase thread pool sizes via JVM flags:

java -XX:ActiveProcessorCount=10 \
     -Xss256k \
     -Dreactor.schedulers.defaultBoundedElasticSize=1000 \
     -jar app.jar

This delays the problem but doesn't fix it.
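
The same cap can also be raised programmatically, provided the property is set before any Reactor Schedulers class is initialized (the Main class below is hypothetical; the ordering is the application's responsibility):

public class Main {
    public static void main(String[] args) {
        // Must run before reactor.core.scheduler.Schedulers is loaded,
        // because the default cap is read in its static initializer.
        System.setProperty("reactor.schedulers.defaultBoundedElasticSize", "1000");
        // ... start the MCP server / application here ...
    }
}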

Happy to submit a PR with a fix if an approach is agreed upon.
