InheritableThreadLocal context loss when boundedElastic pool reuses threads created outside request context #704

@lobaorn-bitso


Problem

When using the MCP Java SDK's HTTP transport with custom ThreadLocal or InheritableThreadLocal context (e.g., authentication tokens), the context is lost once Reactor's boundedElastic pool reaches its thread cap and starts reusing threads that were created outside any request context.

Symptoms

Tools that rely on InheritableThreadLocal context work for roughly the first N tool calls (where N = availableProcessors() * 10, the boundedElastic thread cap), then start failing because the context resolves to null once the scheduler reuses a thread that was created before any request context existed.

Environment

  • MCP Java SDK 0.16.0
  • HTTP/Streamable transport
  • Reactor (Project Reactor) with boundedElastic scheduler
  • Kubernetes or resource-constrained environment

Root Cause

The issue is a fundamental interaction between:

  1. InheritableThreadLocal - only propagates context at thread creation time, never when an existing thread is reused (see the sketch after this list)
  2. Reactor's boundedElastic scheduler - pool size = availableProcessors() * 10
  3. KeepAliveScheduler - creates boundedElastic-1 during startup from the main thread (no HTTP/request context)
  4. Tool execution - McpServerFeatures.java uses .subscribeOn(Schedulers.boundedElastic())
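
A minimal standalone sketch of point 1; the executors and the CONTEXT holder are hypothetical and purely illustrative. A value set on the parent thread is copied only into threads created afterwards; a thread that already exists and is merely reused never sees it:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InheritanceDemo {

    // Hypothetical context holder, analogous to an application's auth-token ThreadLocal.
    static final InheritableThreadLocal<String> CONTEXT = new InheritableThreadLocal<>();

    public static void main(String[] args) throws Exception {
        // Worker thread created BEFORE any context exists (like boundedElastic-1 at startup).
        ExecutorService early = Executors.newSingleThreadExecutor();
        early.submit(() -> { }).get(); // force the worker thread to be created now

        CONTEXT.set("auth-token");     // "request" context set on the main thread

        // A thread created AFTER the context is set inherits the value at creation time.
        ExecutorService late = Executors.newSingleThreadExecutor();
        late.submit(() -> System.out.println("late thread:  " + CONTEXT.get())).get();  // auth-token

        // The pre-existing thread is only REUSED, so it inherited nothing.
        early.submit(() -> System.out.println("early thread: " + CONTEXT.get())).get(); // null

        early.shutdown();
        late.shutdown();
    }
}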

Timeline

STARTUP (main thread):
  └─▶ KeepAliveScheduler starts
        └─▶ Schedules on Schedulers.boundedElastic()
              └─▶ Creates boundedElastic-1 ← NO context (parent is main)

REQUEST PHASE:
  └─▶ HTTP filter sets context (e.g., auth token)
        └─▶ Tool execution via .subscribeOn(boundedElastic)
              ├─▶ Creates boundedElastic-2 (inherits) ✅
              ├─▶ Creates boundedElastic-3 (inherits) ✅
              └─▶ ... boundedElastic-N (inherits) ✅

AFTER N TOOL CALLS:
  └─▶ Scheduler REUSES boundedElastic-1
        └─▶ InheritableThreadLocal.get() returns null ❌

Why it works locally but fails in production

Environment       CPUs   Pool size   Calls to failure
Local (MacBook)   14     140         ~140
Kubernetes        2      20          ~20

Most local development runs never reach 140+ tool calls before a restart, so the failure typically shows up only in long-running, CPU-constrained deployments.
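
A quick way to confirm the effective cap in a given environment (this is plain Reactor default behavior, nothing SDK-specific; the constant also reflects the system-property override used in the workaround below):

import reactor.core.scheduler.Schedulers;

public class PoolSizeCheck {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Defaults to 10 * availableProcessors() unless
        // -Dreactor.schedulers.defaultBoundedElasticSize is set.
        System.out.println("CPUs: " + cpus);
        System.out.println("boundedElastic thread cap: " + Schedulers.DEFAULT_BOUNDED_ELASTIC_SIZE);
    }
}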

Affected Code

McpServerFeatures.java

BiFunction<...> callHandler = (exchange, req) -> {
    var toolResult = Mono.fromCallable(() ->
        syncToolSpec.callHandler().apply(...));
    // THIS schedules on boundedElastic, potentially reusing context-less threads
    return immediate ? toolResult : toolResult.subscribeOn(Schedulers.boundedElastic());
};

KeepAliveScheduler.java

// Creates first boundedElastic thread BEFORE any HTTP requests
private Scheduler scheduler = Schedulers.boundedElastic();

this.currentSubscription = Flux.interval(this.initialDelay, this.interval, this.scheduler)
    .doOnNext(tick -> { /* keep-alive pings */ })
    .subscribe();

Proposed Solutions

Option 1: Use Micrometer Context Propagation (Recommended)

The SDK should integrate with Micrometer Context Propagation so that ThreadLocal values are automatically restored across thread boundaries (this relies on the io.micrometer:context-propagation library being on the classpath):

// Enable automatic context propagation
Hooks.enableAutomaticContextPropagation();

Applications can then register their ThreadLocals:

import io.micrometer.context.ContextRegistry;
import io.micrometer.context.ThreadLocalAccessor;

ContextRegistry.getInstance().registerThreadLocalAccessor(
    new ThreadLocalAccessor<MyContext>() {
        @Override public Object key() { return MyContext.class; }
        @Override public MyContext getValue() { return MyContext.current(); }
        @Override public void setValue(MyContext value) { MyContext.set(value); }
        @Override public void reset() { MyContext.clear(); }
    }
);
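
For completeness, a hypothetical MyContext holder matching the accessor above (the class name, field, and methods are illustrative, not part of the SDK). A plain ThreadLocal is sufficient here, because cross-thread propagation is handled by the registered accessor rather than by inheritance:

public final class MyContext {

    // Plain ThreadLocal is enough: Micrometer restores the value around each Reactor task.
    private static final ThreadLocal<MyContext> HOLDER = new ThreadLocal<>();

    private final String authToken;

    public MyContext(String authToken) {
        this.authToken = authToken;
    }

    public String authToken() {
        return authToken;
    }

    public static MyContext current() { return HOLDER.get(); }
    public static void set(MyContext context) { HOLDER.set(context); }
    public static void clear() { HOLDER.remove(); }
}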

Option 2: Pass context through Reactor Context

Instead of relying on ThreadLocal, pass values through Reactor's Context:

Mono.deferContextual(ctx -> {
    Object contextValue = ctx.get("key");   // read the value written by contextWrite below
    return Mono.just(contextValue);
})
.contextWrite(Context.of("key", value));

This requires exposing context access patterns in the SDK API.
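
As an illustration of the writing side, assuming a Spring WebFlux deployment (the SDK itself does not mandate one; the class and key names are hypothetical), a filter can seed the Reactor Context once per request so the value survives any boundedElastic thread reuse:

import org.springframework.web.server.ServerWebExchange;
import org.springframework.web.server.WebFilter;
import org.springframework.web.server.WebFilterChain;
import reactor.core.publisher.Mono;

// Hypothetical WebFlux filter: seeds the Reactor Context instead of a ThreadLocal.
public class AuthContextWebFilter implements WebFilter {

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
        String token = exchange.getRequest().getHeaders().getFirst("Authorization");
        // contextWrite attaches the value to the subscription, not to a thread,
        // so reuse of boundedElastic workers no longer matters.
        return chain.filter(exchange)
                .contextWrite(ctx -> token == null ? ctx : ctx.put("authToken", token));
    }
}

Registered as a regular filter bean, this runs once per request; handler code then reads the value with Mono.deferContextual as shown above.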

Option 3: Dedicated scheduler for KeepAlive

Use a separate scheduler for KeepAliveScheduler that doesn't share threads with tool execution:

private Scheduler keepAliveScheduler = Schedulers.newBoundedElastic(
    Schedulers.DEFAULT_BOUNDED_ELASTIC_SIZE,
    Schedulers.DEFAULT_BOUNDED_ELASTIC_QUEUESIZE,
    "mcp-keepalive"
);

This prevents the "poisoned" thread from being used for tool calls.
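
Since keep-alive ticking only ever needs one thread, a leaner variant of the same idea is a dedicated single-threaded scheduler (a sketch of the design choice, not what the SDK currently does):

// One dedicated thread for keep-alive pings; it is never handed to tool execution,
// so it cannot end up as a "poisoned" context-less worker in the shared pool.
private final Scheduler keepAliveScheduler = Schedulers.newSingle("mcp-keepalive");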

Workaround

Applications can increase thread pool sizes via JVM flags:

java -XX:ActiveProcessorCount=10 \
     -Xss256k \
     -Dreactor.schedulers.defaultBoundedElasticSize=1000 \
     -jar app.jar

This delays the problem but doesn't fix it.
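
The same cap can also be raised programmatically, provided the property is set before any Reactor Schedulers class is initialized (the Main class below is hypothetical; the ordering is the application's responsibility):

public class Main {
    public static void main(String[] args) {
        // Must run before reactor.core.scheduler.Schedulers is loaded,
        // because the default cap is read in its static initializer.
        System.setProperty("reactor.schedulers.defaultBoundedElasticSize", "1000");
        // ... start the MCP server / application here ...
    }
}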

Happy to submit a PR with a fix if an approach is agreed upon.
