
Prune idle sessions before starting new ones #701


Open
wants to merge 8 commits into main

Conversation

@halter73 (Contributor) commented Aug 12, 2025

This PR replaces #677. The key difference between this PR and the old one is that extra idle sessions are closed immediately before starting new ones instead of waiting for the IdleTrackingBackgroundService to clean up the next time the periodic timer fires. To achieve this, I made the ConcurrentDictionary tracking the stateful StreamableHttpSessions a singleton service that's also responsible for pruning idle sessions. StreamableHttpSession was previously named HttpMcpSession<TTransport>, but it's no longer shared with the SseHandler.
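
To make the shape of that concrete, here's a minimal sketch of a singleton store that prunes before admitting a new session. The type and member names (StatefulSession, SessionStore, CreateSessionAsync) are placeholders for illustration, not the SDK's actual API:

```csharp
// Illustrative sketch only, not the SDK's implementation.
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class StatefulSession : IAsyncDisposable
{
    public string Id { get; } = Guid.NewGuid().ToString("N");

    // Updated whenever the session handles a request; used to rank idleness.
    public long LastActivityTimestamp = Environment.TickCount64;

    public ValueTask DisposeAsync() => ValueTask.CompletedTask;
}

public sealed class SessionStore
{
    private readonly ConcurrentDictionary<string, StatefulSession> _sessions = new();

    public int MaxIdleSessionCount { get; init; } = 10_000;

    public async ValueTask<StatefulSession> CreateSessionAsync(CancellationToken cancellationToken)
    {
        // Prune idle sessions *before* rooting a new one, instead of waiting for
        // the next tick of a periodic background service to catch up.
        await PruneIdleSessionsIfNeededAsync(cancellationToken);

        var session = new StatefulSession();
        _sessions[session.Id] = session;
        return session;
    }

    public bool TryGetSession(string id, out StatefulSession? session) =>
        _sessions.TryGetValue(id, out session);

    // Placeholder; the pruning strategy is sketched further down in this thread.
    private ValueTask PruneIdleSessionsIfNeededAsync(CancellationToken cancellationToken) =>
        default;
}
```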

You can look at the description of #677 to see the consequences of creating too many new sessions without first closing and unrooting a corresponding number of idle sessions. The tl;dr is that overallocating could lead to thread pool starvation as hundreds of threads had to wait on the GC to allocate heap space. This thread pool starvation created a vicious cycle, because it prevented the IdleTrackingBackgroundService from unrooting the idle sessions, causing more of them to get promoted and creating more work for the GC.

In order to reduce contention, I reuse the sorted _idleSessionIds list to find the most idle session to remove next. This list only gets repopulated every 5 seconds on a background loop, or when we run out of idle sessions to close to make room for new ones. This isn't perfect, because sessions may briefly become active while sitting in the _idleSessionIds list without getting resorted. However, this is only a problem if the server is at the MaxIdleSessionCount and every idle session that was less recently active as of the last sort has already been closed. Considering a sort should happen at least every 5 seconds when sessions are pruned, I think this is a fair tradeoff to reduce global synchronization on session creation (at least when under the MaxIdleSessionCount) and every time a session becomes idle.
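
Roughly, the idle-list reuse could look like the following members sitting inside the hypothetical SessionStore sketched above (illustrative only, not the PR's actual code; assumes implicit usings for System.Collections.Generic and System.Linq):

```csharp
// Illustrative members for the SessionStore sketch above.
private readonly List<string> _idleSessionIds = new();
private int _nextIdleIndex;

// Repopulated roughly every 5 seconds by the background pruning loop, or on
// demand when the previous snapshot has been exhausted while making room for
// new sessions. A real implementation would only include sessions that
// currently have no active requests.
private void ResortIdleSessions()
{
    _idleSessionIds.Clear();
    _idleSessionIds.AddRange(
        _sessions.Values
            .OrderBy(s => s.LastActivityTimestamp) // least recently active first
            .Select(s => s.Id));
    _nextIdleIndex = 0;
}

// Hands out the next most-idle candidate from the last snapshot. A candidate
// may have become active again since the sort, so callers should re-check
// before actually disposing it.
private bool TryTakeMostIdleSessionId(out string? sessionId)
{
    if (_nextIdleIndex >= _idleSessionIds.Count)
    {
        ResortIdleSessions();
    }

    if (_nextIdleIndex < _idleSessionIds.Count)
    {
        sessionId = _idleSessionIds[_nextIdleIndex++];
        return true;
    }

    sessionId = null;
    return false;
}
```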

In my testing on a 16-core/64 GB VM, a 100,000 idle session limit (the old default) caused the server process to consume 2-3 GB of memory according to Task Manager and limited the new session creation rate to about 60 sessions/second after reaching the MaxIdleSessionCount. At a 10,000 idle session limit (the new default), the process memory usage dropped to about 300 MB and the session creation rate increased to about 900 sessions/second. And at the even lower 1,000 idle session limit, the process memory usage dropped further to about 180 MB and the session creation rate increased again to about 5,000 sessions/second. All of these numbers are stable over repeated runs after having reached the MaxIdleSessionCount.

MaxIdleSessionCount = 10,000 (New default)

$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ -s scripts/mcp.lua
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   260.42ms   24.53ms 342.05ms   77.48%
    Req/Sec    31.61     11.99   110.00     83.07%
  14737 requests in 15.07s, 5.04MB read
Requests/sec:    977.96
Transfer/sec:    342.43KB

MaxIdleSessionCount = 100,000 (Old default)

$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ -s scripts/mcp.lua --timeout 15s
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.76s     1.68s    7.47s    56.71%
    Req/Sec     9.08     13.85    79.00     89.39%
  917 requests in 15.05s, 321.01KB read
Requests/sec:     60.92
Transfer/sec:     21.33KB

MaxIdleSessionCount = 1,000 (Lower than default)

$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ -s scripts/mcp.lua
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    46.52ms    9.40ms 127.71ms   85.46%
    Req/Sec   172.64     31.54   574.00     76.85%
  82981 requests in 15.08s, 28.38MB read
Requests/sec:   5501.70
Transfer/sec:      1.88MB

Single Session Tool Call

The MaxIdleSessionCount has no apparent effect on this test, and I wouldn't expect it to, since we still look up existing sessions the same way we did previously.
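
For reference, resolving an existing session against the hypothetical store sketched earlier is just a dictionary read, which is why the idle limit shouldn't matter on this path:

```csharp
// Sketch only: requests carrying an existing session id never touch the pruning
// path; the session is resolved with a plain ConcurrentDictionary lookup.
public static StatefulSession? ResolveExistingSession(SessionStore store, string sessionId) =>
    store.TryGetSession(sessionId, out var session) ? session : null;
```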

$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ --timeout=15s -s scripts/mcp.lua <session-id>
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.31ms   14.79ms 370.81ms   96.89%
    Req/Sec     1.05k   179.55     3.23k    78.45%
  503104 requests in 15.10s, 172.05MB read
Requests/sec:  33319.09
Transfer/sec:     11.39MB

return;
}

await _idlePruningLock.WaitAsync(cancellationToken);
Contributor


There's seemingly a ton of work happening under this pruning lock. Would it work to just have the lock be taken long enough to pick and remove a session, releasing the lock before doing more work like tearing down or using its resources? We'd still loop, just outside of the lock rather than inside, under the assumption that pruning attempts will be successful.

Contributor Author


The hope is that typical applications rarely have to enter this lock. And when it is entered, we're staying in it extra-long intentionally by awaiting DisposeSessionAsync(), so we don't allow the creation of new sessions while a bunch of sessions are still rooted. That's why I left the following comment:

// We're intentionally waiting for the idle session to be disposed before releasing the _idlePruningLock to
// ensure new sessions are not created faster than they're removed when we're at or above the maximum idle session count.

If we really wanted to, we could use Monitor.Enter and move the call to DisposeSessionAsync outside the lock. I'm not sure what you mean by looping outside the lock, but again, I don't really see the point of optimizing this path too much when we know, due to _currentIdleSessionCount, that session disposal is not keeping up with session creation. Waiting on the semaphore is the most resource-efficient thing we can do with these requests until we catch up on disposals.

Contributor Author


On second thought, I see the issue here. We don't want to limit the parallelism of disposals to only when the background timer gets a hold of the semaphore. Having every request that tries to start a new session while at/over the idle limit wait on at least one session getting disposed should be sufficient to slow down session creation without unnecessarily serializing the disposals.
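
As a sketch of that revised flow, filling in the PruneIdleSessionsIfNeededAsync placeholder from the earlier hypothetical SessionStore: the _idlePruningLock and _currentIdleSessionCount names come from this thread, but the code below is only an illustration, not the PR's exact implementation.

```csharp
// Illustrative members for the SessionStore sketch above.
private readonly SemaphoreSlim _idlePruningLock = new(1, 1);
private int _currentIdleSessionCount;

private async ValueTask PruneIdleSessionsIfNeededAsync(CancellationToken cancellationToken)
{
    while (Volatile.Read(ref _currentIdleSessionCount) >= MaxIdleSessionCount)
    {
        StatefulSession? victim = null;

        // Hold the lock only long enough to pick and unroot a candidate.
        await _idlePruningLock.WaitAsync(cancellationToken);
        try
        {
            // A real implementation would also verify the candidate is still idle here.
            if (TryTakeMostIdleSessionId(out var sessionId) && sessionId is not null)
            {
                _sessions.TryRemove(sessionId, out victim);
            }
        }
        finally
        {
            _idlePruningLock.Release();
        }

        if (victim is null)
        {
            // Nothing prunable right now (e.g. the candidate became active again);
            // bail out rather than spinning in this sketch.
            break;
        }

        Interlocked.Decrement(ref _currentIdleSessionCount);

        // Dispose outside the lock: each over-limit creation still waits for at
        // least one disposal, but disposals from concurrent callers aren't serialized.
        await victim.DisposeAsync();
    }
}
```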

@halter73 (Contributor Author) commented Aug 15, 2025


I made the update to dispose outside of the lock and to acquire the lock inside the outer loop. I reran the benchmarks, and they show about a 10% improvement in session creation when idle sessions are saturated at the new default 10,000 limit, and over a 100% improvement at the lower 1,000 limit.

There's a consistent increase in variance, which is noticeable run-to-run at the higher limits and shows up in the stdev stat at every limit, but that makes sense considering we're no longer waiting in an orderly queue to start a new session and instead have to wait for the thread pool to give us priority. It also looks slightly worse at 100,000 and 200,000, but that's now over 10x the default limit, and it's not worse by much.

MaxIdleSessionCount = 10,000 (New default)

$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ --timeout=15s -s scripts/mcp.lua
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   242.49ms  145.27ms   1.89s    80.41%
    Req/Sec    37.31     23.68   520.00     76.12%
  16503 requests in 15.04s, 5.64MB read
Requests/sec:   1097.16
Transfer/sec:    384.15KB

MaxIdleSessionCount = 100,000 (Old default)

$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ --timeout=15s -s scripts/mcp.lua
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.65s     2.81s   13.59s    67.60%
    Req/Sec     2.55      5.87   110.00     92.57%
  791 requests in 15.09s, 276.81KB read
Requests/sec:     52.41
Transfer/sec:     18.34KB

MaxIdleSessionCount = 1,000 (Lower than default)

$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ --timeout=15s -s scripts/mcp.lua
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    21.68ms   13.58ms 245.56ms   89.03%
    Req/Sec   391.13     93.53     2.20k    84.63%
  187993 requests in 15.10s, 64.29MB read
Requests/sec:  12449.69
Transfer/sec:      4.26MB

The main thing is disposing pruned sessions outside the lock. This improved session creation performance when at max idle sessions by about 10% at a 10,000 idle session max and 100% at a 1,000 max.