
Prune idle sessions before starting new ones #701


Open
wants to merge 8 commits into main

Conversation

@halter73 (Contributor) commented Aug 12, 2025

This PR replaces #677. The key difference between this PR and the old one is that extra idle sessions are closed immediately before starting new ones instead of waiting for the IdleTrackingBackgroundService to clean up the next time the periodic timer fires. To achieve this, I made the ConcurrentDictionary tracking the stateful StreamableHttpSessions a singleton service that's also responsible for pruning idle sessions. StreamableHttpSession was previously named HttpMcpSession<TTransport>, but it's no longer shared with the SseHandler.
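
To make the shape of that concrete, here's a minimal sketch of a singleton store that prunes before admitting a new session. The type and member names (StatefulSession, SessionStore, CreateSessionAsync) are placeholders for illustration, not the SDK's actual API:

```csharp
// Illustrative sketch only, not the SDK's implementation.
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class StatefulSession : IAsyncDisposable
{
    public string Id { get; } = Guid.NewGuid().ToString("N");

    // Updated whenever the session handles a request; used to rank idleness.
    public long LastActivityTimestamp = Environment.TickCount64;

    public ValueTask DisposeAsync() => ValueTask.CompletedTask;
}

public sealed class SessionStore
{
    private readonly ConcurrentDictionary<string, StatefulSession> _sessions = new();

    public int MaxIdleSessionCount { get; init; } = 10_000;

    public async ValueTask<StatefulSession> CreateSessionAsync(CancellationToken cancellationToken)
    {
        // Prune idle sessions *before* rooting a new one, instead of waiting for
        // the next tick of a periodic background service to catch up.
        await PruneIdleSessionsIfNeededAsync(cancellationToken);

        var session = new StatefulSession();
        _sessions[session.Id] = session;
        return session;
    }

    public bool TryGetSession(string id, out StatefulSession? session) =>
        _sessions.TryGetValue(id, out session);

    // Placeholder; the pruning strategy is sketched further down in this thread.
    private ValueTask PruneIdleSessionsIfNeededAsync(CancellationToken cancellationToken) =>
        default;
}
```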

You can look at the description of #677 to see the consequences of creating too many new sessions without first closing and unrooting a corresponding number of idle sessions. The tl;dr is that overallocating could lead to thread pool starvation as hundreds of threads had to wait on the GC to allocate heap space. This thread pool starvation created a vicious cycle, because it prevented the IdleTrackingBackgroundService from unrooting the idle sessions, causing more of them to get promoted and creating more work for the GC.

In order to reduce contention, I reuse the sorted _idleSessionIds list to find the most idle session to remove next. This list only gets repopulated every 5 seconds on a background loop, or when we run out of idle sessions to close to make room for new ones. This isn't perfect, because sessions may briefly become active while sitting in the _idleSessionIds list without getting resorted. However, this is only a problem if the server is at the MaxIdleSessionCount and every idle session that was less recently active as of the last sort has already been closed. Considering a sort should happen at least every 5 seconds when sessions are pruned, I think this is a fair tradeoff to reduce global synchronization on session creation (at least when under the MaxIdleSessionCount) and every time a session becomes idle.
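
Roughly, the idle-list reuse could look like the following members sitting inside the hypothetical SessionStore sketched above (illustrative only, not the PR's actual code; assumes implicit usings for System.Collections.Generic and System.Linq):

```csharp
// Illustrative members for the SessionStore sketch above.
private readonly List<string> _idleSessionIds = new();
private int _nextIdleIndex;

// Repopulated roughly every 5 seconds by the background pruning loop, or on
// demand when the previous snapshot has been exhausted while making room for
// new sessions. A real implementation would only include sessions that
// currently have no active requests.
private void ResortIdleSessions()
{
    _idleSessionIds.Clear();
    _idleSessionIds.AddRange(
        _sessions.Values
            .OrderBy(s => s.LastActivityTimestamp) // least recently active first
            .Select(s => s.Id));
    _nextIdleIndex = 0;
}

// Hands out the next most-idle candidate from the last snapshot. A candidate
// may have become active again since the sort, so callers should re-check
// before actually disposing it.
private bool TryTakeMostIdleSessionId(out string? sessionId)
{
    if (_nextIdleIndex >= _idleSessionIds.Count)
    {
        ResortIdleSessions();
    }

    if (_nextIdleIndex < _idleSessionIds.Count)
    {
        sessionId = _idleSessionIds[_nextIdleIndex++];
        return true;
    }

    sessionId = null;
    return false;
}
```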

In my testing on a 16-core/64 GB VM, a 100,000 idle session limit (the old default) caused the server process to consume 2-3 GB of memory according to Task Manager and limited the new session creation rate to about 60 sessions/second after reaching the MaxIdleSessionCount. At a 10,000 idle session limit (the new default), the process memory usage dropped to about 300 MB and the session creation rate increased to about 900 sessions/second. And at the even lower 1,000 idle session limit, the process memory usage dropped further to about 180 MB and the session creation rate increased again to about 5,000 sessions/second. All of these numbers are stable over repeated runs after having reached the MaxIdleSessionCount.

MaxIdleSessionCount = 10,000 (New default)

$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ -s scripts/mcp.lua
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   260.42ms   24.53ms 342.05ms   77.48%
    Req/Sec    31.61     11.99   110.00     83.07%
  14737 requests in 15.07s, 5.04MB read
Requests/sec:    977.96
Transfer/sec:    342.43KB

MaxIdleSessionCount = 100,000 (Old default)

$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ -s scripts/mcp.lua --timeout 15s
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.76s     1.68s    7.47s    56.71%
    Req/Sec     9.08     13.85    79.00     89.39%
  917 requests in 15.05s, 321.01KB read
Requests/sec:     60.92
Transfer/sec:     21.33KB

MaxIdleSessionCount = 1,000 (Lower than default)

$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ -s scripts/mcp.lua
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    46.52ms    9.40ms 127.71ms   85.46%
    Req/Sec   172.64     31.54   574.00     76.85%
  82981 requests in 15.08s, 28.38MB read
Requests/sec:   5501.70
Transfer/sec:      1.88MB

Single Session Tool Call

The MaxIdleSessionCount has no apparent effect on this test, and I wouldn't expect it to, since we still look up existing sessions the same way we did previously.
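
For reference, resolving an existing session against the hypothetical store sketched earlier is just a dictionary read, which is why the idle limit shouldn't matter on this path:

```csharp
// Sketch only: requests carrying an existing session id never touch the pruning
// path; the session is resolved with a plain ConcurrentDictionary lookup.
public static StatefulSession? ResolveExistingSession(SessionStore store, string sessionId) =>
    store.TryGetSession(sessionId, out var session) ? session : null;
```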

$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ --timeout=15s -s scripts/mcp.lua <session-id>
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.31ms   14.79ms 370.81ms   96.89%
    Req/Sec     1.05k   179.55     3.23k    78.45%
  503104 requests in 15.10s, 172.05MB read
Requests/sec:  33319.09
Transfer/sec:     11.39MB

return;
}

await _idlePruningLock.WaitAsync(cancellationToken);
Contributor


There's seemingly a ton of work happening under this pruning lock. Would it work to just have the lock be taken long enough to pick and remove a session, releasing the lock before doing more work like tearing down or using its resources? We'd still loop, just outside of the lock rather than inside, under the assumption that pruning attempts will be successful.

Contributor Author


The hope is that typical applications rarely have to enter this lock. And when it is entered, we're staying in it extra-long intentionally by awaiting DisposeSessionAsync(), so we don't allow the creation of new sessions while a bunch of sessions are still rooted. That's why I left the following comment:

// We're intentionally waiting for the idle session to be disposed before releasing the _idlePruningLock to
// ensure new sessions are not created faster than they're removed when we're at or above the maximum idle session count.

If we really wanted to, we could use Monitor.Enter and move the call to DisposeSessionAsync outside the lock. I'm not sure what you mean by looping outside the lock, but again, I don't really see the point of optimizing this path too much when we know, due to _currentIdleSessionCount, that session disposal is not keeping up with session creation. Waiting on the semaphore is the most resource-efficient thing we can do with these requests until we catch up on disposals.

Contributor Author


On second thought, I see the issue here. We don't want to limit the parallelism of disposals to only when the background timer gets a hold of the semaphore. Having every request that tries to start a new session while at/over the idle limit wait on at least one session getting disposed should be sufficient to slow down session creation without unnecessarily serializing the disposals.
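
As a sketch of that revised flow, filling in the PruneIdleSessionsIfNeededAsync placeholder from the earlier hypothetical SessionStore: the _idlePruningLock and _currentIdleSessionCount names come from this thread, but the code below is only an illustration, not the PR's exact implementation.

```csharp
// Illustrative members for the SessionStore sketch above.
private readonly SemaphoreSlim _idlePruningLock = new(1, 1);
private int _currentIdleSessionCount;

private async ValueTask PruneIdleSessionsIfNeededAsync(CancellationToken cancellationToken)
{
    while (Volatile.Read(ref _currentIdleSessionCount) >= MaxIdleSessionCount)
    {
        StatefulSession? victim = null;

        // Hold the lock only long enough to pick and unroot a candidate.
        await _idlePruningLock.WaitAsync(cancellationToken);
        try
        {
            // A real implementation would also verify the candidate is still idle here.
            if (TryTakeMostIdleSessionId(out var sessionId) && sessionId is not null)
            {
                _sessions.TryRemove(sessionId, out victim);
            }
        }
        finally
        {
            _idlePruningLock.Release();
        }

        if (victim is null)
        {
            // Nothing prunable right now (e.g. the candidate became active again);
            // bail out rather than spinning in this sketch.
            break;
        }

        Interlocked.Decrement(ref _currentIdleSessionCount);

        // Dispose outside the lock: each over-limit creation still waits for at
        // least one disposal, but disposals from concurrent callers aren't serialized.
        await victim.DisposeAsync();
    }
}
```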

@halter73 (Contributor Author) commented Aug 15, 2025


I made the update to dispose outside of the lock and to acquire the lock inside the outer loop. I reran the benchmarks, and they show about a 10% improvement in session creation when idle sessions are saturated at the new default 10,000 limit, and over a 100% improvement at the lower 1,000 limit.

There's a consistent increase in variance, which is noticeable run-to-run at the higher limits and shows up in the stdev stat at every limit, but that makes sense considering we're no longer waiting in an orderly queue to start a new session and instead have to wait for the thread pool to give us priority. It also looks slightly worse at 100,000 and 200,000, but that's now over 10x the default limit, and it's not worse by much.

MaxIdleSessionCount = 10,000 (New default)

$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ --timeout=15s -s scripts/mcp.lua
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   242.49ms  145.27ms   1.89s    80.41%
    Req/Sec    37.31     23.68   520.00     76.12%
  16503 requests in 15.04s, 5.64MB read
Requests/sec:   1097.16
Transfer/sec:    384.15KB

MaxIdleSessionCount = 100,000 (Old default)

$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ --timeout=15s -s scripts/mcp.lua
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.65s     2.81s   13.59s    67.60%
    Req/Sec     2.55      5.87   110.00     92.57%
  791 requests in 15.09s, 276.81KB read
Requests/sec:     52.41
Transfer/sec:     18.34KB

MaxIdleSessionCount = 1,000 (Lower than default)

$ ./wrk -t32 -c256 -d15s http://172.20.240.1:3001/ --timeout=15s -s scripts/mcp.lua
Running 15s test @ http://172.20.240.1:3001/
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    21.68ms   13.58ms 245.56ms   89.03%
    Req/Sec   391.13     93.53     2.20k    84.63%
  187993 requests in 15.10s, 64.29MB read
Requests/sec:  12449.69
Transfer/sec:      4.26MB

The main thing is disposing pruned sessions outside the lock. This improved session creation performance when at max idle sessions by about 10% at a 10,000 idle session max and 100% at a 1,000 max.