Conversation


@wreed4 wreed4 commented May 29, 2025

  • added the change and formatted with ruff
  • added tests

Dispatch `_received_request` in asynchronous tasks inside the session's `_receive_loop`.
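A minimal sketch of the idea (toy code, not the SDK's actual implementation, which uses anyio task groups rather than plain asyncio): each incoming request is handled in its own task, so a slow handler no longer blocks later messages in the receive loop.

```python
import asyncio

# Toy illustration only: a receive loop that dispatches each incoming
# request as its own task instead of awaiting it inline, so a slow
# handler no longer serializes the whole loop.

async def handle_request(name: str, delay: float, log: list[str]) -> None:
    await asyncio.sleep(delay)      # stands in for a slow sampling round-trip
    log.append(name)

async def receive_loop(log: list[str]) -> None:
    tasks = [
        asyncio.ensure_future(handle_request("slow", 0.05, log)),
        asyncio.ensure_future(handle_request("fast", 0.01, log)),
    ]
    await asyncio.gather(*tasks)    # handlers overlap; "fast" finishes first

log: list[str] = []
asyncio.run(receive_loop(log))
print(log)  # the fast handler completes before the slow one
```

With sequential dispatch the log would always follow arrival order; with task-based dispatch, completion order depends only on each handler's own work.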

Motivation and Context

When writing MCP servers that can return large amounts of data, one workable pattern is to "map/reduce" the results: chunk up the backend response, summarize each chunk with an LLM, then combine the summaries and return that combined summary as the tool response. Sampling is the perfect tool for this, but it is currently locked into sequential execution: if I break my data into 10 chunks, I have to summarize them one after another before my tool can respond. This can lead to very long runtimes, which is unnecessary since each sampling call depends only on its own data.

This change should allow much more efficient "map/reduce" using sampling from MCP servers (without them having to implement their own LLM integration server-side).
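The map/reduce shape described above can be sketched as follows (all names are hypothetical; `summarize_chunk` stands in for a real sampling round-trip such as the SDK's `create_message`, which is not invoked here):

```python
import asyncio

# Hypothetical sketch of the map/reduce pattern: split the data,
# summarize each chunk concurrently, then combine the summaries.

async def summarize_chunk(chunk: str) -> str:
    await asyncio.sleep(0.01)           # simulated per-chunk LLM latency
    return f"summary({chunk})"

async def summarize_all(data: str, n_chunks: int = 10) -> str:
    size = max(1, len(data) // n_chunks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # With task-based dispatch on the client side, the n sampling calls can
    # run concurrently, so total latency is roughly one call, not n calls.
    summaries = await asyncio.gather(*(summarize_chunk(c) for c in chunks))
    return " | ".join(summaries)

result = asyncio.run(summarize_all("abcdefghij", n_chunks=5))
print(result)
```

Under sequential sampling the total latency grows linearly with the number of chunks; with concurrent dispatch it is bounded by the slowest single chunk.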

How Has This Been Tested?

I've tested this with fast-agent, one of the only MCP clients that implements sampling. It greatly speeds up my applications.

Breaking Changes

No. Unless a client sends sampling requests concurrently (rather than awaiting each one immediately, which is the more common pattern), the behavior will not change.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally (one existing OAuth-related test currently fails, but it does not appear related; I will see whether it fails in this PR and continue debugging)
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

@wreed4
Author

wreed4 commented May 29, 2025

I believe the one failing test fails due to a race condition. When I add sleeps into the test to force the order of execution, it reliably fails even without my change, as far as I can tell. I will keep digging, but from what I can see, the second `await session.send_request` returns without ever triggering the message handler. Removing the sleeps lets it succeed every time on my machine, but adding them shows that `session.send_request` returns before the message handler is called.
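The race described above can be reproduced in miniature (toy code, not SDK internals): once the message handler runs in its own task, the response future can resolve before the handler body has executed.

```python
import asyncio

# Toy illustration of the race: the handler is dispatched as a task,
# so awaiting the response ("send_request") can return first.

async def main() -> list[str]:
    order: list[str] = []
    response: asyncio.Future[str] = asyncio.get_running_loop().create_future()

    async def message_handler() -> None:
        await asyncio.sleep(0.01)       # any await lets the response win
        order.append("handler ran")

    asyncio.ensure_future(message_handler())  # dispatched, not awaited
    response.set_result("ok")
    await response                      # "send_request" returns here
    order.append("send_request returned")
    await asyncio.sleep(0.05)           # let the handler task finish
    return order

order = asyncio.run(main())
print(order)  # the response wins the race against the handler
```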

@wreed4
Author

wreed4 commented May 29, 2025

OKAY! I've tracked down the `test_streamablehttp_client_resumption` failure to the following behavior (seemingly)...

General layout of the test:

  • create a tool called `long_running_with_checkpoints`, with which we intend to break our connection before it finishes (and resume it later)
  • register a message handler, which `shared/session.py::_receive_loop` reaches by calling `self._handle_incoming`; that calls the `client/session.py` version of the function, which calls our message handler
  • create a client
  • start the tool
  • wait for it to send the first notification, then disconnect
  • THE TOOL WILL CONTINUE RUNNING

then, the intended flow seems to be

  • reconnect to the tool
  • pick up the remaining notifications it sent us
  • profit

However, this only seems to work correctly if the tool has sent another notification before we reconnect to it. If the tool has not yet sent another notification, the call to `send_request` hangs forever.

At this point, this is well outside the scope of the change I'm trying to make; I've verified that it happens both with and without my change. But in the spirit of not breaking everyone else, I'll try to fix it as well. If folks would rather keep this out of this PR and track it as a separate bug, that works for me too; I can revert whatever I do to these files, as they're unrelated to the change I wanted to present.

@wreed4
Author

wreed4 commented May 30, 2025

I've tracked this down as far as I can and concluded that it is ultimately out of scope for this PR, even if I could figure out what's happening, which I haven't managed to. So I've added a bit more logic to make the test more reliable and opened another issue to capture this error case.
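One common way to make such a test deterministic, sketched below under assumed names (this is not the actual test change), is to wait on an explicit event with a timeout instead of sleeping for a fixed interval:

```python
import asyncio

# Hypothetical sketch: wait for "another notification arrived" via an
# event with a timeout, rather than racing against a fixed sleep.

async def demo() -> str:
    notified = asyncio.Event()

    async def server_sends_notification() -> None:
        await asyncio.sleep(0.01)       # simulated server-side delay
        notified.set()

    asyncio.ensure_future(server_sends_notification())
    try:
        await asyncio.wait_for(notified.wait(), timeout=1.0)
        return "resumed"
    except asyncio.TimeoutError:
        return "gave up"

outcome = asyncio.run(demo())
print(outcome)
```

The timeout bounds the test's runtime even when the notification never arrives, so the test fails fast instead of hanging.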


@ihrpr ihrpr added this to 🐛 🛠 Jun 26, 2025
@github-project-automation github-project-automation bot moved this to To triage in 🐛 🛠 Jun 26, 2025
gspencergoog pushed a commit to gspencergoog/mcp-python-sdk that referenced this pull request Jul 29, 2025
@felixweinberger felixweinberger added enhancement New feature or request needs more eyes Needs alignment among maintainers whether this is something we want to add labels Sep 8, 2025
Contributor

@felixweinberger felixweinberger left a comment


Hi @wreed4 thank you for this contribution! And apologies for the time it took to get back to this.

Handling these concurrently seems like an interesting idea, but it is not something we currently specify in the protocol, and it introduces additional complexity, so we want to be sure it's worthwhile adding. The current implementation also seems to lead to duplicate request handling on every request.

Given that we have an ongoing SEP for async tool execution (see: modelcontextprotocol/modelcontextprotocol#1391), I believe this falls into the same category: we'd want to elevate this to a protocol-level change rather than add it ad hoc as a feature to one SDK (but not the others).

We would likely need a SEP for this to also ensure other SDKs support this.

```python
async def _handle_received_request() -> None:
    await self._received_request(responder)
    if not responder._completed:  # type: ignore[reportPrivateUsage]
        await self._handle_incoming(responder)
```

I think this is duplicating line 366 below, so we might be calling `self._handle_incoming` twice for every request? Should we be removing lines 365-366?
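The double-handling concern can be demonstrated with a toy reduction (hypothetical names, not SDK code): if the dispatched task and an inline fallback both call the handler, every request is handled twice.

```python
import asyncio

# Toy reduction of the reviewer's concern: a dispatched task plus a
# leftover inline call means the handler runs twice per request.

calls: list[str] = []

async def handle_incoming(msg: str) -> None:
    calls.append(msg)

async def dispatch(msg: str, *, duplicate_bug: bool) -> None:
    task = asyncio.ensure_future(handle_incoming(msg))  # new task-based path
    if duplicate_bug:
        await handle_incoming(msg)  # old inline call left in place
    await task

asyncio.run(dispatch("req", duplicate_bug=True))
print(len(calls))  # the single request was handled twice
```

Removing the inline call (the `duplicate_bug` branch) restores exactly-once handling.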

@felixweinberger
Contributor

Closing for now as discussed here this should likely be a SEP first.

@github-project-automation github-project-automation bot moved this from To triage to Done in 🐛 🛠 Sep 26, 2025