
Fix MCP tool-call latency for non-queued events#12961

Merged
freddyaboulton merged 10 commits into gradio-app:main from Mandark-droid:fix/mcp-direct-call-unqueued-events
Mar 6, 2026

Conversation


@Mandark-droid (Contributor) commented Mar 4, 2026

Description

When an MCP call_tool() is invoked for a non-queued event (queue=False), the current implementation still routes the call through gradio_client.Client.submit(), which performs a full HTTP loopback:

MCP request
  → run_sync(Client._get_or_create_client)   ← thread dispatch
  → client.submit(api_name=endpoint)          ← HTTP POST to own server
  → Gradio queue processing
  → run_sync(job.result)                      ← thread blocking for HTTP response

This adds ~4 seconds of overhead per call for functions that take ~13ms to execute.

This PR bypasses the HTTP loopback for queue=False events by calling blocks.process_api() directly — the same internal function the HTTP route eventually reaches. For queued events (queue=True), the existing path is preserved to maintain streaming updates, progress notifications, and queue-based features.
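The dispatch decision can be sketched in isolation. This is a simplified stand-in, not Gradio's actual internals: `FakeBlocks`, `submit_via_http`, and the `fn_queue_enabled` flag are hypothetical names that mimic the shape of the PR's conditional (direct `process_api()` call when the event is non-queued, loopback otherwise):

```python
import asyncio

# Stand-in for gradio.Blocks; the real process_api() takes many more arguments.
class FakeBlocks:
    async def process_api(self, fn_index, inputs):
        # Direct in-process call: no HTTP round-trip, no queue.
        return {"data": [f"echo:{inputs[0]}"]}

async def submit_via_http(endpoint, inputs):
    # Placeholder for the gradio_client.Client.submit() loopback path.
    await asyncio.sleep(0)  # the network round-trip would happen here
    return {"data": [f"echo:{inputs[0]}"]}

async def call_tool(blocks, fn_queue_enabled, endpoint, inputs):
    """Route non-queued events directly; keep the loopback for queued ones."""
    if not fn_queue_enabled:
        return await blocks.process_api(fn_index=0, inputs=inputs)
    return await submit_via_http(endpoint, inputs)

result = asyncio.run(call_tool(FakeBlocks(), False, "/echo", ["hi"]))
print(result["data"][0])  # → echo:hi
```

Both branches return the same payload shape, which is why the fast path can be swapped in without changing MCP response handling.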

Relates to: #11961 (PR #12296 partially addressed this by skipping progress updates, but the HTTP loopback remained)

AI Disclosure

  • I used AI to assist with benchmarking analysis and drafting the PR description

Benchmark Results

Benchmarked using mcp-server-bench — an open-source MCP benchmarking tool comparing Gradio vs FastMCP across identical tool implementations.

Three-Way Comparison: Gradio MCP Streamable Protocol

| Scenario | Before (loopback) | PR (queue=OFF) | PR (queue=ON) | FastMCP ref |
|---|---|---|---|---|
| echo VU=1 | 0.4 RPS / 4,133ms p50 | 54.3 RPS / 16ms p50 | 0.0 RPS (startup issue) | 74.6 RPS / 13ms p50 |
| echo VU=10 | 3.6 RPS / 4,133ms p50 | 151.4 RPS / 63ms p50 | 2.8 RPS / 4,123ms p50 | 43.6 RPS / 203ms p50 |
| async_sleep VU=1 | 0.4 RPS / 4,141ms p50 | 16.6 RPS / 63ms p50 | 0.3 RPS / 4,149ms p50 | 13.9 RPS / 77ms p50 |
| async_sleep VU=10 | 3.6 RPS / 4,084ms p50 | 126.6 RPS / 79ms p50 | 2.8 RPS / 4,129ms p50 | 52.9 RPS / 168ms p50 |

Format: Throughput (RPS) / p50 latency. VU = virtual users (concurrent connections).

Key Results

  • queue=OFF (this PR): p50 latency drops from ~4,130ms to ~16-79ms (50-250x improvement)
  • queue=ON (unchanged): Behavior identical to before — streaming/progress preserved
  • At VU=10 with queue=False, Gradio beats FastMCP on both throughput and latency (151 vs 44 RPS for echo, 127 vs 53 RPS for async_sleep)
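The 50-250x range follows directly from the p50 columns of the table above; a quick check:

```python
# p50 latencies (ms) from the benchmark table: before (loopback) vs this PR (queue=OFF)
before = {
    "echo VU=1": 4133,
    "echo VU=10": 4133,
    "async_sleep VU=1": 4141,
    "async_sleep VU=10": 4084,
}
after = {
    "echo VU=1": 16,
    "echo VU=10": 63,
    "async_sleep VU=1": 63,
    "async_sleep VU=10": 79,
}

speedups = {k: before[k] / after[k] for k in before}
for k, v in speedups.items():
    print(f"{k}: {v:.0f}x")

# The smallest and largest factors bracket the stated 50-250x range.
assert 50 <= min(speedups.values()) <= max(speedups.values()) <= 260
```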

Benchmark Datasets (reproducible)

| Dataset | Description |
|---|---|
| mcp-server-bench | Before fix — 360 scenarios, all HTTP loopback |
| mcp-server-bench-gradio-optimized | After fix (unconditional direct call) — 48 scenarios |
| mcp-server-bench-gradio-optimized-full-bench | After fix (unconditional direct call) — 337 scenarios, full benchmark |
| mcp-server-bench-gradio | This PR (conditional, queue=False only) — 12 scenarios |

Testing and Formatting

Validated with the benchmark suite above. The change is scoped to gradio/mcp.py only — no frontend changes.

Changes

  • 1 file changed: gradio/mcp.py
  • Added from gradio.state_holder import SessionState
  • call_tool(): when block_fn.queue is False, call blocks.process_api() directly; otherwise use existing HTTP loopback path

Mandark-droid and others added 3 commits March 4, 2026 10:40
…pback

When `queue=False`, MCP `call_tool()` now calls `blocks.process_api()` directly
instead of going through `gradio_client.Client.submit()`, which was making an
HTTP POST back to the same server. This eliminates thread dispatches, TCP
round-trips, SSE overhead, and queue serialization.

For queued events (`queue=True`), the existing HTTP loopback path is preserved
to maintain streaming updates, progress notifications, and queue-based features.
@gradio-pr-bot (Collaborator) commented Mar 4, 2026

🪼 branch checks and previews

| Name | Status | URL |
|---|---|---|
| Spaces | ready! | Spaces preview |
| Website | ready! | Website preview |
| 🦄 Changes | detected! | Details |

Install Gradio from this PR

```shell
pip install https://gradio-pypi-previews.s3.amazonaws.com/6334868b4637fd24f2e2d32aed6b0dd7b69f5ac9/gradio-6.8.0-py3-none-any.whl
```

Install Gradio Python Client from this PR

```shell
pip install "gradio-client @ git+https://github.com/gradio-app/gradio@6334868b4637fd24f2e2d32aed6b0dd7b69f5ac9#subdirectory=client/python"
```

Install Gradio JS Client from this PR

```shell
npm install https://gradio-npm-previews.s3.amazonaws.com/6334868b4637fd24f2e2d32aed6b0dd7b69f5ac9/gradio-client-2.1.0.tgz
```

@gradio-pr-bot (Collaborator) commented Mar 4, 2026

🦄 change detected

This Pull Request includes changes to the following packages.

| Package | Version |
|---|---|
| gradio | patch |

  • bypass HTTP loopback for non-queued MCP tool calls, calling blocks.process_api() directly to reduce latency

✅ Changeset approved by @freddyaboulton

  • Maintainers can remove approval by unchecking this checkbox.

Something isn't right?

  • Maintainers can change the version label to modify the version bump.
  • If the bot has failed to detect any changes, or if this pull request needs to update multiple packages to different versions or requires a more comprehensive changelog entry, maintainers can update the changelog file directly.

```python
# This eliminates thread dispatches, TCP round-trips, and SSE
# overhead — reducing MCP tool-call latency significantly.
session_state = SessionState(self.blocks)
raw_output = await self.blocks.process_api(
```
@freddyaboulton (Collaborator) commented:

We should pass the request here

@Mandark-droid (Author) commented:

Done! Added `request=self.mcp_server.request_context.request` to the `process_api()` call in a96be5a.

@freddyaboulton (Collaborator) commented:

Thank you!

@freddyaboulton (Collaborator) commented:

Thanks @Mandark-droid! This is great. Just one comment: we should pass the request to call_process_api.

freddyaboulton and others added 2 commits March 4, 2026 09:44
Address review feedback from @freddyaboulton: forward the request
object to blocks.process_api() so that downstream handlers have
access to the original HTTP request context.
@abidlabs (Member) left a comment:

Awesome @Mandark-droid! Tested and works great.

I would just add a section in the MCP docs mentioning that, for performance, users can set queue=False. We already have a brief description here: https://www.gradio.app/guides/building-mcp-server-with-gradio#sending-progress-updates, but we can expand it a bit to reference this performance improvement.

@freddyaboulton freddyaboulton enabled auto-merge (squash) March 6, 2026 16:13
@aaazzam commented Mar 6, 2026

I work on FastMCP. Nice fix btw — the 16x improvement is legit.

FYI the FastMCP numbers in the table don't match what we see on our end (ask Claude to check out the benchmark branch and try to reproduce). Looks like the baseline was misconfigured. The Gradio improvement is the real story here anyway, congrats team!

@freddyaboulton freddyaboulton merged commit 0595d1b into gradio-app:main Mar 6, 2026
21 of 22 checks passed