perf: add schema caching to avoid repeated reflection #685
Conversation
Thank you! This makes sense in principle, and we hadn't considered the effect that lack of a cache would have on the stateless hosting model. Will review promptly.

@findleyr also, don't feel attacked by the AI-generated performance test report and PR description 😅. I believe the principle of this change is exactly right, but I'm happy for you to take over the implementation or re-implement it if you like the idea. I have enabled maintainer edits.
mcp/schema_cache_benchmark_test.go

```go
// BenchmarkAddTool_TypedHandler measures performance of AddTool with typed handlers.
// This simulates the stateless server pattern where new servers are created per request.
func BenchmarkAddTool_TypedHandler(b *testing.B) {
```
nit: remove underscores in benchmark and test names
As I detail in my comments, I don't believe the tool-call implementation will work as it stands. But you also haven't motivated it, as you have for the schema cache. Maybe it isn't necessary, especially given the latency of current LLMs. It should have been in a second PR anyway, since it's an independent change.
Thanks for the review @jba, I'll definitely be happy to split; I agree. Regarding motivation:

We do have sufficient motivation beyond user perception of latency to keep our fixed startup costs down. Consider that this adds fixed-cost latency to every agent that runs our MCP server by default (say, Copilot Coding Agent on Actions runners), which adds up. And it's not per request: every client needs to fetch all the initial routes at startup, and I don't believe all client SDKs make those requests in parallel, so the effect is multiplied by the number of server features offered. I'll redo the benchmarking with just the cache changes and see, but I suspect we'll leave the shim in place to avoid the problem, and gradually move away from the tool-for-type method more permanently. Appreciate the quick response.
This change adds a global schema cache that dramatically reduces the cost of registering tools in stateless server patterns (new server per request).

Key improvements:
- Cache schemas by `reflect.Type` for typed handlers
- Cache resolved schemas by pointer for pre-defined schemas
- 132x faster tool registration after the first call
- 51x fewer allocations per `AddTool` call
- 32x less memory per `AddTool` call

Benchmarks:
- BenchmarkAddToolTypedHandler: 1,223 ns/op vs 161,463 ns/op (no cache)
- BenchmarkAddToolTypedHandler: 21 allocs vs 1,072 allocs (no cache)

This benefits integrators like github-mcp-server automatically, without any code changes required.
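For illustration, here is a minimal sketch of what such a cache can look like: a `sync.Map` keyed by `reflect.Type` for derived schemas and by schema pointer for user-supplied ones. The names (`schemaCache`, `getBySchema`) mirror identifiers quoted later in this review, but this is an assumption-laden sketch, not the PR's actual code.

```go
package mcp

import (
	"reflect"
	"sync"

	"github.com/google/jsonschema-go/jsonschema"
)

// schemaCache memoizes resolved schemas so that stateless servers, which
// call AddTool on every request, pay the reflection and resolution cost
// only once per type (or per schema pointer).
type schemaCache struct {
	byType   sync.Map // reflect.Type -> *jsonschema.Resolved
	bySchema sync.Map // *jsonschema.Schema -> *jsonschema.Resolved
}

// getByType returns the cached resolved schema derived from type t, if any.
func (c *schemaCache) getByType(t reflect.Type) (*jsonschema.Resolved, bool) {
	if v, ok := c.byType.Load(t); ok {
		return v.(*jsonschema.Resolved), true
	}
	return nil, false
}

// getBySchema returns the cached resolution of a user-provided schema,
// keyed by its pointer identity.
func (c *schemaCache) getBySchema(s *jsonschema.Schema) (*jsonschema.Resolved, bool) {
	if v, ok := c.bySchema.Load(s); ok {
		return v.(*jsonschema.Resolved), true
	}
	return nil, false
}
```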
Force-pushed from 46310a7 to 988974d.
Thanks for the thorough review @jba! I've addressed all feedback: the PR is now squashed to a single commit with just the schema caching optimization. Force pushed the cleaned branch.
Comprehensive Benchmark Report: Schema Caching Performance Impact

Executive Summary

This PR's schema caching fix fully resolves the performance regression introduced when migrating from mcp-go to go-sdk, while also providing significant memory improvements.
Test Configurations
1. MCP Protocol Latency Tests (n=30)

Operation: initialize
| Configuration | P50 | P99 | vs main | Status |
|---|---|---|---|---|
| main | 13.42ms | 16.38ms | baseline | ✅ |
| gosdk | 21.20ms | 25.61ms | +58% | 🔴 REGRESSION |
| gosdk_shimmed | 11.89ms | 14.45ms | -11% | ✅ FIXED |
| gosdk_optimized | 11.73ms | 15.22ms | -13% | 🏆 WINNER |
Operation: tools/list
| Configuration | P50 | P99 | vs main | Status |
|---|---|---|---|---|
| main | 15.44ms | 24.84ms | baseline | ✅ |
| gosdk | 24.74ms | 31.06ms | +60% | 🔴 REGRESSION |
| gosdk_shimmed | 15.13ms | 21.46ms | -2% | ✅ FIXED |
| gosdk_optimized | 15.07ms | 19.40ms | -2% | 🏆 WINNER |
Operation: prompts/list
| Configuration | P50 | P99 | vs main | Status |
|---|---|---|---|---|
| main | 13.01ms | 15.56ms | baseline | ✅ |
| gosdk | 22.33ms | 29.35ms | +72% | 🔴 REGRESSION |
| gosdk_shimmed | 11.72ms | 17.78ms | -10% | ✅ FIXED |
| gosdk_optimized | 11.41ms | 16.30ms | -12% | 🏆 WINNER |
2. Stress Tests (n=100)
Operation: initialize
| Configuration | P50 | P99 | vs main | Status |
|---|---|---|---|---|
| main | 13.12ms | 18.95ms | baseline | ✅ |
| gosdk | 21.96ms | 29.70ms | +67% | 🔴 REGRESSION |
| gosdk_shimmed | 11.63ms | 15.39ms | -11% | ✅ FIXED |
| gosdk_optimized | 11.73ms | 18.05ms | -11% | 🏆 WINNER |
3. Memory Allocation Comparison (pprof)
| Configuration | Total Allocations | vs main | vs broken gosdk |
|---|---|---|---|
| main | 343 MB | baseline | — |
| gosdk (broken) | 1186.82 MB | +246% | baseline |
| gosdk_optimized | 149.43 MB | -56% 🏆 | -87% 🏆 |
Root Cause - Top Allocations in Broken gosdk
| Function | Size | % | Issue |
|---|---|---|---|
| reflect.New | 105.57 MB | 8.9% | 🚨 Schema reflection on every request |
| slices.AppendSeq | 85.50 MB | 7.2% | String collection for schema |
| encoding/json.Unmarshal | 66.01 MB | 5.6% | Schema parsing |
| jsonschema.checkStructure | 24.50 MB | 2.1% | 🚨 Schema validation |
Top Allocations in gosdk_optimized (After Fix)
| Function | Size | % | Notes |
|---|---|---|---|
| bytes.growSlice | 7.82 MB | 5.2% | Normal JSON operations |
| runtime.allocm | 5.51 MB | 3.7% | Goroutine stacks |
| encoding/json.Marshal | 5.10 MB | 3.4% | Normal serialization |
No more reflect.New, jsonschema.checkStructure, or repeated schema generation!
4. Tool Call Tests (n=25, includes GitHub API latency)
⚠️ Times include GitHub API round-trip (~200-750ms)
Tool: get_me
| Configuration | P50 | P99 | Avg | Errors |
|---|---|---|---|---|
| main | 215.33ms 🏆 | 411.90ms | 232.58ms | 0 |
| gosdk | 230.62ms | 414.06ms | 241.64ms | 0 |
| gosdk_shimmed | 226.53ms | 282.34ms | 228.62ms | 0 |
| gosdk_optimized | 219.92ms | 412.27ms | 235.64ms | 0 |
Tool: list_issues
| Configuration | P50 | P99 | Avg | Errors |
|---|---|---|---|---|
| main | 758.72ms | 1114.68ms | 798.82ms | 0 |
| gosdk | 754.56ms | 1093.81ms | 758.48ms | 0 |
| gosdk_shimmed | 710.04ms 🏆 | 1725.47ms | 781.53ms | 0 |
| gosdk_optimized | 745.10ms | 928.19ms 🏆 | 744.38ms 🏆 | 0 |
Tool call performance is equivalent across all configurations - variance is primarily from GitHub API response times.
5. Visual Summary
Latency (P50 initialize) - Lower is Better

```
================================================================================
main (mcp-go)    █████████████████████████████████ 13.42ms (baseline)
gosdk (broken)   ██████████████████████████████████████████████████████ 21.20ms (+58%)
gosdk_shimmed    █████████████████████████████ 11.89ms (-11%) ✅
gosdk_optimized  █████████████████████████████ 11.73ms (-13%) 🏆
================================================================================
```

Memory Allocations - Lower is Better

```
================================================================================
main (mcp-go)    ██████████████████ 343 MB
gosdk (broken)   ██████████████████████████████████████████████████████████████ 1187 MB (+246%)
gosdk_optimized  █████████ 149 MB (-56% vs main, -87% vs broken) 🏆
================================================================================
```
Winners Summary
| Category | Winner | Value |
|---|---|---|
| initialize latency | gosdk_optimized | 11.73ms |
| tools/list latency | gosdk_optimized | 15.07ms |
| prompts/list latency | gosdk_optimized | 11.41ms |
| Memory efficiency | gosdk_optimized | 149.43 MB |
| get_me tool call | main | 215.33ms |
| list_issues P99 | gosdk_optimized | 928.19ms |
Conclusion
This PR's schema caching approach fully resolves the performance regression and provides:
- ✅ 13% faster latency than mcp-go baseline for MCP protocol operations
- ✅ 87% less memory than broken go-sdk (1187 MB → 149 MB)
- ✅ 56% less memory than mcp-go baseline (343 MB → 149 MB)
- ✅ Equivalent tool call performance (within 2-5%)
- ✅ Lowest P99 variance for complex operations
The fix eliminates repeated JSON schema generation via reflection by caching schemas after first generation.
Strongly recommend merging this fix. 🚀
Awesome analysis, thank you.
```go
// If an indirection occurred to derive the schema, a non-nil zero value is
// returned to be used in place of the typed nil zero value.
//
// Note that if sfield already holds a schema, zero will be nil even if T is a
```
This very subtle logic is changed in this CL, so at the very least we need to update or delete this comment.
I think we were overly cautious when changing the behavior with respect to pointers, and only changed our handling of null if we were responsible for deriving the schema.
But looking at it now, I actually think this change is correct: the only point of NOT returning a zero if the user supplied their own schema would be to allow returning JSON "null", but that's not valid for structured content according to the MCP spec. Therefore, I think we SHOULD change this, but not without a bug report and a test demonstrating the incorrect behavior. I will do that right now.
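For concreteness, here is a small standalone illustration (hypothetical, not code from this CL) of why a typed nil marshals to JSON `null`, which is invalid for structured content per the MCP spec, while a non-nil zero value marshals to a valid object:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Out struct{ Msg string }

func main() {
	var out *Out // typed nil, e.g. what a handler might return
	b, _ := json.Marshal(out)
	fmt.Println(string(b)) // null  <- not a valid structured-content object

	// Substituting a non-nil zero value yields a valid JSON object instead.
	b, _ = json.Marshal(&Out{})
	fmt.Println(string(b)) // {"Msg":""}
}
```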
(as you noticed, this is #692)
mcp/server.go

```go
// Case 2: Schema was provided
// Check if it's a *jsonschema.Schema we can cache by pointer
if providedSchema, ok := (*sfield).(*jsonschema.Schema); ok {
	if resolved, ok := globalSchemaCache.getBySchema(providedSchema); ok {
```
I'll just point out that this is a rather subtle change in behavior: previously, if the user re-used a schema, we'd always validate against the version of the schema that was provided when they called AddTool. Now, we may get a cache hit and use a stale resolved schema.
I know this is unlikely, but given its subtlety this is a rather scary change in behavior.
I wonder if we should therefore put this optimization behind a ServerOption. For example, imagine a ServerOptions.SchemaCache field that accepted a schemacache.Cache.
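To make the staleness scenario concrete, a hypothetical sketch (`server1`, `server2`, and `handler` are assumed to exist; this is not code from the PR):

```go
// A user-provided schema is resolved and cached by pointer on first use.
schema := &jsonschema.Schema{Type: "object"}
tool := &mcp.Tool{Name: "example", InputSchema: schema}
mcp.AddTool(server1, tool, handler)

// The user then mutates the schema in place...
schema.Required = []string{"name"}

// ...and registers it again. Same pointer => cache hit, so validation
// uses the stale resolution that predates the Required constraint.
mcp.AddTool(server2, tool, handler)
```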
It's certainly not a problem to put it behind a server option from my perspective. I'll do that.
When you make that change, push it as a separate commit so we can look at it in isolation.
OK, I have done so: f476e90
It only changes the public API by adding a server option. It does have to add a bunch of nil checks; I couldn't find a better way to reduce that, but I think it's still pretty clear.
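For context, a hypothetical sketch of the shape this takes in the schema-setting path (field and method names assumed; see f476e90 for the real change):

```go
// Consult the cache only when the user opted in via ServerOptions,
// hence the extra nil checks on the options and on the cache field.
if opts != nil && opts.SchemaCache != nil {
	if resolved, ok := opts.SchemaCache.getByType(t); ok {
		return resolved, nil
	}
}
// Fall through: derive the schema via reflection and resolve it as before.
```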
Add ServerOptions.SchemaCache to enable optional caching of JSON schemas
for tools. When set, the cache avoids repeated reflection-based schema
generation and resolution, which significantly improves performance for
stateless server deployments where tools are re-registered on every request.
This change addresses reviewer feedback to make the caching optimization
opt-in, avoiding subtle behavior changes for users who may re-use and
modify schemas between requests.
Usage:

```go
cache := mcp.NewSchemaCache()
server := mcp.NewServer(impl, &mcp.ServerOptions{
	SchemaCache: cache,
})
mcp.AddTool(server, tool, handler)
```
Summary
This PR adds schema caching that dramatically reduces the cost of tool registration in stateless server patterns.
Problem
In stateless deployments like `github/github-mcp-server`, a new `*mcp.Server` is created for each incoming request. This means `AddTool` is called repeatedly for the same tools, causing:
- `jsonschema.ForType()` reflection to run every time
- `schema.Resolve()` to run every time

A minimal sketch of this pattern is shown below.
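The handler shape and `registerAllTools` here are illustrative, not taken from github-mcp-server:

```go
// Without a schema cache, every request re-derives and re-resolves the
// JSON schema for all ~130 tools registered here.
func handleMCP(w http.ResponseWriter, r *http.Request) {
	server := mcp.NewServer(&mcp.Implementation{Name: "github-mcp-server"}, nil)
	registerAllTools(server) // calls mcp.AddTool for every tool, on every request
	// ... hand the request off to the server's transport ...
}
```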
Solution

Add a `schemaCache` that stores:
- schemas keyed by `reflect.Type` (for auto-generated schemas from typed handlers)
- resolved schemas keyed by pointer (for pre-defined schemas)

The cache is backed by a `sync.Map`.

Benchmark Results
Files Changed
- `mcp/schema_cache.go` - New cache implementation
- `mcp/server.go` - Modified `setSchema` to use the cache
- `mcp/schema_cache_test.go` - Unit tests for caching behavior
- `mcp/schema_cache_benchmark_test.go` - Benchmarks

Impact for Integrators
Automatic - no code changes required. Integrators using:
- typed handlers (`AddTool[In, Out]`) → cache by type
- pre-defined schemas (`Tool{InputSchema: schema}`) → cache by pointer

Both patterns benefit from caching after the first call, as sketched below.
Real-World Performance Validation
The following benchmarks were conducted using `github/github-mcp-server` (a production MCP server with ~130 tools) to validate the performance impact in a real-world scenario.

Test Environment
Configurations Tested
Latency Test Results (n=30)
Operation: initialize
Operation: tools/list
Operation: prompts/list

Stress Test Results (n=100)
Memory/Allocation Comparison (from pprof)
Top Allocation Sources - go-sdk WITHOUT cache (broken)
- `jsonschema.UnmarshalJSON`
- `encoding/json.Unmarshal`
- `jsonschema.resolve`
- `jsonschema.MarshalJSON`
The `google/jsonschema-go` library regenerates JSON schemas on every request instead of caching them. In a server with ~130 tools, this causes:

Key Findings