feat: update/add detailed telemetry for stdio and http mcp transports by AjmeraParth132 · Pull Request #1987 · googleapis/genai-toolbox

AjmeraParth132 · 2025-11-19T06:22:27Z

Description

This PR adds consistent and actionable telemetry for MCP sessions across HTTP and STDIO transports, enabling quick visibility into toolset discovery and tool invocation activity with minimal setup.

PR Checklist

Make sure you reviewed CONTRIBUTING.md
Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)
Make sure to add ! if this involves a breaking change

gemini-code-assist · 2025-11-19T06:22:47Z

Summary of Changes

Hello @AjmeraParth132, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the observability of the Model Context Protocol (MCP) server by implementing comprehensive and detailed telemetry. The changes introduce dynamic trace naming, allowing for more specific tracking of operations like toolset discovery and tool invocation. Furthermore, telemetry spans now capture the transport mechanism used and the name of the invoked tool, alongside improved error reporting. This enhancement aims to provide developers and operators with actionable insights and quicker visibility into the system's behavior and potential issues.

Highlights

Enhanced Telemetry: Detailed telemetry has been added for MCP sessions, providing better visibility into toolset discovery and tool invocation activities.
Dynamic Trace Naming: Telemetry spans are now dynamically named based on the specific JSON-RPC method being processed, offering more granular insights.
Transport and Tool Identification: New attributes for 'transport' (e.g., 'stdio', 'mcp-http', 'native') and 'tool_name' are included in telemetry spans to identify the request origin and the specific tool invoked.
Improved Error Reporting: Errors encountered during MCP message processing are now explicitly set on the telemetry spans, aiding in quicker debugging and issue identification.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request adds detailed telemetry for MCP sessions over HTTP and stdio transports. The changes are well-implemented, propagating toolName information through the call stack to enrich trace spans. This will provide better visibility into tool discovery and invocation. The changes are consistent across different MCP protocol versions. I have one suggestion to refactor duplicated code to improve maintainability.

internal/server/mcp.go

Yuan325 · 2025-12-17T22:11:13Z

/gemini review

gemini-code-assist

Code Review

This pull request enhances telemetry for MCP sessions across both HTTP and STDIO transports. It introduces a transport attribute to OpenTelemetry spans and provides more specific trace names for tools/list and tools/call methods. Additionally, for tool invocations, the tool_name is now included as a span attribute, which is a great improvement for observability. The implementation involves refactoring several functions to propagate the toolName up the call stack, and these changes have been applied consistently across different MCP protocol versions.

My main feedback is regarding a piece of duplicated logic for determining the trace name, which I've commented on. Addressing this would improve the code's maintainability.

internal/server/mcp.go

Yuan325 · 2025-12-18T07:05:16Z

Hi @AjmeraParth132 , thank you for your contribution! I'll make sure to review this tomorrow. :)

In the meantime, please reopen issue #1633. There are also some outstanding comments regarding the naming of the traces. Ideally we would want to come to an agreement on the names before making any official changes, since updating a trace name is considered a breaking change.

AjmeraParth132 · 2025-12-18T20:48:39Z

Hi @Yuan325, thanks for the update. Sure, we'll just internally discuss the latest scope of work and reopen the updated issue quickly. Will inform you here once done.

Yuan325

Please also rebase this PR :) Thank you again for your contribution!

internal/server/api.go

internal/server/mcp.go

AjmeraParth132 · 2025-12-22T06:11:12Z

@Yuan325 thanks for your review suggestions.

On the transport/endpoint convention, we started with connection-type, but Kurtis suggested transport in the review doc. Happy to rename it as endpoint if that feels more intuitive.
Setting the traceName, toolName and toolsetName attributes in processMcpMessage() seems like a clean approach. Just to confirm: we might need to pass the span context for trace telemetry further down through the function arguments; I hope that's fine?
Meanwhile, I have opened a new issue - #2222 for tracking with latest set of actionables as discussed.

Will rebase the PR and update here :)

Yuan325 · 2025-12-27T07:02:43Z

Hi @AjmeraParth132

Some questions: (please advise on best practice within the field if there's any) --

1. On second thought, do we need the transport/endpoint attribute, since it can be determined from the trace name itself?
2. Could you clarify what you mean by span context? We can definitely add ctx context.Context to the functions if that's what you meant.

Please add the full visualization of the traces/attributes/metrics that you will be adding in that new issue -- if you're planning to just continue to work from there? (similar to #1633 (comment)) It will be easier for future contributors to reference, and also a way to keep track of the changes that will be applied :) Thank you very much!! I'll try to keep a closer eye on your issue/PR so that we can progress faster. Apologies for the slow response

AjmeraParth132 · 2025-12-30T05:29:48Z

Hi @Yuan325 ,

Apologies for the delayed response, we’ve been AFK this week due to some family commitments and New Year festivities. We’ll be back to work after this week and will pick this up with the highest priority to help move things faster. Responding to your points:

1. The plan is to unify trace names across connection types and represent transports as attributes. Could you please share your email ID? I’ll send over the document we previously shared with Kurtis that outlines the proposed changes.
2. Yes!

We’ll also add a complete visualization of the new traces, attributes, and metrics here shortly, so we can align on the overall approach before proceeding further with development. Thanks.

Yuan325 · 2025-12-30T21:42:12Z

@AjmeraParth132 Sounds good, sent an email to you :)

Yuan325 · 2026-01-06T21:40:02Z

@AjmeraParth132
The design LGTM, just a little comment: for the transport attribute, can we split it into mcp-shttp (streamable HTTP), mcp-stdio, and mcp-httpsse (for SSE)?

Please lmk once this is rebased and updated (include moving span to processMCPMessage()). Thank you again!! :)

AjmeraParth132 · 2026-01-07T10:31:48Z

@Yuan325 perfect, updating the same. One more clarification though: moving/initializing the trace in processMCPMessage() or processMethod() will narrow the span scope, i.e. span duration and error capture will be limited to the new scope. What we can do here is:

1. either start with a temporary trace name and update it later in processMethod() or processMCPMessage()
2. or decode the request body at the start of httpHandler itself to find the calling method
3. or work with the limited span scope starting from the lower-level functions

which one do you prefer?
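
To make the second option above concrete, here is a minimal sketch of peeking at the JSON-RPC method (and, for tools/call, the tool name) before any span is started. The names rpcEnvelope and peekMethod are hypothetical and do not come from the PR; the sketch only assumes the standard JSON-RPC request shape with method and params fields.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// rpcEnvelope holds only the JSON-RPC fields needed to choose a span name.
// Hypothetical type for illustration; not taken from the PR.
type rpcEnvelope struct {
	Method string          `json:"method"`
	Params json.RawMessage `json:"params"`
}

// peekMethod returns the JSON-RPC method and, for tools/call, the tool name.
func peekMethod(body []byte) (string, string, error) {
	var env rpcEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return "", "", fmt.Errorf("decode JSON-RPC envelope: %w", err)
	}
	if env.Method == "" {
		return "", "", errors.New("JSON-RPC message has no method")
	}
	// Params are decoded a second time only when a tool name is needed.
	if env.Method != "tools/call" || len(env.Params) == 0 {
		return env.Method, "", nil
	}
	var call struct {
		Name string `json:"name"`
	}
	if err := json.Unmarshal(env.Params, &call); err != nil {
		return "", "", fmt.Errorf("decode tools/call params: %w", err)
	}
	return env.Method, call.Name, nil
}

func main() {
	body := []byte(`{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"get-weather"}}`)
	method, tool, err := peekMethod(body)
	fmt.Println(method, tool, err)
}
```

The trade-off raised in the thread still applies: if the handler peeks at the body and processMCPMessage() decodes it again, the message is unmarshalled twice unless the decoded envelope is passed down.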

Yuan325 · 2026-01-07T22:12:57Z

@AjmeraParth132 What do you think about this (kind of combination between 1 and 3):

Keep the existing toolbox/server/mcp span that starts in httpHandler(). There's also a span -- toolbox/server/mcp/sse in sseHandler(). If we want to make it consistent, we can add a new span in stdio toolbox/server/mcp/stdio.
Add the general toolbox/server/toolset/get and toolbox/server/tool/invoke in processMCPMessage(). Since we will have the method decoded there (and error handling when decode is not successful), we can prevent decoding it twice. Or if there's any changes on decoding in the future, we won't have to update it at all the different places.

AjmeraParth132 · 2026-01-08T17:03:01Z

@Yuan325 That makes sense. But rather than emitting 2 traces for the same request, I feel a better approach is:

To start with toolbox/server/mcp/sse, toolbox/server/mcp/stdio and toolbox/server/mcp/shttp respectively, and rename the trace later in processMCPMessage(). In that case, we will have full span duration and error attribute reporting with the main trace itself, and if everything runs right, will be renamed to the unified naming anyways.

You can check the PR - it's updated and rebased as well. Lmk if we prefer any other way though.

Yuan325 · 2026-01-08T20:59:55Z

@AjmeraParth132 Isn't it generally bad practice to rename telemetry traces since it will generate inconsistent traces, and potentially make it harder to track?

AjmeraParth132 · 2026-01-10T07:16:16Z

Agreed @Yuan325 , but the downsides felt more concerning!

For starting tool/invoke and toolset/get in processMCPMessage() -

We’ll lose the errors captured in the handlers for MCP calls in the unified traces
For someone observing the telemetry, the native ones will have a timespan from handler start, while the MCP ones will have theirs from processMCPMessage(), making latency an unreliable attribute.
And obviously, 2 traces per request is unnecessary noise for the collector

I feel that decoding the method in the handler itself could be a solution, OR the renaming approach, since the likelihood of inconsistent traces is almost nil (the traces will always be renamed unless we hit an error in handler parsing before the processMCPMessage() call, which is very unlikely, and that too will be captured under the /mcp/<transport> trace anyway).

Would love to hear your thoughts!

Yuan325 · 2026-01-13T04:24:05Z

@AjmeraParth132 Got it, thanks for the insight. Let me discuss this with the team before finalizing this. Will keep this as priority on my end and circle back to you asap :)

Yuan325 · 2026-01-13T16:55:12Z

@AjmeraParth132 I discovered that OpenTelemetry actually has established semantic conventions for MCP (here). I wonder if you/your team had a chance to review those yet?

I also wanted to flag that we are planning to deprecate the native endpoint transport in the future. Given that, it may not be worth the effort to consolidate spans from both endpoints right now.

If you’d prefer to sync up to resolve this quickly, feel free to send over a meeting invite.

AjmeraParth132 · 2026-01-19T14:20:36Z

Hey @Yuan325! Sorry for the delay. Saw this today - it just didn’t show up in my email.

Yes we have been following the OTel convention developments closely, and I think that’s a great way to instrument for end to end telemetry use cases, across multiple services. Given that, while the full fruit of reorganizing our telemetry will come up after we instrument the complete genAI stack, it is still always the best bet to align with the standard conventions.

Also, deprecating the native endpoint makes it easier. We can now get the MCP spans in processMCPMessage directly where we have it decoded anyway. Note that it implies missing error captures in the handlers, which I guess is fine (or we can keep the existing custom toolbox/server/mcp as well for that purpose).

On telemetry, let us reorganize tools/list and tools/call spans and their attributes as per the convention, and continue with the custom toolset/get as mcp.method.name, given there’s none defined for that purpose in [server convention] method name list.

Do let me know if you have any thoughts or need any changes in here. Will update the PR with this plan if it looks good? And once done, we can then align metrics as well with the conventions?

Yuan325 · 2026-01-21T19:49:35Z

@AjmeraParth132 Awesome, I do agree that deprecating the native endpoint will make the traces much simpler (that way we don't have to keep consistency between the 2 endpoints).

1. Keep the existing toolbox/server/mcp/sse span for the endpoint that is used to establish the SSE connection.
2. Rename the span toolbox/server/mcp to toolbox/server/mcp/http and add a new span toolbox/server/mcp/stdio for stdio. This way we can track any errors that happen in the handlers. WDYT about this?
3. Implement the sem conv for spans within processMcpMessage(). Seems like we'll also have to retrieve properties inside the params._meta property bag for the parent trace's information.
4. For the telemetry name, agreed to use the names per convention -- {mcp.method.name} {target} (e.g. tools/call get-weather). Let's also try to add the recommended attributes -- network.transport, network.protocol.*, mcp.protocol.version,... especially since they will be used to differentiate the transport protocol etc.
5. We currently don't have a toolset/get method within our mcp endpoint; it uses tools/list under the hood, so toolset/get will not be necessary? WDYT about keeping tools/list and adding our custom attribute toolset.name? (Or do you think it makes more sense to add the toolset name as part of the {target}?) Example:
a. tools/list (attribute toolset.name=toolset_a) and tools/call get-weather (attribute toolset.name=toolset_a)
or
b. tools/list toolset_a and tools/call toolset_a/get-weather for invoking a specific toolset; tools/list and tools/call get-weather for default toolset (default toolset name is "")

Example of spans:
STDIO initialize:

initialize (CLIENT, trace=t1, span=s1) # FROM MCP Client
|
--- toolbox/server/mcp/stdio (SERVER, trace=t1, span=s2, parent=s1) # IN TOOLBOX
     |
     --- initialize (SERVER, trace=t1, span=s3, parent=s2) # IN TOOLBOX

STDIO tool call:

tools/call get-weather (CLIENT, trace=t1, span=s1) # FROM MCP Client
|
--- toolbox/server/mcp/stdio (SERVER, trace=t1, span=s2, parent=s1) # IN TOOLBOX
     |
     --- tools/call get-weather (SERVER, trace=t1, span=s3, parent=s2) # IN TOOLBOX

SSE connection:

connection (CLIENT, trace=t1, span=s2, parent=s1) # FROM MCP Client
|
--- toolbox/server/mcp/sse (SERVER, trace=t1, span=s3, parent=s2) # IN TOOLBOX

HTTP initialize:

initialize (CLIENT, trace=t1, span=s1) # FROM MCP Client
|
--- toolbox/server/mcp/http (SERVER, trace=t1, span=s2, parent=s1) # IN TOOLBOX
     |
     --- initialize (SERVER, trace=t1, span=s3, parent=s2) # IN TOOLBOX

HTTP tool call:

tools/call get-weather (CLIENT, trace=t1, span=s1) # FROM MCP Client
|
--- toolbox/server/mcp/http (SERVER, trace=t1, span=s2, parent=s1) # IN TOOLBOX
     |
     --- tools/call get-weather (SERVER, trace=t1, span=s3, parent=s2) # IN TOOLBOX

If this looks good to you, feel free to update this PR, thank you so much once again!
And yes, once this is done, we can then align the metrics with the conventions :)
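
As a rough illustration of the naming scheme discussed above, the sketch below builds a span name in the {mcp.method.name} {target} form and collects attributes for option (a), where the toolset goes into a toolset.name attribute. The helper names spanName and spanAttributes are hypothetical, and the transport values "pipe" and "tcp" plus the protocol version string are example values assumed here, not confirmed by the thread.

```go
package main

import "fmt"

// spanName builds "{mcp.method.name} {target}". The target is omitted when
// empty, so tools/list on the default toolset is just "tools/list".
func spanName(method, target string) string {
	if target == "" {
		return method
	}
	return method + " " + target
}

// spanAttributes collects the attributes named in the thread. The toolset
// attribute follows option (a) and is skipped for the default ("") toolset.
func spanAttributes(method, toolset, transport, protocolVersion string) map[string]string {
	attrs := map[string]string{
		"mcp.method.name":      method,
		"network.transport":    transport,
		"mcp.protocol.version": protocolVersion,
	}
	if toolset != "" {
		attrs["toolset.name"] = toolset
	}
	return attrs
}

func main() {
	// Example values only: "pipe" for stdio is an assumption of this sketch.
	fmt.Println(spanName("tools/call", "get-weather"))
	fmt.Println(spanAttributes("tools/call", "toolset_a", "pipe", "2025-06-18"))
	fmt.Println(spanName("tools/list", ""))
}
```

Keeping the toolset out of the span name keeps the set of distinct span names small, which is one reason option (a) may be easier to aggregate on than option (b).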

AjmeraParth132 · 2026-01-23T10:54:01Z

That's great @Yuan325!

1,2,4 - perfectly aligned
3 - yes, will check with few mcp clients as well if they are consistently passing parent trace information in the meta object and will add them in our server telemetry
5 - correct. I think it is better to put that in attribute(a) as there's no written convention around toolsets, but either way is fine.

Yes, the flows are exactly what they should be! Will start working on the PR 🚀
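
For point 3, a sketch of what reading the parent trace information from the meta object might look like, using only the standard library. It assumes the client places a W3C traceparent value under params._meta.traceparent; that key name is an assumption of this sketch, and parentContext, parseTraceparent and parentFromMeta are hypothetical names. Only version 00 of the traceparent format is handled.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// parentContext is a hypothetical holder for the caller's trace identity.
type parentContext struct {
	TraceID string
	SpanID  string
	Sampled bool
}

// parseTraceparent validates a version-00 W3C traceparent value:
// "00-<32 hex trace id>-<16 hex parent span id>-<2 hex flags>".
func parseTraceparent(v string) (parentContext, error) {
	parts := strings.Split(v, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return parentContext{}, errors.New("traceparent: unsupported format")
	}
	traceID, spanID, flags := parts[1], parts[2], parts[3]
	if len(traceID) != 32 || len(spanID) != 16 || len(flags) != 2 {
		return parentContext{}, errors.New("traceparent: wrong field length")
	}
	if v != strings.ToLower(v) {
		return parentContext{}, errors.New("traceparent: hex must be lowercase")
	}
	for _, field := range []string{traceID, spanID, flags} {
		if _, err := hex.DecodeString(field); err != nil {
			return parentContext{}, fmt.Errorf("traceparent: %w", err)
		}
	}
	// All-zero IDs are invalid in the W3C Trace Context format.
	if strings.Trim(traceID, "0") == "" || strings.Trim(spanID, "0") == "" {
		return parentContext{}, errors.New("traceparent: all-zero id")
	}
	flagBits, _ := hex.DecodeString(flags) // validated in the loop above
	sampled := flagBits[0]&0x01 == 0x01
	return parentContext{TraceID: traceID, SpanID: spanID, Sampled: sampled}, nil
}

// parentFromMeta looks for params._meta.traceparent in a JSON-RPC body.
// It reports false when the field is absent or malformed.
func parentFromMeta(body []byte) (parentContext, bool) {
	var msg struct {
		Params struct {
			Meta struct {
				Traceparent string `json:"traceparent"`
			} `json:"_meta"`
		} `json:"params"`
	}
	if err := json.Unmarshal(body, &msg); err != nil || msg.Params.Meta.Traceparent == "" {
		return parentContext{}, false
	}
	pc, err := parseTraceparent(msg.Params.Meta.Traceparent)
	return pc, err == nil
}

func main() {
	body := []byte(`{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"get-weather","_meta":{"traceparent":"00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}}}`)
	pc, ok := parentFromMeta(body)
	fmt.Println(pc.TraceID, pc.SpanID, pc.Sampled, ok)
}
```

A missing or malformed value is treated as "no parent", so a client that does not propagate context simply produces a root span on the server side. In a server that already depends on the OpenTelemetry SDK, its propagation package would likely replace the hand-written parser.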

Yuan325 · 2026-01-23T19:06:42Z

@AjmeraParth132 awesome, thank you! Let's add the toolset name as an attribute then 👍

I'll make sure to prioritize for review once this is updated! Thank you so much again :)

AjmeraParth132 · 2026-02-05T20:41:52Z

Hey @Yuan325, updated the PR with the required changes, lmk your thoughts.

Yuan325 · 2026-02-07T03:44:07Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces detailed telemetry for MCP sessions over HTTP and STDIO transports, aiming to improve observability by propagating W3C trace context and enriching spans. However, several security vulnerabilities have been identified, including a potential Denial of Service due to unrestricted request body reading, an SSE injection vulnerability in the SSE handler, and telemetry injection via untrusted input in span names. Additionally, there are performance concerns and code duplication due to the request body being unmarshalled multiple times across different functions. Addressing these security and performance issues will enhance the robustness and efficiency of the telemetry implementation.
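
For the first finding (unrestricted request body reading), one common mitigation in Go is to cap the body with http.MaxBytesReader before reading it. The sketch below is an illustration only: the helper names, the 1 MiB limit and the /mcp path are assumptions, and the thread does not show how the PR actually addressed the finding.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxBodyBytes is an assumed limit for this sketch, not a value from the PR.
const maxBodyBytes int64 = 1 << 20

// readLimitedBody reads the request body but fails once more than limit
// bytes have been read, instead of buffering an arbitrarily large payload.
func readLimitedBody(w http.ResponseWriter, r *http.Request, limit int64) ([]byte, error) {
	r.Body = http.MaxBytesReader(w, r.Body, limit)
	body, err := io.ReadAll(r.Body)
	if err != nil {
		return nil, fmt.Errorf("read request body: %w", err)
	}
	return body, nil
}

// tryRead exercises readLimitedBody with a synthetic request of the given size.
func tryRead(size int, limit int64) error {
	payload := strings.NewReader(strings.Repeat("x", size))
	req := httptest.NewRequest(http.MethodPost, "/mcp", payload)
	_, err := readLimitedBody(httptest.NewRecorder(), req, limit)
	return err
}

func main() {
	fmt.Println("small body accepted:", tryRead(64, maxBodyBytes) == nil)
	fmt.Println("oversized body rejected:", tryRead(2048, 1024) != nil)
}
```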

internal/server/mcp.go

Yuan325

Thank you for updating this!! Left some feedback for code updates :)

internal/server/mcp.go

AjmeraParth132 · 2026-02-09T06:18:08Z

Thanks @Yuan325 for the detailed feedback, rightly agreed. Incorporated the above requested changes to the PR.

Yuan325

yay, last 2 comments! Overall LGTM! I'll merge this after the 2 comments are resolved. Thank you once again!! :)

Yuan325 · 2026-02-11T00:48:00Z

internal/server/mcp.go

Suggested change

if err = s.write(msgCtx, res); err != nil {

Yuan325 · 2026-02-11T00:58:45Z

internal/server/mcp.go

 	// Check if message is a notification
 	if baseMessage.Id == nil {
 		err := mcp.NotificationHandler(ctx, body)
 		return "", nil, err
 	}


Might want to move this to after setting the span attributes, since the span is created before notification handling? I'd also appreciate it if you could set the span status to error when there's an error.

 	if err != nil {
 		span.SetStatus(codes.Error, err.Error())
 	}
 	return "", nil, err

Yuan325 · 2026-02-11T00:59:59Z

/gcbrun

AjmeraParth132 · 2026-02-12T06:18:05Z

Perfect! Pushed the changes.

AjmeraParth132 requested a review from a team as a code owner November 19, 2025 06:22

blunderbuss-gcf bot assigned duwenxin99 Nov 19, 2025

gemini-code-assist bot reviewed Nov 19, 2025

internal/server/mcp.go (outdated)

duwenxin99 assigned Yuan325 and unassigned duwenxin99 Dec 17, 2025

gemini-code-assist bot reviewed Dec 17, 2025

internal/server/mcp.go (outdated)

Yuan325 requested changes Dec 20, 2025

internal/server/api.go (outdated)

internal/server/mcp.go (outdated)

AjmeraParth132 force-pushed the feat-telemetry-updates branch from 064b565 to f8f8004 Compare January 8, 2026 17:02

Yuan325 added the priority: p2 and priority: p1 labels Feb 3, 2026

Yuan325 removed the priority: p2 label Feb 3, 2026

AjmeraParth132 added 2 commits February 5, 2026 09:56

feat: renaming spans and attributes

c811e69

feat: parent tracing

686b315

AjmeraParth132 force-pushed the feat-telemetry-updates branch from f8f8004 to 686b315 Compare February 5, 2026 20:38

gemini-code-assist bot reviewed Feb 7, 2026

internal/server/mcp.go

internal/server/mcp.go (outdated)

internal/server/mcp.go (outdated)

internal/server/mcp.go

Yuan325 requested changes Feb 7, 2026

AjmeraParth132 added 5 commits February 8, 2026 22:01

feat: telemetry semantics update

6ba87c2

feat: renaming spans and attributes

8f85cc3

feat: parent tracing

d26c304

fix: merge conflicts

a6af0d8

fix: merge conflicts

610565f

Yuan325 approved these changes Feb 11, 2026

feat: telemetry updates

e201d6f
