feat: update/add detailed telemetry for stdio and http mcp transports#1987

Open
AjmeraParth132 wants to merge 8 commits into googleapis:main from AjmeraParth132:feat-telemetry-updates

Conversation

@AjmeraParth132

Description

This PR adds consistent and actionable telemetry for MCP sessions across HTTP and STDIO transports, enabling quick visibility into toolset discovery and tool invocation activity with minimal setup.

PR Checklist

  • Make sure you reviewed CONTRIBUTING.md
  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)
  • Make sure to add ! if this involves a breaking change

@AjmeraParth132 AjmeraParth132 requested a review from a team as a code owner November 19, 2025 06:22
@gemini-code-assist
Contributor

Summary of Changes

Hello @AjmeraParth132, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the observability of the Model Context Protocol (MCP) server by implementing comprehensive and detailed telemetry. The changes introduce dynamic trace naming, allowing for more specific tracking of operations like toolset discovery and tool invocation. Furthermore, telemetry spans now capture the transport mechanism used and the name of the invoked tool, alongside improved error reporting. This enhancement aims to provide developers and operators with actionable insights and quicker visibility into the system's behavior and potential issues.

Highlights

  • Enhanced Telemetry: Detailed telemetry has been added for MCP sessions, providing better visibility into toolset discovery and tool invocation activities.
  • Dynamic Trace Naming: Telemetry spans are now dynamically named based on the specific JSON-RPC method being processed, offering more granular insights.
  • Transport and Tool Identification: New attributes for 'transport' (e.g., 'stdio', 'mcp-http', 'native') and 'tool_name' are included in telemetry spans to identify the request origin and the specific tool invoked (a brief sketch follows this list).
  • Improved Error Reporting: Errors encountered during MCP message processing are now explicitly set on the telemetry spans, aiding in quicker debugging and issue identification.
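
For orientation, here is a minimal Go sketch of what attaching such attributes could look like with the OpenTelemetry API. The attribute keys mirror the names in the highlights above; the package and helper names are illustrative assumptions, not the PR's actual code.

package server // hypothetical package name

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// attachMCPAttributes pulls the active span out of ctx and records the
// transport and the invoked tool's name as span attributes.
func attachMCPAttributes(ctx context.Context, transport, toolName string) {
	span := trace.SpanFromContext(ctx)
	span.SetAttributes(
		attribute.String("transport", transport), // e.g. "stdio", "mcp-http", "native"
		attribute.String("tool_name", toolName),  // empty for non-invocation methods
	)
}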

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds detailed telemetry for MCP sessions over HTTP and stdio transports. The changes are well-implemented, propagating toolName information through the call stack to enrich trace spans. This will provide better visibility into tool discovery and invocation. The changes are consistent across different MCP protocol versions. I have one suggestion to refactor duplicated code to improve maintainability.

@duwenxin99 duwenxin99 assigned Yuan325 and unassigned duwenxin99 Dec 17, 2025
@Yuan325
Contributor

Yuan325 commented Dec 17, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request enhances telemetry for MCP sessions across both HTTP and STDIO transports. It introduces a transport attribute to OpenTelemetry spans and provides more specific trace names for tools/list and tools/call methods. Additionally, for tool invocations, the tool_name is now included as a span attribute, which is a great improvement for observability. The implementation involves refactoring several functions to propagate the toolName up the call stack, and these changes have been applied consistently across different MCP protocol versions.

My main feedback is regarding a piece of duplicated logic for determining the trace name, which I've commented on. Addressing this would improve the code's maintainability.

@Yuan325
Contributor

Yuan325 commented Dec 18, 2025

Hi @AjmeraParth132, thank you for your contribution! I'll make sure to review this tomorrow. :)

In the meantime, please reopen this issue #1633. There are also some outstanding comments regarding the naming of the traces. Ideally we would want to come to an agreement on the names before making any official changes, since updating trace names is considered a breaking change~

@AjmeraParth132
Author

Hi @Yuan325, thanks for the update. Sure, we'll discuss the latest scope of work internally and reopen the updated issue shortly. Will inform you here once done.

Contributor

@Yuan325 Yuan325 left a comment

Please also rebase this PR :) Thank you again for your contribution!

@AjmeraParth132
Author

AjmeraParth132 commented Dec 22, 2025

@Yuan325 thanks for your review suggestions.

  • On the transport/endpoint convention, we started with connection-type, but Kurtis suggested transport in the review doc. Happy to rename it to endpoint if that feels more intuitive.
  • Setting the traceName, toolName and toolsetName attributes in processMcpMessage() seems like a clean way to do it - just to confirm, we might need to pass the span context for trace telemetry further down through the functions' arguments; I hope that's fine?
  • Meanwhile, I have opened a new issue - #2222 - to track the latest set of action items as discussed.

Will rebase the PR and update here :)

@Yuan325
Contributor

Yuan325 commented Dec 27, 2025

Hi @AjmeraParth132

Some questions (please advise on best practices within the field, if there are any):

  1. On second thought -- do we need the transport/endpoint attribute, since it can be determined from the trace name itself?
  2. Could you clarify what you mean by span context? We can definitely add ctx context.Context to the functions if that's what you meant.

Please also add the full visualization of the traces/attributes/metrics you will be adding to that new issue -- if you're planning to continue working from there (similar to #1633 (comment)). It will be easier for future contributors to reference, and it's also a way to keep track of the changes that will be applied :) Thank you very much!! I'll try to keep a closer eye on your issue/PR so that we can progress faster. Apologies for the slow response.

@AjmeraParth132
Author

Hi @Yuan325,

Apologies for the delayed response; we've been AFK this week due to some family commitments and New Year festivities. We'll be back after this week and will pick this up as the highest priority to help move things faster. Responding to your points:

  1. The plan is to unify trace names across connection types and represent transports as attributes. Could you please share your email ID? I’ll send over the document we previously shared with Kurtis that outlines the proposed changes.
  2. Yes!

We’ll also add a complete visualization of the new traces, attributes, and metrics here shortly, so we can align on the overall approach before proceeding further with development. Thanks.
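
As a rough illustration of the ctx-based propagation agreed on above (the handler name and the call into a processMcpMessage-style function are assumptions for the sketch, not the repository's actual signatures): the handler starts the span, and the derived context carries it to whatever runs further down the call stack.

package server // hypothetical package name

import (
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/trace"
)

// mcpHandler starts the span and passes the derived ctx downstream; any
// function that receives ctx can recover the same span and enrich it.
func mcpHandler(w http.ResponseWriter, r *http.Request) {
	ctx, span := otel.Tracer("toolbox/server").Start(r.Context(), "toolbox/server/mcp")
	defer span.End()

	// Any function called with this ctx sees the same span:
	_ = trace.SpanFromContext(ctx) // == span; processMcpMessage(ctx, ...) would enrich it here
}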

@Yuan325
Contributor

Yuan325 commented Dec 30, 2025

@AjmeraParth132 Sounds good, sent an email to you :)

@Yuan325
Contributor

Yuan325 commented Jan 6, 2026

@AjmeraParth132
The design LGTM, just one small comment: for the transport attribute, can we separate it into mcp-shttp (streamable HTTP), mcp-stdio, and mcp-httpsse (for SSE)?

Please lmk once this is rebased and updated (including moving the span to processMCPMessage()). Thank you again!! :)

@AjmeraParth132
Author

@Yuan325 perfect, updating accordingly. Though, one more clarification: moving/initializing the trace in processMCPMessage() or processMethod() will narrow the span scope - i.e., the span duration and error capture will now be limited to the new scope. What we can do here is:

  1. either start with a temporary trace name and update it later, down in processMethod() or processMCPMessage()
  2. or decode the request body at the start of httpHandler itself to find out the calling method
  3. or work with the limited span scope, starting from the lower-level functions

Which one do you prefer?

@Yuan325
Contributor

Yuan325 commented Jan 7, 2026

@AjmeraParth132 What do you think about this (kind of a combination of 1 and 3):

  1. Keep the existing toolbox/server/mcp span that starts in httpHandler(). There's also a span -- toolbox/server/mcp/sse -- in sseHandler(). If we want to make it consistent, we can add a new span, toolbox/server/mcp/stdio, for stdio.
  2. Add the general toolbox/server/toolset/get and toolbox/server/tool/invoke spans in processMCPMessage(). Since we will have the method decoded there (with error handling when decoding is not successful), we avoid decoding it twice. And if there are any changes to decoding in the future, we won't have to update it in all the different places (a rough sketch follows below).
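
A hedged sketch of what item 2 could look like once the method has been decoded in processMcpMessage (the helper name and the fallback naming are assumptions, not code from this PR):

package server // hypothetical package name

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/trace"
)

// startMethodSpan opens the more specific child span after the JSON-RPC
// method has been decoded once, avoiding a second decode in the handlers.
func startMethodSpan(ctx context.Context, method string) (context.Context, trace.Span) {
	name := "toolbox/server/mcp" // assumed fallback for other methods
	switch method {
	case "tools/list":
		name = "toolbox/server/toolset/get"
	case "tools/call":
		name = "toolbox/server/tool/invoke"
	}
	return otel.Tracer("toolbox/server").Start(ctx, name)
}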

@AjmeraParth132 AjmeraParth132 force-pushed the feat-telemetry-updates branch from 064b565 to f8f8004 on January 8, 2026 17:02
@AjmeraParth132
Author

@Yuan325 That makes sense. But rather than spamming two traces for the same case, what I feel is a better approach:

Start with toolbox/server/mcp/sse, toolbox/server/mcp/stdio and toolbox/server/mcp/shttp respectively, and rename the trace later in processMCPMessage(). That way, we get the full span duration and error attribute reporting on the main trace itself, and if everything runs right, it will be renamed to the unified naming anyway.

You can check the PR - it's updated and rebased as well. Lmk if you'd prefer another approach though.
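
For reference, the rename approach described above would amount to something like this sketch (assuming the handler's span is reachable through ctx; the helper name is hypothetical, and whether renaming is acceptable is exactly what's debated below):

package server // hypothetical package name

import (
	"context"

	"go.opentelemetry.io/otel/trace"
)

// renameSpanForMethod renames the handler-started toolbox/server/mcp/<transport>
// span to the unified name once the method is known, keeping the full span
// duration and error reporting on a single span.
func renameSpanForMethod(ctx context.Context, unifiedName string) {
	span := trace.SpanFromContext(ctx)
	span.SetName(unifiedName) // e.g. "tools/call"
}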

@Yuan325
Contributor

Yuan325 commented Jan 8, 2026

@AjmeraParth132 Isn't it generally bad practice to rename telemetry traces since it will generate inconsistent traces, and potentially make it harder to track?

@AjmeraParth132
Author

Agreed @Yuan325, but the downsides felt more concerning!

For starting tool/invoke and toolset/get in processMCPMessage() -

  1. We'll lose the errors captured in the handlers for MCP calls in the unified traces.
  2. For someone observing the telemetry, the native spans will have their timespan from handler start, while the MCP ones will have theirs from processMCPMessage(), making latency an unreliable attribute.
  3. And obviously, it means unnecessarily spamming the collector with two traces per call.


I feel that letting the handler itself decode the method could be a solution, OR the renaming approach itself, since the likelihood of inconsistent traces is almost nil (the traces will always be renamed unless we catch an error during handler parsing prior to the processMCPMessage() call, which is very unlikely, and even that will be captured under the /mcp/<transport> trace anyway).

Would love to hear your thoughts!

@Yuan325
Contributor

Yuan325 commented Jan 13, 2026

@AjmeraParth132 Got it, thanks for the insight. Let me discuss this with the team before finalizing. Will keep this as a priority on my end and circle back to you asap :)

@Yuan325
Contributor

Yuan325 commented Jan 13, 2026

@AjmeraParth132 I discovered that OpenTelemetry actually has established semantic conventions for MCP (here). I wonder if you/your team have had a chance to review those yet?

I also wanted to flag that we are planning to deprecate the native endpoint transport in the future. Given that, it may not be worth the effort to consolidate spans from both endpoints right now.

If you’d prefer to sync up to resolve this quickly, feel free to send over a meeting invite.

@AjmeraParth132
Author

Hey @Yuan325! Sorry for the delay. Saw this today - the notification just didn't show up in my email.

Yes, we have been following the OTel convention developments closely, and I think that's a great way to instrument end-to-end telemetry use cases across multiple services. Given that, while the full benefit of reorganizing our telemetry will only show once we instrument the complete GenAI stack, aligning with the standard conventions is still the best bet.

Also, deprecating the native endpoint makes this easier. We can now start the MCP spans directly in processMCPMessage, where we have the message decoded anyway. Note that this implies missing error capture in the handlers, which I guess is fine (or we can keep the existing custom toolbox/server/mcp span as well for that purpose).

On telemetry, let us reorganize the tools/list and tools/call spans and their attributes as per the convention, and continue with the custom toolset/get as mcp.method.name, given there's none defined for that purpose in the [server convention] method name list.

Do let me know if you have any thoughts or need any changes here. I'll update the PR with this plan if it looks good. And once that's done, we can align the metrics with the conventions as well?

@Yuan325
Contributor

Yuan325 commented Jan 21, 2026

@AjmeraParth132 Awesome, I do agree that deprecating the native endpoint will make the traces much simpler (that way we don't have to keep consistency between the 2 endpoints).

  1. Keep the existing toolbox/server/mcp/sse span for the endpoint that is used to establish the SSE connection.
  2. Rename the span toolbox/server/mcp to toolbox/server/mcp/http and add a new span toolbox/server/mcp/stdio for stdio. This way we can track any errors happening during the handlers. WDYT about this?
  3. Implement the sem conv for spans within processMcpMessage(). Seems like we'll also have to retrieve properties inside the params._meta property bag for the parent trace's information.
  4. For the telemetry name, agreed to use the names per convention -- {mcp.method.name} {target} (e.g. tools/call get-weather). Let's also try to add the recommended attributes -- network.transport, network.protocol.*, mcp.protocol.version, ... especially since they will be used to differentiate the transport protocol etc.
  5. We currently don't have a toolset/get method within our MCP endpoint; it uses tools/list under the hood, so toolset/get will not be necessary. WDYT about keeping tools/list and adding our custom attribute toolset.name? (Or do you think it makes more sense to add the toolset name as part of the {target}?) Example:
    a. tools/list (attribute toolset.name=toolset_a) and tools/call get-weather (attribute toolset.name=toolset_a)
    or
    b. tools/list toolset_a and tools/call toolset_a/get-weather for invoking a specific toolset; tools/list and tools/call get-weather for default toolset (default toolset name is "")

Example spans:
STDIO initialize:

initialize (CLIENT, trace=t1, span=s1) # FROM MCP Client
|
--- toolbox/server/mcp/stdio (SERVER, trace=t1, span=s2, parent=s1) # IN TOOLBOX
     |
     --- initialize (SERVER, trace=t1, span=s3, parent=s2) # IN TOOLBOX

STDIO tool call:

tools/call get-weather (CLIENT, trace=t1, span=s1) # FROM MCP Client
|
--- toolbox/server/mcp/stdio (SERVER, trace=t1, span=s2, parent=s1) # IN TOOLBOX
     |
     --- tools/call get-weather (SERVER, trace=t1, span=s3, parent=s2) # IN TOOLBOX

SSE connection:

connection (CLIENT, trace=t1, span=s2, parent=s1) # FROM MCP Client
|
--- toolbox/server/mcp/sse (SERVER, trace=t1, span=s3, parent=s2) # IN TOOLBOX

HTTP initialize:

initialize (CLIENT, trace=t1, span=s1) # FROM MCP Client
|
--- toolbox/server/mcp/http (SERVER, trace=t1, span=s2, parent=s1) # IN TOOLBOX
     |
     --- initialize (SERVER, trace=t1, span=s3, parent=s2) # IN TOOLBOX

HTTP tool call:

tools/call get-weather (CLIENT, trace=t1, span=s1) # FROM MCP Client
|
--- toolbox/server/mcp/http (SERVER, trace=t1, span=s2, parent=s1) # IN TOOLBOX
     |
     --- tools/call get-weather (SERVER, trace=t1, span=s3, parent=s2) # IN TOOLBOX

If this looks good to you, feel free to update this PR, thank you so much once again!
And yes, once this is done, we can then align the metrics with the conventions :)
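
To make point 4 concrete, here is a hedged sketch of semconv-style span creation inside processMcpMessage. The attribute values (transport string, protocol version) are placeholder assumptions, and the helper name is not from the PR.

package server // hypothetical package name

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// startSemconvSpan names the span "{mcp.method.name} {target}", e.g.
// "tools/call get-weather", and attaches the recommended attributes plus the
// custom toolset.name attribute discussed in point 5.
func startSemconvSpan(ctx context.Context, method, target, toolsetName string) (context.Context, trace.Span) {
	name := method
	if target != "" {
		name += " " + target
	}
	return otel.Tracer("toolbox/server").Start(ctx, name,
		trace.WithAttributes(
			attribute.String("mcp.method.name", method),
			attribute.String("network.transport", "pipe"),          // placeholder: e.g. "pipe" for stdio, "tcp" for HTTP
			attribute.String("mcp.protocol.version", "2025-03-26"), // placeholder example value
			attribute.String("toolset.name", toolsetName),          // custom attribute per option (a)
		),
	)
}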

@AjmeraParth132
Author

That's great @Yuan325!

1,2,4 - perfectly aligned
3 - yes, will also check with a few MCP clients whether they consistently pass parent trace information in the _meta object, and will add that to our server telemetry (see the sketch below)
5 - correct. I think it is better to put that in an attribute (option a), as there's no written convention around toolsets, but either way is fine.

Yes, the flows are exactly what they should be! Will start working on the PR 🚀
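
For item 3, a rough sketch of pulling W3C trace context out of params._meta so the server span can join the client's trace. The traceparent/tracestate keys are an assumption about what MCP clients send, and the helper is illustrative only; it also requires a TextMapPropagator (e.g. propagation.TraceContext) to be registered globally.

package server // hypothetical package name

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

// extractParentFromMeta copies W3C trace headers from the params._meta property
// bag into a carrier and asks the configured propagator to extract the parent
// context for the server-side span.
func extractParentFromMeta(ctx context.Context, meta map[string]any) context.Context {
	carrier := propagation.MapCarrier{}
	for _, key := range []string{"traceparent", "tracestate"} {
		if v, ok := meta[key].(string); ok {
			carrier.Set(key, v)
		}
	}
	return otel.GetTextMapPropagator().Extract(ctx, carrier)
}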

@Yuan325
Contributor

Yuan325 commented Jan 23, 2026

@AjmeraParth132 awesome, thank you! Will take a look once it's updated. Let's add the toolset name as an attribute then 👍

I'll make sure to prioritize for review once this is updated! Thank you so much again :)

@Yuan325 Yuan325 added the priority: p2 (Moderately-important priority. Fix may not be included in next release.) and priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.) labels on Feb 3, 2026
@Yuan325 Yuan325 removed the priority: p2 (Moderately-important priority. Fix may not be included in next release.) label on Feb 3, 2026
@AjmeraParth132
Author

Hey @Yuan325, updated the PR with the required changes, lmk your thoughts.

@Yuan325
Contributor

Yuan325 commented Feb 7, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces detailed telemetry for MCP sessions over HTTP and STDIO transports, aiming to improve observability by propagating W3C trace context and enriching spans. However, several security vulnerabilities have been identified, including a potential Denial of Service due to unrestricted request body reading, an SSE injection vulnerability in the SSE handler, and telemetry injection via untrusted input in span names. Additionally, there are performance concerns and code duplication due to the request body being unmarshalled multiple times across different functions. Addressing these security and performance issues will enhance the robustness and efficiency of the telemetry implementation.

Contributor

@Yuan325 Yuan325 left a comment

Thank you for updating this!! Left some feedback for code updates :)

@AjmeraParth132
Author

Thanks @Yuan325 for the detailed feedback - agreed on all points. Incorporated the requested changes into the PR.

Contributor

@Yuan325 Yuan325 left a comment

yay, last 2 comments! Overall LGTM! I'll merge this after the 2 comments are resolved. Thank you once again!! :)

Contributor

Suggested change
if err = s.write(msgCtx, res); err != nil {

Comment on lines 581 to 585
// Check if message is a notification
if baseMessage.Id == nil {
err := mcp.NotificationHandler(ctx, body)
return "", nil, err
}
Contributor

Might want to move this to after setting the span attributes, since the span is added before notification handling? Also, would appreciate it if you could setStatus as error if there's an error:

if err != nil {
	span.SetStatus(codes.Error, err.Error())
}
return "", nil, err
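
Putting the two points together, the reordered fragment might read roughly as follows (a sketch meant to sit inside the existing function after the span attributes have been set; the surrounding code is assumed, not quoted from the PR):

// Span attributes (method, tool name, etc.) are set above this point.
// Check if message is a notification; record any handler error on the span.
if baseMessage.Id == nil {
	err := mcp.NotificationHandler(ctx, body)
	if err != nil {
		span.SetStatus(codes.Error, err.Error())
	}
	return "", nil, err
}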

@Yuan325
Contributor

Yuan325 commented Feb 11, 2026

/gcbrun

@AjmeraParth132
Author

Perfect! Pushed the changes.
