Opentelemetry baggage propagation fix #1174
Change the activity interceptor to use context.attach()/detach() pattern instead of passing context as a parameter to start_as_current_span(). The fix follows the standard OpenTelemetry pattern used by other instrumentations (django, gRPC, etc.) and ensures proper context management with try/finally for detach.
Add additional tests to verify baggage propagation in these scenarios:
- multiple values
- local activity
- retries in activity
Roman Konoval seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
temporalio/contrib/opentelemetry.py
Outdated
extracted_ctx = self.root._context_from_headers(input.headers)

if extracted_ctx:
    token = opentelemetry.context.attach(extracted_ctx)
Should we just move this attach/detach into _start_as_current_span to happen always? If we don't want it always, we can add an option on that call. But I think we still want to pass context to OTel's start_as_current_span call, right?
After the attach, the current context already has all the information from the headers (including tracing information), so there is no need to pass it as context to start_as_current_span.
But I think it makes sense to modify _start_as_current_span to do the attach if a context was provided (that would be the case for inbound scenarios). I'll do that.
I've checked that the official instrumentations do pass context to start_as_current_span, so I implemented both of your suggestions.
5905dcf to 3fd9236
Two important edge case tests:
- exception handling
- when no current context is available
3fd9236 to 411c6e3
Status(
    status_code=StatusCode.ERROR,
    description=f"{type(exc).__name__}: {exc}",
if context:
I think I would prefer a ternary here, but that's a nit.
    raise
finally:
    if token:
        opentelemetry.context.detach(token)
@VegetarianOrc we should evaluate how this intersects with your changes to protect detach elsewhere.
)


@activity.defn
We usually put the activity/workflow definitions with the test they support if they are only used for one. In this case I would just move them below the one benign exception test.
Actually, it may be better to break them up so that each one sits with its respective test.
), "tenant.id baggage should propagate to activity"


async def test_opentelemetry_baggage_propagation_multiple_values(
Does this really provide additional coverage compared to _basic? That already reads two baggages.
No, I'll remove it.
async def test_opentelemetry_baggage_propagation_local_activity(
    client: Client, env: WorkflowEnvironment
):
    exporter = InMemorySpanExporter()
I don't think the exporter is needed for many (any?) of these tests, since you aren't actually checking anything on outgoing spans. I could be wrong though; does the baggage not propagate otherwise?
Indeed, the exporter is not needed; for baggage to work, only a tracer with a TracerProvider is required.
assert all(v == "test-user-retry" for v in retry_attempt_baggage_values)


async def test_opentelemetry_baggage_exception_handling(
I'm not sure I understand what this test is trying to validate. It seems to be nothing more than the basic test, but with exceptions to assert truth instead of returning the value. What benefit does it give?
/sdk-python.iml
/.zed
*.DS_Store
tags
Can you elaborate on or remove this?
It is a ctags file used for source code navigation e.g. in vim.
Two other things you'll have to do. First, poe lint is failing. Second, your commits have multiple email addresses associated with them.
You'll need to add the second email to your GitHub account so that the CLA is signed for all the commits. Either that, or modify the commits to use the correct email in some way.
What was changed
Fixed OpenTelemetry baggage propagation in the inbound interceptor by explicitly attaching the extracted context before starting spans. Changed from passing context=extracted_ctx as a parameter to using context.attach(extracted_ctx) + context.detach(token).
Why?
The previous implementation used start_as_current_span(context=extracted_ctx), which only uses the provided context to determine the parent span for trace propagation. When building the new span context, OpenTelemetry always uses context.get_current(), that is, the active context from the stack, not the context= parameter. But the active context is not set from the unpacked values received in headers.
This meant that while trace parent-child relationships worked correctly, baggage values from the extracted context were not copied into the new span context, making them unavailable within Temporal activities/workflows.
By calling context.attach(extracted_ctx) first, we make the extracted context active on the context stack. This ensures that when the new span context is created, it copies all data (including baggage) from the extracted context, properly propagating baggage across service boundaries.
This aligns with the standard pattern used by other OpenTelemetry instrumentations (django, gRPC, etc.).
Checklist
Closes [Feature Request] Make sure OTel baggage propagates properly throughout activities #362
How was this tested:
test_opentelemetry_baggage_propagation_basic test does this, namely:
Any docs updates needed?
I don't think so.