feat: Add thread context propagation for span-profile correlation in thread pools#45
feat: Add thread context propagation for span-profile correlation in thread pools#45dordor12 wants to merge 2 commits intografana:mainfrom
Conversation
|
Tested this change using trace-span-profiles-example - a complete setup designed to verify span-profile correlation across thread pools. Test ScenarioThe demo app includes a The Problem SetupSpans are created on worker threads (not the request thread) using an private final ExecutorService executor = Executors.newFixedThreadPool(8);
@GetMapping("/worker-spans")
public SpanTestResult testWorkerSpans(...) {
for (int i = 0; i < count; i++) {
final Context parentContext = Context.current(); // Capture parent context
Future<SpanInfo> future = executor.submit(() -> {
// Propagate context to worker thread
try (Scope scope = parentContext.makeCurrent()) {
return executeSpanOnWorkerThread(spanIndex, durationMs);
}
});
}
}
Each worker thread creates a child span and does CPU-intensive work to generate profiler samples:
private SpanInfo executeSpanOnWorkerThread(int spanIndex, int durationMs) {
Span span = tracer.spanBuilder("worker-span-" + spanIndex).startSpan();
try (Scope scope = span.makeCurrent()) {
// CPU work generates profiler samples that should be attributed to this span
while (System.nanoTime() - startTime < targetNanos) {
sum = cpuIntensiveWork(sum);
}
} finally {
span.end();
}
}
Why This Tests the Fix
Before this PR: When parentContext.makeCurrent() runs on the worker thread, OTel's ThreadLocal context is set, but async-profiler's native pthread TLS is NOT synchronized. Profiling samples are incorrectly attributed.
After this PR: The ProfilingContextStorage wrapper intercepts makeCurrent() and synchronizes the native TLS:
// ProfilingContextStorage.java
public Scope attach(Context toAttach) {
Scope scope = delegate.attach(toAttach);
// Sync native TLS with OTel context
setTracingContextForSpan(spanId, spanName);
return () -> {
scope.close();
// Restore previous context in native TLS
restoreTracingContext();
};
}Results
Test endpoint: GET /api/span-test/worker-spans?count=5&durationMs=5
|
| Context currentContext = storage.current(); | ||
| long spanId = extractSpanId(currentContext); | ||
| if (asprof != null) { | ||
| asprof.setTracingContext(spanId, 0); |
There was a problem hiding this comment.
Good catch - this was a bug. Fixed it to extract the parent span's spanNameId instead of hardcoding 0
- Add ProfilingContextStorage wrapper to synchronize OTel context with async-profiler native pthread TLS on every context switch - Fix onEnd() to restore parent span context instead of clearing to (0,0) - Add otel.pyroscope.context.propagation.enabled config flag (default: true) - Add PyroscopeBootstrapConfig for bootstrap classloader configuration Fixes grafana#44
69ba017 to
9d5b505
Compare
|
Tested end-to-end with async-profiler#11 and jfr-parser#82. Wall profiles now correctly show |
|
Hey, I am not familiar with otel / |
| * | ||
| * This ensures profiling samples are correctly associated with spans, | ||
| * even when work is executed on different threads via executors. | ||
| */ |
There was a problem hiding this comment.
lets mark it "experimental". Make sure this API is experimental and we have not commited to maintaining it and it may change or be removed in the future
There was a problem hiding this comment.
Done. Added EXPERIMENTAL warning to javadoc.
|
|
||
| @Override | ||
| public Scope attach(Context toAttach) { | ||
| // 1. Extract span ID and name from the context being attached |
There was a problem hiding this comment.
remove all these useless llm comments
// foo
foo()
| /** | ||
| * Parse 16-character hex span ID to long. | ||
| */ | ||
| private static long parseHexSpanId(String hexSpanId) { |
There was a problem hiding this comment.
is it any different from the function we have in span processor
There was a problem hiding this comment.
You're right, it's identical. Changed to use PyroscopeOtelSpanProcessor.parseSpanId() instead.
| import io.pyroscope.vendor.one.profiler.AsyncProfiler; | ||
|
|
||
| /** | ||
| * ContextStorage wrapper that synchronizes OTel context with async-profiler's |
There was a problem hiding this comment.
is this an alternative to span processor or should they be used together?
There was a problem hiding this comment.
Used together. SpanProcessor handles span start/end on the originating thread. This ContextStorage wrapper handles context propagation to other threads (e.g., via executors).
| * native pthread thread-local storage on every context switch. | ||
| * | ||
| * This ensures profiling samples are correctly associated with spans, | ||
| * even when work is executed on different threads via executors. |
There was a problem hiding this comment.
- even when work is executed on different threads via executors.
how is this achieved
There was a problem hiding this comment.
When OTel propagates context to another thread (e.g., Context.wrap(Runnable)), it calls attach() on that thread. We hook into that to update native TLS on the new thread.
|
please include a Co-authored clause if this is generated by an LLM |
|
Added Co-authored-by clause to the commit. |
- Mark API as EXPERIMENTAL - Clarify relationship with SpanProcessor in javadoc - Remove redundant comments - Reuse parseSpanId from PyroscopeOtelSpanProcessor Co-authored-by: Claude <noreply@anthropic.com>
2016aba to
96c5dd4
Compare
|
Hi @korniltsev-grafanista , just checking in to see if you've had a chance to look at the experimental tag and the code cleanup I added. Please let me know if there’s any other context I can provide to help with the review! Thanks |
I did not. I will take a look once I have time |

Fixes #44
Summary
ProfilingContextStoragewrapper to synchronize OTel context with async-profiler native TLSonEnd()to restore parent span context instead of clearingotel.pyroscope.context.propagation.enabledconfiguration flag (default:true)Problem
When spans are used across thread pools/executors, profiling samples were incorrectly attributed because:
Also, when child spans ended, the profiling context was cleared entirely instead of restoring to the parent span.
Changes
ProfilingContextStorage.java(new)ContextStorage wrapper that synchronizes OTel context with async-profiler's native pthread TLS on every context switch:
attach(): extracts span ID/name from OTel context and sets in async-profiler viasetTracingContext()close(): restores profiling context to match the now-current OTel contextPyroscopeOtelSpanProcessor.javaonEnd()now restores parent span context if valid and localPyroscopeOtelConfiguration.javacontextPropagationEnabledfield (default:true)PyroscopeOtelAutoConfigurationCustomizerProvider.javaotel.pyroscope.context.propagation.enabledconfigProfilingContextStoragewrapper when enabledConfiguration
otel.pyroscope.context.propagation.enabledOTEL_PYROSCOPE_CONTEXT_PROPAGATION_ENABLEDtrueTest Plan