feat: Add thread context propagation for span-profile correlation in thread pools #45

Open
dordor12 wants to merge 2 commits into grafana:main from dordor12:feat/thread-context-propagation

dordor12 commented Jan 9, 2026

Fixes #44

Summary

  • Add ProfilingContextStorage wrapper to synchronize OTel context with async-profiler native TLS
  • Fix onEnd() to restore the parent span context instead of clearing it
  • Add otel.pyroscope.context.propagation.enabled configuration flag (default: true)

Problem

When spans are used across thread pools/executors, profiling samples were incorrectly attributed because:

  1. OTel context propagates via Java ThreadLocals
  2. Async-profiler uses native pthread TLS for span correlation
  3. These were not synchronized on thread context switches

Also, when a child span ended, the profiling context was cleared entirely instead of being restored to the parent span's context.
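
To make the mismatch concrete, here is a minimal sketch that uses two plain ThreadLocals as stand-ins: one for OTel's ThreadLocal-backed context and one for async-profiler's native per-thread span ID. The class and field names are hypothetical and none of this is the real OTel or async-profiler API. It only illustrates that updating one per-thread store on a pool thread leaves the other untouched.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class UnsyncedStoresDemo {
    // Stand-in for OTel's ThreadLocal-backed context (hypothetical; holds only a span ID).
    static final ThreadLocal<Long> OTEL_CONTEXT = ThreadLocal.withInitial(() -> 0L);
    // Stand-in for async-profiler's native per-thread span ID (hypothetical).
    static final ThreadLocal<Long> PROFILER_TLS = ThreadLocal.withInitial(() -> 0L);

    /** Makes a span current on a worker thread without any hook and reports both stores. */
    static long[] runOnWorker(long spanId) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(() -> {
                OTEL_CONTEXT.set(spanId); // only the OTel side is updated
                return new long[] {OTEL_CONTEXT.get(), PROFILER_TLS.get()};
            }).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch, runOnWorker(42L) reports 42 for the OTel stand-in and 0 for the profiler stand-in, which is the gap the PR sets out to close.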

Changes

ProfilingContextStorage.java (new)

ContextStorage wrapper that synchronizes OTel context with async-profiler's native pthread TLS on every context switch:

  • On attach(): extracts the span ID and name from the OTel context and sets them in async-profiler via setTracingContext()
  • On scope close(): restores profiling context to match the now-current OTel context
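
The attach/close behaviour described above can be sketched with simplified stand-ins. SketchScope and SketchStorage below are hypothetical reductions of OTel's Scope and ContextStorage that track only a span ID, and the LongConsumer hook stands in for the call into async-profiler. This is an illustration of the wrapping pattern under those assumptions, not the code in this PR.

```java
import java.util.function.LongConsumer;

/** Simplified stand-in for OTel's Scope (hypothetical; not the real interface). */
interface SketchScope extends AutoCloseable {
    @Override
    void close();
}

/** Simplified stand-in for OTel's ContextStorage, tracking only a span ID. */
interface SketchStorage {
    SketchScope attach(long spanId);
    long current();
}

/** ThreadLocal-backed storage playing the role of the default storage. */
final class ThreadLocalStorageSketch implements SketchStorage {
    private final ThreadLocal<Long> current = ThreadLocal.withInitial(() -> 0L);

    @Override
    public SketchScope attach(long spanId) {
        long previous = current.get();
        current.set(spanId);
        return () -> current.set(previous); // closing restores the previous value
    }

    @Override
    public long current() {
        return current.get();
    }
}

/** Wrapper that mirrors every attach and scope close into a profiler hook. */
final class ProfilingStorageSketch implements SketchStorage {
    private final SketchStorage delegate;
    private final LongConsumer profilerHook; // stand-in for the profiler call

    ProfilingStorageSketch(SketchStorage delegate, LongConsumer profilerHook) {
        this.delegate = delegate;
        this.profilerHook = profilerHook;
    }

    @Override
    public SketchScope attach(long spanId) {
        SketchScope scope = delegate.attach(spanId);
        profilerHook.accept(spanId);
        return () -> {
            scope.close();
            // Re-sync the profiler to whatever context is current after the close.
            profilerHook.accept(delegate.current());
        };
    }

    @Override
    public long current() {
        return delegate.current();
    }
}
```

Nesting two scopes (span 1, then span 2) and closing them in order makes the hook see 1, 2, 1, 0: each close hands the profiler the context that became current again rather than clearing it.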

PyroscopeOtelSpanProcessor.java

  • onEnd() now restores parent span context if valid and local
  • Only clears to (0,0) when a root span ends
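
As a rough illustration of that decision, the sketch below returns the pair that would be handed to the profiler when a span ends. The method name and its parameters are hypothetical. The real processor reads this information from the span and its parent span context.

```java
final class OnEndRestoreSketch {
    /**
     * Returns {spanId, spanNameId} to hand to the profiler after a span ends.
     * A valid, local (non-remote) parent is restored; anything else clears to (0, 0).
     */
    static long[] contextAfterEnd(boolean parentValid, boolean parentRemote,
                                  long parentSpanId, long parentSpanNameId) {
        if (parentValid && !parentRemote) {
            return new long[] {parentSpanId, parentSpanNameId};
        }
        return new long[] {0L, 0L};
    }
}
```

A remote parent is treated like a missing one here because its span was never current in this process, so there is nothing local to restore.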

PyroscopeOtelConfiguration.java

  • Added contextPropagationEnabled field (default: true)

PyroscopeOtelAutoConfigurationCustomizerProvider.java

  • Reads otel.pyroscope.context.propagation.enabled config
  • Registers ProfilingContextStorage wrapper when enabled

Configuration

| Property | Environment Variable | Default |
| --- | --- | --- |
| otel.pyroscope.context.propagation.enabled | OTEL_PYROSCOPE_CONTEXT_PROPAGATION_ENABLED | true |
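
For readers unfamiliar with how the two names relate, here is a small sketch of the usual OTel naming convention (upper-case the property, replace dots and dashes with underscores) and of a lookup that defaults to true. The class, the map-based lookup, and the precedence order are illustrative assumptions. The extension itself reads the value through OTel's autoconfigure properties rather than through code like this.

```java
import java.util.Locale;
import java.util.Map;

final class PropagationFlagSketch {
    static final String PROPERTY = "otel.pyroscope.context.propagation.enabled";

    /** Derives the environment variable name from the property name. */
    static String toEnvName(String property) {
        return property.toUpperCase(Locale.ROOT).replace('.', '_').replace('-', '_');
    }

    /**
     * Resolves the flag from the given system properties and environment,
     * checking the system property first (assumed precedence) and defaulting to true.
     */
    static boolean isEnabled(Map<String, String> sysProps, Map<String, String> env) {
        String raw = sysProps.get(PROPERTY);
        if (raw == null) {
            raw = env.get(toEnvName(PROPERTY));
        }
        return raw == null || Boolean.parseBoolean(raw.trim());
    }
}
```

With neither value set the sketch returns true, matching the documented default. Setting the environment variable to false disables it unless the system property says otherwise.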

Test Plan

  • Verify context propagation works with thread pools
  • Verify parent span context restored when child span ends
  • Verify feature can be disabled via configuration

dordor12 (Author) commented Jan 9, 2026

Tested this change using trace-span-profiles-example - a complete setup designed to verify span-profile correlation across thread pools.

Test Scenario

The demo app includes a SpanProfileTestController that specifically tests the multithreaded context propagation problem.

The Problem Setup

Spans are created on worker threads (not the request thread) using an ExecutorService:

private final ExecutorService executor = Executors.newFixedThreadPool(8);

@GetMapping("/worker-spans")
public SpanTestResult testWorkerSpans(...) {
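    for (int i = 0; i < count; i++) {
        final int spanIndex = i; // effectively final copy of the loop index for the lambda
        final Context parentContext = Context.current(); // Capture parent context on the request thread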

        Future<SpanInfo> future = executor.submit(() -> {
            // Propagate context to worker thread
            try (Scope scope = parentContext.makeCurrent()) {
                return executeSpanOnWorkerThread(spanIndex, durationMs);
            }
        });
    }
}

Each worker thread creates a child span and does CPU-intensive work to generate profiler samples:

private SpanInfo executeSpanOnWorkerThread(int spanIndex, int durationMs) {
    Span span = tracer.spanBuilder("worker-span-" + spanIndex).startSpan();
    // Declarations of startTime, targetNanos, and sum are omitted in this excerpt

    try (Scope scope = span.makeCurrent()) {
        // CPU work generates profiler samples that should be attributed to this span
        while (System.nanoTime() - startTime < targetNanos) {
            sum = cpuIntensiveWork(sum);
        }
    } finally {
        span.end();
    }
    // Building and returning the SpanInfo result is omitted in this excerpt
}

Why This Tests the Fix

Before this PR: When parentContext.makeCurrent() runs on the worker thread, OTel's ThreadLocal context is set, but async-profiler's native pthread TLS is NOT synchronized. Profiling samples are incorrectly attributed.

After this PR: The ProfilingContextStorage wrapper intercepts makeCurrent() and synchronizes the native TLS:

// ProfilingContextStorage.java
public Scope attach(Context toAttach) {
    Scope scope = delegate.attach(toAttach);
    // Sync native TLS with OTel context; spanId and spanName are extracted from toAttach (omitted here)
    setTracingContextForSpan(spanId, spanName);
    return () -> {
        scope.close();
        // Restore previous context in native TLS
        restoreTracingContext();
    };
}

Results

  • ✅ Worker thread spans correctly attributed in profiles
  • ✅ Parent span context restored when child span ends (not cleared to 0)
  • ✅ Span-to-profile linking works in Grafana for thread pool scenarios

Test endpoint: GET /api/span-test/worker-spans?count=5&durationMs=5

image

Context currentContext = storage.current();
long spanId = extractSpanId(currentContext);
if (asprof != null) {
asprof.setTracingContext(spanId, 0);

Contributor

why 0?

Author

Good catch - this was a bug. Fixed it to extract the parent span's spanNameId instead of hardcoding 0

- Add ProfilingContextStorage wrapper to synchronize OTel context with
  async-profiler native pthread TLS on every context switch
- Fix onEnd() to restore parent span context instead of clearing to (0,0)
- Add otel.pyroscope.context.propagation.enabled config flag (default: true)
- Add PyroscopeBootstrapConfig for bootstrap classloader configuration

Fixes grafana#44

dordor12 force-pushed the feat/thread-context-propagation branch from 69ba017 to 9d5b505 on February 7, 2026 20:30

dordor12 (Author) commented Feb 7, 2026

Tested end-to-end with async-profiler#11 and jfr-parser#82. Wall profiles now correctly show span_name labels and "Profiles for this span" works in Grafana.

korniltsev-grafanista (Contributor) commented

Hey, I am not familiar with otel / ContextStorage. It will take some time for me to get the context and review.

*
* This ensures profiling samples are correctly associated with spans,
* even when work is executed on different threads via executors.
*/

Contributor

Let's mark it "experimental". Make it clear that this API is experimental, that we have not committed to maintaining it, and that it may change or be removed in the future.

Author

Done. Added EXPERIMENTAL warning to javadoc.


@Override
public Scope attach(Context toAttach) {
// 1. Extract span ID and name from the context being attached

Contributor

remove all these useless llm comments

// foo
foo()

Author

Removed.

/**
* Parse 16-character hex span ID to long.
*/
private static long parseHexSpanId(String hexSpanId) {

Contributor

is it any different from the function we have in the span processor?

Author

You're right, it's identical. Changed to use PyroscopeOtelSpanProcessor.parseSpanId() instead.
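
For context, parsing a 16-character hex span ID into a long can be done with the JDK alone. The sketch below is an illustration under that assumption and is not the body of PyroscopeOtelSpanProcessor.parseSpanId(). The class name and the choice to return 0 for malformed input are hypothetical.

```java
final class SpanIdHexSketch {
    /**
     * Parses a 16-character hex span ID into a long.
     * Returns 0 for null, wrong-length, or non-hex input (an illustrative choice).
     */
    static long parseSpanId(String hexSpanId) {
        if (hexSpanId == null || hexSpanId.length() != 16) {
            return 0L;
        }
        try {
            // Unsigned parsing is needed because IDs with the top bit set exceed Long.MAX_VALUE.
            return Long.parseUnsignedLong(hexSpanId, 16);
        } catch (NumberFormatException e) {
            return 0L;
        }
    }
}
```

Long.parseLong would reject any ID at or above 8000000000000000, which is why the unsigned variant is used here.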

import io.pyroscope.vendor.one.profiler.AsyncProfiler;

/**
* ContextStorage wrapper that synchronizes OTel context with async-profiler's

Contributor

is this an alternative to span processor or should they be used together?

Author

Used together. SpanProcessor handles span start/end on the originating thread. This ContextStorage wrapper handles context propagation to other threads (e.g., via executors).

* native pthread thread-local storage on every context switch.
*
* This ensures profiling samples are correctly associated with spans,
* even when work is executed on different threads via executors.

korniltsev-grafanista (Contributor) commented Feb 8, 2026

  • even when work is executed on different threads via executors.

how is this achieved

Author

When OTel propagates context to another thread (e.g., Context.wrap(Runnable)), it calls attach() on that thread. We hook into that to update native TLS on the new thread.
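
The mechanism in this answer can be sketched without the OTel library: a wrap helper captures the submitting thread's context, then attaches it on whichever thread runs the task, and the attach step is where a hook can update the profiler's per-thread value. Both ThreadLocals and the wrap method below are hypothetical stand-ins. OTel's real Context.wrap works on Context objects rather than span IDs.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class WrapSketch {
    // Stand-in for the OTel context (hypothetical; holds only a span ID).
    static final ThreadLocal<Long> CURRENT = ThreadLocal.withInitial(() -> 0L);
    // Stand-in for the profiler's native per-thread span ID (hypothetical).
    static final ThreadLocal<Long> PROFILER = ThreadLocal.withInitial(() -> 0L);

    /** Captures the caller's context now and attaches it on the thread that runs the task. */
    static <T> Callable<T> wrap(Callable<T> task) {
        final long captured = CURRENT.get(); // read on the submitting thread
        return () -> {
            long previous = CURRENT.get();   // read on the worker thread
            CURRENT.set(captured);
            PROFILER.set(captured);          // the hooked attach: keep the profiler in sync
            try {
                return task.call();
            } finally {
                CURRENT.set(previous);
                PROFILER.set(previous);      // restore on scope close
            }
        };
    }

    /** Returns the profiler value seen on the worker during a wrapped task and after it. */
    static long[] demo(long spanId) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<Long> readProfiler = PROFILER::get;
        CURRENT.set(spanId);
        try {
            long during = pool.submit(wrap(readProfiler)).get(5, TimeUnit.SECONDS);
            long after = pool.submit(readProfiler).get(5, TimeUnit.SECONDS);
            return new long[] {during, after};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            CURRENT.remove();
            pool.shutdown();
        }
    }
}
```

In this sketch demo(7L) yields 7 while the wrapped task runs and 0 for the next unwrapped task on the same worker, showing both the sync on attach and the restore on close.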

korniltsev-grafanista (Contributor) commented

please include a Co-authored clause if this is generated by an LLM

dordor12 (Author) commented Feb 8, 2026

Added Co-authored-by clause to the commit.

CLAassistant commented Feb 8, 2026

CLA assistant check
All committers have signed the CLA.

- Mark API as EXPERIMENTAL
- Clarify relationship with SpanProcessor in javadoc
- Remove redundant comments
- Reuse parseSpanId from PyroscopeOtelSpanProcessor

Co-authored-by: Claude <noreply@anthropic.com>

dordor12 (Author) commented

Hi @korniltsev-grafanista , just checking in to see if you've had a chance to look at the experimental tag and the code cleanup I added. Please let me know if there’s any other context I can provide to help with the review! Thanks

korniltsev-grafanista (Contributor) commented

> Hi @korniltsev-grafanista , just checking in to see if you've had a chance to look

I did not. I will take a look once I have time

korniltsev-grafanista requested a review from a team on February 17, 2026 07:21

Development

Successfully merging this pull request may close these issues.

Profiling samples not correctly attributed to spans when using thread pools
