Tracing context breaks when onNext() is invoked multiple times for a single request #12317

@NaveenRamu

Describe the bug

When using the Subscriber interface with OpenTelemetry (OTel) tracing in a reactive environment, trace continuity breaks when onNext() is invoked multiple times for a single request (e.g., when a message is processed in chunks). The trace context is lost and a new trace ID is generated for the next invocation, leading to incomplete or broken traces. This is likely because the trace context, which is stored in thread-local variables, is not propagated correctly across asynchronous boundaries.

Similarly, when onComplete() is invoked at the end of the request, a new trace ID is generated.

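Example implementation class:

```java
import io.netty.buffer.ByteBuf;
import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

// logger, ctx, isError, cause, process(...) and requestComplete() are
// defined elsewhere in the class hierarchy and are omitted here.
public abstract class AbstractSubscriber implements Subscriber<ByteBuf> {

    private Subscription subscription;

    @Override
    public void onSubscribe(Subscription subscription) {
        logger.debug("onSubscribe");
        this.subscription = subscription;
        subscription.request(1); // request the first chunk
    }

    @Override
    public void onNext(ByteBuf byteBuf) {
        logger.debug("onNext");
        process(byteBuf);
        byteBuf.release(); // released on both the success and error paths
        if (!isError) {
            subscription.request(1); // request the next chunk
        } else {
            subscription.cancel();
            ctx.error(cause);
        }
    }

    @Override
    public void onError(Throwable throwable) {
        logger.debug("onError");
        ctx.error(throwable);
    }

    @Override
    public void onComplete() {
        requestComplete();
    }
}
```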

Steps to reproduce

1. Implement a Subscriber that processes messages in chunks, so that onNext() is invoked multiple times.
2. Set up OpenTelemetry tracing for the request.
3. Observe that after multiple onNext() calls the trace context is lost: a new trace ID is generated and the distributed trace is incomplete.

Expected behavior

The trace context should propagate correctly across multiple invocations of onNext(), maintaining continuity in the tracing logs.
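
As a possible workaround until this is triaged, the expected behavior can be approximated by capturing the context once and re-activating it in every callback. The sketch below assumes the OpenTelemetry API (`io.opentelemetry.context.Context`, `Scope`) and the Reactive Streams `Subscriber` interface are on the classpath. `ContextPreservingSubscriber` is a hypothetical helper name, not part of OpenTelemetry or Ratpack, and it assumes the wrapper is constructed on a thread where the request's span is still current.

```java
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

// Hypothetical wrapper: captures the current OpenTelemetry context at
// construction time and makes it current around every delegate callback.
public final class ContextPreservingSubscriber<T> implements Subscriber<T> {

    private final Subscriber<T> delegate;
    private final Context captured;

    public ContextPreservingSubscriber(Subscriber<T> delegate) {
        this.delegate = delegate;
        // Assumption: this runs while the request's span is still current.
        this.captured = Context.current();
    }

    @Override
    public void onSubscribe(Subscription subscription) {
        try (Scope ignored = captured.makeCurrent()) {
            delegate.onSubscribe(subscription);
        }
    }

    @Override
    public void onNext(T item) {
        // Restores the captured context on whichever thread delivers the chunk.
        try (Scope ignored = captured.makeCurrent()) {
            delegate.onNext(item);
        }
    }

    @Override
    public void onError(Throwable throwable) {
        try (Scope ignored = captured.makeCurrent()) {
            delegate.onError(throwable);
        }
    }

    @Override
    public void onComplete() {
        try (Scope ignored = captured.makeCurrent()) {
            delegate.onComplete();
        }
    }
}
```

With this in place, the subscriber from the example above would be subscribed as `new ContextPreservingSubscriber<>(mySubscriber)` (where `mySubscriber` is a placeholder for the concrete `AbstractSubscriber` subclass). Closing the `Scope` after each callback restores whatever context the delivering thread had before, so nothing leaks onto shared event-loop threads.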

Actual behavior

New trace IDs are generated for the overridden Ratpack callbacks such as onNext(), onError(), and onComplete().

Javaagent or library instrumentation version

otelcol version 0.106.1

Environment

JDK: 1.8
OS: CentOS
Server: Ratpack (version: Ratpack 2.0.0-rc-1)

Additional context

The issue is likely due to asynchronous execution switching threads and losing the thread-local trace context. Reactive systems typically involve multiple threads, and if the context is not propagated, traces may appear incomplete or broken.
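
The suspected mechanism can be illustrated without any tracing library. In this minimal sketch, `TRACE_ID` is a hypothetical stand-in for a tracer's thread-local context slot and `propagating(...)` is an illustrative helper. It only shows that a plain `ThreadLocal` value set on one thread is invisible on another unless it is explicitly captured and restored.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalContextDemo {

    // Hypothetical stand-in for a tracer's thread-local "current context".
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Captures the caller's value now and restores it around the task later,
    // on whichever thread ends up running the task.
    static <T> Callable<T> propagating(Callable<T> task) {
        final String captured = TRACE_ID.get();
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(captured);
            try {
                return task.call();
            } finally {
                if (previous == null) {
                    TRACE_ID.remove();
                } else {
                    TRACE_ID.set(previous);
                }
            }
        };
    }

    // Returns {value seen without propagation, value seen with propagation}.
    static String[] run() throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            TRACE_ID.set("trace-abc"); // set on the calling thread only
            Callable<String> readTraceId = TRACE_ID::get;
            String without = worker.submit(readTraceId).get();
            String with = worker.submit(propagating(readTraceId)).get();
            return new String[] {without, with};
        } finally {
            TRACE_ID.remove();
            worker.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] seen = run();
        System.out.println("without propagation: " + seen[0]); // null
        System.out.println("with propagation:    " + seen[1]); // trace-abc
    }
}
```

A tracer that finds no current context on the worker thread has nothing to attach a new span to, so it starts a fresh trace, which would match the new trace IDs reported here.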

Labels: bug, needs repro, needs triage
