Description
Describe the bug
When using KtorServerTracing on a server with the Netty engine, in combination with a route that uses Ktor's HttpClient, tracing of incoming requests starts to fail. If the host computer or JVM has N processors available, the trace collector receives at most N SERVER traces, regardless of how many requests are made to the server.
There seems to be a context leak that leaves a given thread associated with an OpenTelemetry context, and therefore also with a span. Once a thread has processed an incoming request, it stays associated with the span created for that first request. Subsequent requests processed by the same thread do not create new SERVER spans.
The bug causes the shouldSuppress function to return true.
That function is called from here.
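To make the suspected mechanism concrete, here is a minimal model of it in plain Java. It does not use OpenTelemetry, Ktor, or Netty: the `ThreadLocal`, the simplified `shouldSuppress` check, and the `leakContext` flag are all assumptions made for illustration, not the real instrumentation code. It only shows why a context that is never detached from a worker thread would cap the number of SERVER spans at the number of worker threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ContextLeakModel {
    // Stand-in for the thread-bound "current context" (assumption: the real
    // context storage behaves like a thread-local in this scenario).
    private static final ThreadLocal<String> CURRENT_SERVER_SPAN = new ThreadLocal<>();

    // Hypothetical simplification of the suppression check: a new SERVER span
    // is suppressed when the current context already carries one.
    static boolean shouldSuppress() {
        return CURRENT_SERVER_SPAN.get() != null;
    }

    // Handles one request on the calling thread.
    // Returns true if a new SERVER span was started for it.
    static boolean handleRequest(int requestId, boolean leakContext) {
        if (shouldSuppress()) {
            return false; // a stale span is still attached to this thread
        }
        CURRENT_SERVER_SPAN.set("server-span-" + requestId);
        try {
            // Request processing would happen here.
            return true;
        } finally {
            if (!leakContext) {
                CURRENT_SERVER_SPAN.remove(); // properly detach the context
            }
        }
    }

    // Runs `requests` requests on a pool of `workerThreads` threads and
    // returns how many SERVER spans were started.
    static int countServerSpans(int workerThreads, int requests, boolean leakContext) {
        ExecutorService workers = Executors.newFixedThreadPool(workerThreads);
        AtomicInteger spans = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            final int id = i;
            workers.execute(() -> {
                if (handleRequest(id, leakContext)) {
                    spans.incrementAndGet();
                }
            });
        }
        workers.shutdown();
        try {
            workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return spans.get();
    }

    public static void main(String[] args) {
        // One worker thread, five requests (an arbitrary count).
        System.out.println("leaked context: " + countServerSpans(1, 5, true));  // prints 1
        System.out.println("clean context:  " + countServerSpans(1, 5, false)); // prints 5
    }
}
```

In this model the leaky variant never produces more spans than there are worker threads, which matches the "up to N SERVER traces" symptom. Whether the real leak originates in the Netty engine, the HttpClient, or the instrumentation is not something this sketch can answer.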
Note that changing the engine to CIO seems to result in correct tracing.
Also note that removing the use of an HttpClient inside the route also stops the bug from happening.
As such, it seems the bug is only triggered when using the combination of Netty server engine and Ktor's HttpClient inside a route.
Note: Please let me know if you think this is the wrong place for this bug report. I am not sure if the error is with Netty or with opentelemetry-java-instrumentation.
Steps to reproduce
I have made a minimal reproducible case in a separate repo. It has instructions on how to run it:
https://github.com/LangdalP/ktor-otel-debug
Expected behavior
Each request should result in a new trace, given that the incoming traceparent header is different each time (or absent).
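As a rough illustration of that expectation, the sketch below extracts the trace ID from a W3C `traceparent` header (version `00`: `version-traceid-parentid-flags`). The class and method names are hypothetical, and the real instrumentation uses OpenTelemetry's own propagator rather than this code. The point is only that two requests with different headers should land in two different traces, and a request without a header should start a fresh root trace.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TraceparentExample {
    // W3C Trace Context, version 00: lowercase hex fields separated by dashes.
    private static final Pattern TRACEPARENT =
        Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    private static final String ZERO_TRACE_ID = "00000000000000000000000000000000";
    private static final String ZERO_PARENT_ID = "0000000000000000";

    // Returns the trace ID carried by the header, or empty if the header is
    // missing or invalid (in which case a server would start a new root trace).
    static Optional<String> traceIdOf(String header) {
        if (header == null) {
            return Optional.empty();
        }
        Matcher m = TRACEPARENT.matcher(header.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String traceId = m.group(1);
        String parentId = m.group(2);
        // All-zero IDs are invalid per the specification.
        if (traceId.equals(ZERO_TRACE_ID) || parentId.equals(ZERO_PARENT_ID)) {
            return Optional.empty();
        }
        return Optional.of(traceId);
    }

    public static void main(String[] args) {
        String first = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
        String second = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01";
        System.out.println(traceIdOf(first));  // Optional[4bf92f3577b34da6a3ce929d0e0e4736]
        System.out.println(traceIdOf(second)); // Optional[0af7651916cd43dd8448eb211c80319c]
        System.out.println(traceIdOf(null));   // Optional.empty
    }
}
```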
Actual behavior
If the server has one processor, it correctly traces at most one request. Subsequent requests are not traced correctly because no new SERVER spans are created.
Here is the only trace created and collected:

Note that changing the engine to CIO will correctly result in 5 requests being traced.
Also note that removing the use of an HttpClient inside the route also results in 5 requests being traced.
Javaagent or library instrumentation version
2.2.0-alpha, sdk-version 1.36.0
Environment
JDK: Eclipse Temurin JDK 17
OS: macOS and Docker image eclipse-temurin:17-alpine
Additional context
Note that the README for the Ktor server instrumentation suggests using Netty, which is a bad idea due to this bug:
https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/ktor/ktor-2.0/library
Please let me know if you think I should instead create a bug report with Ktor/Netty.
