Describe the bug
When using OtlpGrpcSpanExporter with the gRPC Netty transport (grpc-netty or grpc-netty-shaded), the exporter creates an unbounded number of grpc-default-worker threads over time, leading to memory exhaustion and eventual OOM.
Steps to reproduce
- Configure OtlpGrpcSpanExporter using the managed channel gRPC sender (i.e., exclude opentelemetry-exporter-sender-okhttp and add opentelemetry-exporter-sender-grpc-managed-channel)
- Run the application under normal load
- Monitor thread count over time
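For the last step, a minimal way to watch for the leak is to periodically count live threads whose names start with grpc-default-worker. The class name, thread-name prefix, and 10-second polling interval are assumptions taken from the observations in this report, not part of any OpenTelemetry API; this is just an illustrative sketch.

import java.util.concurrent.TimeUnit;

public class GrpcWorkerThreadMonitor {
  public static void main(String[] args) throws InterruptedException {
    while (true) {
      // Count live threads created by gRPC/Netty's default worker executor.
      long grpcWorkers = Thread.getAllStackTraces().keySet().stream()
          .filter(t -> t.getName().startsWith("grpc-default-worker"))
          .count();
      System.out.println("grpc-default-worker threads: " + grpcWorkers);
      TimeUnit.SECONDS.sleep(10); // roughly matches the ~10s growth interval reported below
    }
  }
}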
What did you expect to see?
A stable, bounded number of gRPC/Netty worker threads.
What did you see instead?
Thread count grows continuously. In our case, we observed 5-7 new grpc-default-worker threads created approximately every 10 seconds, with none of them terminating. Each thread uses ~1MB of stack space, leading to significant memory growth (~2.7GB/hour in native memory).
Root cause analysis
When using ManagedChannelBuilder.forTarget() without explicitly configuring an event loop group, Netty defaults to using ThreadPerTaskExecutor for its internal worker threads. Unlike a bounded thread pool, this executor creates a new thread for each task and does not reuse threads.
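To make the thread-per-task behavior concrete, here is an illustrative JDK-only sketch. It uses java.util.concurrent.Executors.newThreadPerTaskExecutor (Java 21) rather than Netty's internal ThreadPerTaskExecutor, so it only demonstrates the reuse difference between the two executor styles, not gRPC's actual code path.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadReuseDemo {
  public static void main(String[] args) {
    // Thread-per-task: every submitted task gets a brand-new platform thread.
    ExecutorService perTask = Executors.newThreadPerTaskExecutor(Thread.ofPlatform().factory());
    // Fixed pool: tasks are multiplexed over two reusable threads.
    ExecutorService fixedPool = Executors.newFixedThreadPool(2);

    Runnable task = () -> System.out.println(Thread.currentThread().getName());

    for (int i = 0; i < 5; i++) {
      perTask.submit(task);   // prints five distinct thread names
      fixedPool.submit(task); // prints at most two distinct thread names
    }

    perTask.shutdown();
    fixedPool.shutdown();
  }
}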
The GrpcExporterBuilder in opentelemetry-java creates a channel via ManagedChannelBuilder but does not configure:
- A bounded EventLoopGroup via NettyChannelBuilder.eventLoopGroup()
- Any other limit on the channel's internal threading behavior
Suggested fix
When building the managed channel for Netty transport, use NettyChannelBuilder directly with a bounded NioEventLoopGroup:
NioEventLoopGroup eventLoopGroup = new NioEventLoopGroup(2); // or some reasonable bounded number
NettyChannelBuilder.forTarget(endpoint)
    .eventLoopGroup(eventLoopGroup)
    .channelType(NioSocketChannel.class)
    // ... other configuration
    .build();

This ensures Netty reuses a fixed pool of event loop threads rather than creating unbounded new threads.
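For completeness, a self-contained sketch of that channel construction with grpc-netty-shaded; the endpoint, plaintext transport, and pool size of 2 are placeholder assumptions. Whoever owns the channel also becomes responsible for shutting down the event loop group, presumably alongside the exporter's own shutdown().

import io.grpc.ManagedChannel;
import io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder;
import io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoopGroup;
import io.grpc.netty.shaded.io.netty.channel.socket.nio.NioSocketChannel;
import java.util.concurrent.TimeUnit;

public class BoundedNettyChannel {
  public static void main(String[] args) throws InterruptedException {
    // Bounded event loop group: Netty reuses these two threads for all channel I/O.
    NioEventLoopGroup eventLoopGroup = new NioEventLoopGroup(2);

    ManagedChannel channel = NettyChannelBuilder.forTarget("localhost:4317") // placeholder endpoint
        .eventLoopGroup(eventLoopGroup)
        .channelType(NioSocketChannel.class) // required when supplying a custom event loop group
        .usePlaintext() // assumption: no TLS in this sketch
        .build();

    // ... use the channel for exports ...

    // Shut down in order: the channel first, then the event loop group.
    channel.shutdown().awaitTermination(10, TimeUnit.SECONDS);
    eventLoopGroup.shutdownGracefully().await(10, TimeUnit.SECONDS);
  }
}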
Environment
- OS: Linux (containers)
- Java version: 21
- OpenTelemetry version: 1.38.x
- gRPC version: 1.78.0
- Transport: grpc-netty-shaded
Additional context
This issue is distinct from:
- OtlpGrpcExporter/Netty still active after SdkTracerProvider#shutdown() (#3521, threads active after shutdown) - that was about cleanup; this is about unbounded creation during normal operation
- DNS creates unbounded number of grpc-default-executor threads, and can't be overriden (grpc/grpc-java#3090, unbounded DNS executor threads) - that was about the offload executor; this is about Netty's internal event loop threads
The default OkHttp sender may not exhibit this behavior, but users who switch to the managed channel sender with Netty transport will encounter it.