Hi Jaeger Team, I am deploying Jaeger in a Kubernetes cluster with a large number of microservices/applications that want to send traces. We intend to have all applications send traces directly to the central jaeger-collector Service (Deployment). My primary concern is high load:
Replies: 2 comments 5 replies
Hi @Vail-X! I'm Dosu and I'm helping the Jaeger team.

A default Jaeger Collector setup is likely to be overwhelmed if tens or hundreds of applications send traces directly, because the default queue size (2000) and worker count (50) are tuned for modest workloads. Under high load, the collector drops spans when its internal queue fills up, and you may see errors or increased latency. You can tune these parameters (queue size, number of workers, and dynamic queue sizing) to increase ingestion capacity, but you'll also need to provision enough CPU and memory for the collector pods [source].

Horizontal scaling (increasing replicaCount) is recommended, since the collector is stateless and can be scaled out to distribute load. In Kubernetes, use a Service to load-balance traffic across collector pods. For very high throughput, you may also need to tune queue and batch settings and monitor metrics for dropped spans and queue health [source].

Since jaeger-agent is deprecated, the recommended pattern is to use the OpenTelemetry Collector as a buffering and batching layer between your applications and the Jaeger Collector. This intermediary can handle batching, retries, and tail-based sampling, and helps prevent overload on the Jaeger backend. The modern architecture is: apps → OpenTelemetry Collector (OTLP) → Jaeger Collector → storage [source], [source].

Monitor collector metrics such as dropped spans and queue health. Real-world deployments have handled hundreds of services and hundreds of GBs of trace data per day, but only with careful tuning and scaling [source]. Note that adaptive sampling is not yet fully supported with OpenTelemetry SDKs, so sampling strategies may be limited in high-load scenarios.

To reply, just mention @dosu.
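To make the intermediary pattern concrete, here is a minimal sketch of an OpenTelemetry Collector pipeline that receives OTLP from applications, batches spans, and forwards them to the Jaeger Collector. The Service hostname, namespace, and the specific queue/batch numbers are placeholder assumptions for illustration, not values from this thread:

```yaml
# Hypothetical OpenTelemetry Collector config for the buffering layer:
# apps send OTLP here; spans are batched and retried before reaching
# the central Jaeger Collector.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20
  batch:
    # Larger batches reduce per-request overhead on the Jaeger Collector.
    send_batch_size: 1024
    timeout: 5s

exporters:
  otlp/jaeger:
    # Placeholder Service name; the Jaeger Collector accepts OTLP on 4317.
    endpoint: jaeger-collector.observability.svc.cluster.local:4317
    tls:
      insecure: true
    retry_on_failure:
      enabled: true
    sending_queue:
      enabled: true
      queue_size: 5000

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger]
```

Putting `memory_limiter` first in the processor chain lets the collector shed load before it runs out of memory, which matters most in exactly the high-load scenario described above.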
Is it recommended to run the OpenTelemetry Collector as a DaemonSet or a Deployment?
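For context on the two options being asked about: a DaemonSet runs one collector per node (apps export to the node-local agent), while a Deployment behind a Service acts as a central gateway; the two are often combined. As an illustration only (names, namespace, and image tag are placeholder assumptions), the per-node agent pattern might look like:

```yaml
# Hypothetical per-node agent pattern: one collector pod on each node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-agent
  namespace: observability
spec:
  selector:
    matchLabels:
      app: otel-agent
  template:
    metadata:
      labels:
        app: otel-agent
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest  # pin a version in practice
          ports:
            - containerPort: 4317  # OTLP gRPC
              hostPort: 4317       # apps reach the node-local agent via the node IP
          resources:
            limits:
              memory: 512Mi
```

Applications can discover the node-local agent by injecting the node IP via the downward API (`status.hostIP`) into their OTLP endpoint environment variable.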