Description
We recently migrated from the newrelic format to otel.
At a few of our busier sites, metrics were being silently dropped, with nothing showing up in the debug logs.
Steps to Reproduce
docker run -d --name ktranslate-otel --restart unless-stopped \
  --user "$(id -u):$(id -g)" \
  -v /opt/newrelic-npm/snmp.yaml:/snmp-base.yaml \
  -e AWS_ACCESS_KEY_ID=123 -e AWS_SECRET_ACCESS_KEY=456 -e AWS_REGION=region \
  -e "OTEL_METRIC_EXPORT_INTERVAL=30000" \
  -e "OTEL_EXPORTER_OTLP_COMPRESSION=gzip" \
  store.sfdcbt.net/kentik/ktranslate:kt-2025-10-31-18983550174 \
  -format=otel -format_metric=otel \
  -otel.protocol=http -otel.endpoint=https://otel-collector.url:4318 \
  -snmp /snmp-base.yaml -log_level=debug -metrics=jchf \
  -snmp_discovery_on_start=true -snmp_discovery_min=180 \
  -service_name=debug-otel
Tried so far
The following changes fixed the issue at many sites, but not all:
- Increased memory for the container
- Added these environment variables:
  OTEL_EXPORTER_OTLP_COMPRESSION=gzip
  OTEL_METRIC_EXPORT_INTERVAL=30000
Proposal
It would help to have debug logging whenever the internal otel export limits are hit, i.e. when a per-metric input channel is already holding CHAN_SLACK items.
For example, in otel.go around lines 270 and 300, replace:
f.inputs[m.Name] <- m
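with:
ch := f.inputs[m.Name]
// Log if the per-metric input channel is already full (assuming it was
// created with capacity CHAN_SLACK); the send below then blocks until a
// receiver drains the channel.
if queueDepth := len(ch); queueDepth >= CHAN_SLACK {
	f.Debugf("Channel queue at CHAN_SLACK limit for %s: %d/%d (100%%)", m.Name, queueDepth, CHAN_SLACK)
}
ch <- m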