Prometheus exporter creates duplicate HELP/TYPE declarations for metrics with different label sets
Description
The Prometheus exporter creates multiple HELP and TYPE declarations for the same metric when data points have different sets of labels. This violates the Prometheus text format specification and causes metrics to be rejected by Prometheus Pushgateway with the error:
400 Bad Request - text format parsing error in line X: second HELP line for metric name "metric_name"
Current Behavior
When exporting metrics where optional attributes create varying label sets (e.g., error_type only present on errors, server_address only present for some requests), the exporter produces:
# HELP http_client_request_duration_seconds Duration of HTTP client requests.
# TYPE http_client_request_duration_seconds histogram
http_client_request_duration_seconds_bucket{http_request_method="PUT",http_response_status_code="200",le="0.005",network_protocol_version="1.1"} 0.0
...
# HELP http_client_request_duration_seconds Duration of HTTP client requests. ← DUPLICATE!
# TYPE http_client_request_duration_seconds histogram ← DUPLICATE!
http_client_request_duration_seconds_bucket{error_type="HTTPError",http_request_method="PUT",http_response_status_code="400",le="0.005",network_protocol_version="1.1"} 0.0
...
# HELP http_client_request_duration_seconds Duration of HTTP client requests. ← DUPLICATE!
# TYPE http_client_request_duration_seconds histogram ← DUPLICATE!
http_client_request_duration_seconds_bucket{http_request_method="GET",http_response_status_code="200",le="0.005",network_protocol_version="1.1",server_address="metadata.google.internal"} 0.0
...
Expected Behavior
All time series for the same metric should be consolidated under a single HELP/TYPE declaration:
# HELP http_client_request_duration_seconds Duration of HTTP client requests.
# TYPE http_client_request_duration_seconds histogram
http_client_request_duration_seconds_bucket{http_request_method="PUT",http_response_status_code="200",le="0.005",network_protocol_version="1.1"} 0.0
...
http_client_request_duration_seconds_bucket{error_type="HTTPError",http_request_method="PUT",http_response_status_code="400",le="0.005",network_protocol_version="1.1"} 0.0
...
http_client_request_duration_seconds_bucket{http_request_method="GET",http_response_status_code="200",le="0.005",network_protocol_version="1.1",server_address="metadata.google.internal"} 0.0
...
Root Cause
The exporter creates a metric family ID that includes label keys:
per_metric_family_id = "|".join([
    metric_name,
    metric_description,
    "%".join(label_keys),  # ← This causes separate families for different label sets
    metric_unit,
])
This means metrics with the same name but different label keys are treated as separate metric families, each getting their own HELP/TYPE declarations.
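To make the effect concrete, here is a minimal standalone sketch that rebuilds the ID for the three label sets from the output above. The helper name `family_id` and the unit value `"s"` are assumptions for illustration; only the join logic is taken from the snippet.

```python
def family_id(name, description, label_keys, unit):
    # Same join as the exporter snippet above; the function itself is hypothetical.
    return "|".join([name, description, "%".join(label_keys), unit])


name = "http_client_request_duration_seconds"
description = "Duration of HTTP client requests."
unit = "s"  # assumed unit, not taken from the exporter output

# Label keys of the three series shown under "Current Behavior" (without `le`).
label_sets = [
    ("http_request_method", "http_response_status_code", "network_protocol_version"),
    ("error_type", "http_request_method", "http_response_status_code", "network_protocol_version"),
    ("http_request_method", "http_response_status_code", "network_protocol_version", "server_address"),
]

ids_with_label_keys = {family_id(name, description, keys, unit) for keys in label_sets}
ids_by_name_only = {name for _ in label_sets}

print(len(ids_with_label_keys), "families when label keys are part of the ID")
print(len(ids_by_name_only), "family when only the metric name is used")
```

Each distinct ID becomes its own metric family, so one metric name ends up with three HELP/TYPE pairs.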
Proposed Solution
Follow the approach used in opentelemetry-go PR #3469:
- Use only the metric name as the metric family identifier (not including label keys)
- Cache the first description and type encountered for each metric name
- Consolidate all data points under a single metric family regardless of label variations
- Fill missing labels with empty strings to maintain consistent label sets within the family
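A rough sketch of how these four points could fit together in Python. This is not the exporter's code: the `Family` class, `consolidate`, `render`, and the `demo_requests` metric are all hypothetical, and a gauge is used so the example does not have to deal with histogram suffixes.

```python
from dataclasses import dataclass, field


@dataclass
class Family:
    # Hypothetical container, not a class from the exporter or prometheus_client.
    name: str
    description: str
    type: str
    label_keys: list = field(default_factory=list)
    points: list = field(default_factory=list)  # (attributes dict, value) pairs


def consolidate(data_points):
    """Group (name, description, type, attributes, value) tuples by metric name."""
    families = {}
    for name, description, type_, attributes, value in data_points:
        family = families.get(name)
        if family is None:
            # First sighting of this name: cache its description and type.
            family = families[name] = Family(name, description, type_)
        for key in attributes:
            if key not in family.label_keys:
                family.label_keys.append(key)
        family.points.append((attributes, value))
    return families


def render(family):
    """Emit one HELP/TYPE pair, filling labels a point lacks with ""."""
    keys = sorted(family.label_keys)
    lines = [
        f"# HELP {family.name} {family.description}",
        f"# TYPE {family.name} {family.type}",
    ]
    for attributes, value in family.points:
        labels = ",".join(f'{k}="{attributes.get(k, "")}"' for k in keys)
        lines.append(f"{family.name}{{{labels}}} {value}")
    return "\n".join(lines)


points = [
    ("demo_requests", "Demo request count.", "gauge", {"method": "PUT"}, 1),
    ("demo_requests", "Demo request count.", "gauge",
     {"method": "PUT", "error_type": "HTTPError"}, 2),
    ("demo_requests", "A different description.", "gauge",
     {"method": "GET", "server_address": "example.internal"}, 3),
]
families = consolidate(points)
text = render(families["demo_requests"])
print(text)
```

Prometheus treats a label with an empty value the same as a missing label, so padding with empty strings should not change which series the points belong to.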
Reference Implementation
The Go implementation solved this in open-telemetry/opentelemetry-go#3469 by:
metricFamilies map[string]*dto.MetricFamily
// When encountering a metric:
// 1. If first time seeing this name → store it with TYPE and HELP
// 2. If TYPE conflicts → DROP the entire metric (per spec)
// 3. If HELP/UNIT conflicts → USE first description (per spec)
// 4. Add all data points to the same family
According to the OpenTelemetry spec:
"Exporters MUST drop entire metrics to prevent conflicting TYPE comments, but SHOULD NOT drop metric points as a result of conflicting UNIT or HELP comments."
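The conflict rules could be expressed in Python roughly as follows. The `admit` function and its cache layout are hypothetical, and treating the first-seen TYPE as the one that wins is my reading of the Go comments above rather than something the spec quote spells out.

```python
def admit(cache, name, type_, description):
    """Decide whether a metric may join the family cached under `name`.

    `cache` maps metric name -> (type, description) of the first metric seen.
    Returns (accepted, description_to_use).
    """
    cached = cache.get(name)
    if cached is None:
        # First time this name is seen: remember its TYPE and HELP.
        cache[name] = (type_, description)
        return True, description
    cached_type, cached_description = cached
    if cached_type != type_:
        # TYPE conflict: drop the whole incoming metric.
        return False, cached_description
    # Same TYPE: keep the points and reuse the first description,
    # even if the incoming description differs.
    return True, cached_description


cache = {}
print(admit(cache, "demo_metric", "histogram", "first description"))
print(admit(cache, "demo_metric", "histogram", "other description"))
print(admit(cache, "demo_metric", "gauge", "first description"))
```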
Impact
This issue affects:
- Prometheus Pushgateway: Strictly validates text format and rejects metrics with duplicate HELP/TYPE
- Some strict parsers: Telegraf and other tools that validate Prometheus format
- OpenTelemetry semantic conventions: Many attributes are optional (e.g., error_type, server.address), which naturally creates varying label sets
Environment
- OpenTelemetry Python SDK: 1.38.0
- OpenTelemetry Auto-instrumentation: 0.59b0
- Prometheus Pushgateway: v1.4.2
Related Issues
- Prometheus exporter should re-use the metric family #2628 - Fixed duplicate HELP/TYPE for same label keys, different values (PR Reuse metric family #2639)
- Prometheus exporter undefined behaviour when not all labels are set #3391 - Prometheus exporter undefined behavior when not all labels are set (PR prometheus-exporter: fix labels out of place for data points with different attribute sets #4413)
- opentelemetry-exporter-prometheus - - metrics out of place for different attribute sets #4418 - Metrics out of place for different attribute sets (PR prometheus-exporter: fix labels out of place for data points with different attribute sets #4413)
Note: PR #4413 fixed label value misalignment but did not address the duplicate HELP/TYPE declarations.
Prometheus Specification
From the Prometheus exposition format spec:
"Only one TYPE line and one HELP line may exist for any given metric name."
From prometheus/docs#547:
"The specification is crystal clear that duplicate metric families (and, correspondingly, multiple TYPE and HELP lines for the same metric name in the text format) are not allowed in a single exposition."
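A small check along these lines could serve as a regression test for the exporter output. It is only a sketch: `duplicate_declarations` is a hypothetical helper that scans for repeated HELP/TYPE comment lines and does not validate anything else in the text format.

```python
import re
from collections import Counter

# Matches "# HELP <name> ..." and "# TYPE <name> ..." comment lines.
_DECLARATION = re.compile(r"^# (HELP|TYPE) (\S+)")


def duplicate_declarations(text):
    """Return sorted (kind, metric_name) pairs declared more than once."""
    counts = Counter()
    for line in text.splitlines():
        match = _DECLARATION.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return sorted(key for key, n in counts.items() if n > 1)
```

An empty list means every metric name has at most one HELP line and one TYPE line.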
Additional Context
This is particularly problematic for HTTP instrumentation where:
- error_type is only present when status code >= 400
- server_address is only present for certain requests
- Different HTTP methods may have different optional attributes
The current workaround is to post-process the metrics output to deduplicate HELP/TYPE declarations before pushing to Pushgateway, but this should be handled correctly by the exporter itself.
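For reference, the post-processing workaround can be as small as the sketch below. `dedupe_declarations` is a hypothetical name. It assumes the blocks for one metric name are already adjacent in the output (as in the example above), since it does not reorder samples, and it does not try to reconcile conflicting TYPE lines.

```python
def dedupe_declarations(text):
    """Keep only the first HELP and first TYPE line for each metric name."""
    seen = set()
    kept = []
    for line in text.splitlines():
        parts = line.split(" ", 3)
        if len(parts) >= 3 and parts[0] == "#" and parts[1] in ("HELP", "TYPE"):
            key = (parts[1], parts[2])  # e.g. ("HELP", "metric_name")
            if key in seen:
                continue  # repeated declaration: skip it
            seen.add(key)
        kept.append(line)
    # The text format expects the last line to end with a newline.
    return "\n".join(kept) + "\n"
```

The result is what gets pushed to Pushgateway in place of the raw exporter output.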