Skip to content

Conversation

incertum
Copy link
Contributor

@incertum incertum commented Aug 5, 2025

Previously, metrics were identified by their name. This change allows a shared metric name per metric type, with each unique set of labels defining a distinct time series.

Additionally, the internal data structure for a stored metric has been refactored, providing a more robust and programmatic representation.

CC @FranzBusch, @ktoso, @blindspotbounty

Let me know how you'd prefer to proceed. We have two main options:

  1. I can split this work into a few smaller, more focused PRs.
  2. We can finalize the overall approach in this PR, and I can push the remaining changes here next week.

Fixes #125 opened by @blindspotbounty

Copy link
Collaborator

@ktoso ktoso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall ok, but the names are really confusing. Perhaps: keep Metric a type, drop MetricContainer, doesn't seem to have anything specific it's adding -- just a single field, and then rename "MetricType" to be the Container type?

@incertum incertum force-pushed the refactor/metrics-storage-enable-shared-names-same-type branch 2 times, most recently from a3bb1e5 to fd0e483 Compare August 18, 2025 23:39
Previously, counters could only be identified by their name. This change allows multiple counters to share the same name, with each unique set of labels defining a distinct time series.

Additionally, the internal data structure for a stored metric has been refactored, providing a more robust and programmatic representation.

Signed-off-by: Melissa Kilby <[email protected]>
@incertum incertum force-pushed the refactor/metrics-storage-enable-shared-names-same-type branch from fd0e483 to c697ee8 Compare August 19, 2025 19:24
Previously, gauges could only be identified by their name. This change allows multiple gauges to share the same name, with each unique set of labels defining a distinct time series.

Additionally, the internal data structure for a stored metric has been refactored, providing a more robust and programmatic representation.

Signed-off-by: Melissa Kilby <[email protected]>
@incertum incertum changed the title wip: refactor: uniquely identify metric time series by name and label sets refactor[Counter + Gauge]: uniquely identify metric time series by name and label sets (1/2) Aug 25, 2025
@incertum incertum marked this pull request as ready for review August 25, 2025 17:58
@incertum
Copy link
Contributor Author

I have also added the code changes for Gauge and moved the PR out of WIP.

@ktoso
Copy link
Collaborator

ktoso commented Aug 26, 2025

This could use some more use of generics rather than copying types around 1:1; here's a patch:
0001-make-metric-groups-use-more-generics.patch

Basically:

-    // Note: In order to support Sendable, need to explicitely define the types.
-    private struct CounterWithHelp {
-        var counter: Counter
+    private struct MetricWithHelp<Metric: AnyObject & Sendable>: Sendable {
+        var metric: Metric
         let help: String
     }

-    private struct GaugeWithHelp {
-        var gauge: Gauge
-        let help: String
-    }
-
-    private struct CounterGroup {
-        // A collection of Counter metrics, each with a unique label set, that share the same metric name.
-        // Distinct help strings for the same metric name are permitted, but Prometheus retains only the
-        // most recent one. For an unlabelled Counter, the empty label set is used as the key, and the
-        // collection contains only one entry. Finally, for clarification, the same Counter metric name can
-        // simultaneously be labeled and unlabeled.
-        var countersByLabelSets: [LabelsKey: CounterWithHelp]
-    }
-
-    private struct GaugeGroup {
-        var gaugesByLabelSets: [LabelsKey: GaugeWithHelp]
+    /// A collection of metrics, each with a unique label set, that share the same metric name.
+    /// Distinct help strings for the same metric name are permitted, but Prometheus retains only the
+    /// most recent one. For an unlabelled metric, the empty label set is used as the key, and the
+    /// collection contains only one entry. Finally, for clarification, the same metric name can
+    /// simultaneously be labeled and unlabeled.
+    private struct MetricGroup<Metric: Sendable & AnyObject>: Sendable {
+        var metricsByLabelSets: [LabelsKey: MetricWithHelp<Metric>]
     }

     private enum Metric {
-        case counter(CounterGroup)
-        case gauge(GaugeGroup)
+        case counter(MetricGroup<Counter>)
+        case gauge(MetricGroup<Gauge>)

Co-authored-by: Konrad `ktoso` Malawski <[email protected]>
Signed-off-by: Melissa Kilby <[email protected]>
/// Distinct help strings for the same metric name are permitted, but Prometheus retains only the
/// most recent one. For an unlabelled metric, the empty label set is used as the key, and the
/// collection contains only one entry. Finally, for clarification, the same metric name can
/// simultaneously be labeled and unlabeled.
Copy link
Collaborator

@ktoso ktoso Aug 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: i'd move the comment onto the MetricGroup type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK I'll do it in the 2/2 PR!

Copy link
Collaborator

@ktoso ktoso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you.

@ktoso ktoso added the semver/none No version bump required. label Aug 26, 2025
@ktoso ktoso merged commit e787564 into swift-server:main Aug 26, 2025
28 checks passed
@ktoso ktoso added 🔨 semver/patch No public API change. and removed semver/none No version bump required. labels Sep 16, 2025
@fabianfett
Copy link
Member

fabianfett commented Sep 17, 2025

Have you tested that this is correctly imported by a prometheus importer? IIRC, prometheus importer doesn't allow flexible labels per metric name. IIRC this means for a given metric name the label names must always be the same.

Update

Found the statement in the prometheus docs:

Client libraries MUST NOT allow users to have different label names for the same metric for Gauge/Counter/Summary/Histogram or any other Collector offered by the library.

https://prometheus.io/docs/instrumenting/writing_clientlibs/#labels

Because of this I think we should revert #131 and #134.

@incertum
Copy link
Contributor Author

Have you tested that this is correctly imported by a prometheus importer? IIRC, prometheus importer doesn't allow flexible labels per metric name. IIRC this means for a given metric name the label names must always be the same.

Based on the new documentation example provided:

# HELP myapp_http_connections_active Currently active HTTP connections
# TYPE myapp_http_connections_active gauge
myapp_http_connections_active 42.0
# HELP myapp_http_requests_total Total HTTP requests
# TYPE myapp_http_requests_total counter
myapp_http_requests_total{method="GET",status="200"} 5.0
myapp_http_requests_total{method="POST",status="201"} 2.0
myapp_http_requests_total{method="GET",status="500",endpoint="/api/users"} 1
# HELP myapp_system_errors_total Total system errors
# TYPE myapp_system_errors_total counter
myapp_system_errors_total 3.0
# HELP myapp_http_request_duration_seconds HTTP request duration in seconds
# TYPE myapp_http_request_duration_seconds histogram
myapp_http_request_duration_seconds_bucket{le="0.01"} 0
myapp_http_request_duration_seconds_bucket{le="0.1"} 0
myapp_http_request_duration_seconds_bucket{le="1.0"} 1
myapp_http_request_duration_seconds_bucket{le="+Inf"} 1
myapp_http_request_duration_seconds_sum 0.15
myapp_http_request_duration_seconds_count 1
# HELP myapp_system_memory_usage_bytes Current memory usage in bytes
# TYPE myapp_system_memory_usage_bytes gauge
myapp_system_memory_usage_bytes 134217728.0
  • Scrapping confirmed to be successful, with no errors.
  • Prometheus aggregation confirmed to be correct. For instance, sum(myapp_http_requests_total) yields the correct result of 8, and sum by (method) (myapp_http_requests_total) also yields the correct aggregation.

Update

Found the statement in the prometheus docs:

Client libraries MUST NOT allow users to have different label names for the same metric for Gauge/Counter/Summary/Histogram or any other Collector offered by the library.

https://prometheus.io/docs/instrumenting/writing_clientlibs/#labels

Because of this I think we should revert #131 and #134.

The changes addressed the following requests from adopters: #125 as follow to a java client discussion prometheus/client_java#696

There may be some room for flexibility to support adopter use cases while avoiding common mistakes (e.g., enforcing that myapp_http_requests_total can’t be used unlabeled simultaneously to avoid incorrect aggregations as the sum should be computed in Prometheus).

For instance, also consider the following issue:
#137
We currently don’t require a HELP text because doing so would break backwards compatibility. In fact, it wasn’t even supported before the 2.20 release. I’ve also noticed that the HELP line is declared as optional in some documentation places while it’s declared as MUST in others.

Perhaps we can get a definitive answer as we approach the Prometheus community. Let’s come back to this once we have a list of required refactors, if applicable. CC @ktoso.

@incertum
Copy link
Contributor Author

Have you tested that this is correctly imported by a prometheus importer? IIRC, prometheus importer doesn't allow flexible labels per metric name. IIRC this means for a given metric name the label names must always be the same.

Based on the new documentation example provided:

# HELP myapp_http_connections_active Currently active HTTP connections
# TYPE myapp_http_connections_active gauge
myapp_http_connections_active 42.0
# HELP myapp_http_requests_total Total HTTP requests
# TYPE myapp_http_requests_total counter
myapp_http_requests_total{method="GET",status="200"} 5.0
myapp_http_requests_total{method="POST",status="201"} 2.0
myapp_http_requests_total{method="GET",status="500",endpoint="/api/users"} 1
# HELP myapp_system_errors_total Total system errors
# TYPE myapp_system_errors_total counter
myapp_system_errors_total 3.0
# HELP myapp_http_request_duration_seconds HTTP request duration in seconds
# TYPE myapp_http_request_duration_seconds histogram
myapp_http_request_duration_seconds_bucket{le="0.01"} 0
myapp_http_request_duration_seconds_bucket{le="0.1"} 0
myapp_http_request_duration_seconds_bucket{le="1.0"} 1
myapp_http_request_duration_seconds_bucket{le="+Inf"} 1
myapp_http_request_duration_seconds_sum 0.15
myapp_http_request_duration_seconds_count 1
# HELP myapp_system_memory_usage_bytes Current memory usage in bytes
# TYPE myapp_system_memory_usage_bytes gauge
myapp_system_memory_usage_bytes 134217728.0
  • Scrapping confirmed to be successful, with no errors.
  • Prometheus aggregation confirmed to be correct. For instance, sum(myapp_http_requests_total) yields the correct result of 8, and sum by (method) (myapp_http_requests_total) also yields the correct aggregation.

Update

Found the statement in the prometheus docs:

Client libraries MUST NOT allow users to have different label names for the same metric for Gauge/Counter/Summary/Histogram or any other Collector offered by the library.

https://prometheus.io/docs/instrumenting/writing_clientlibs/#labels
Because of this I think we should revert #131 and #134.

The changes addressed the following requests from adopters: #125 as follow to a java client discussion prometheus/client_java#696

There may be some room for flexibility to support adopter use cases while avoiding common mistakes (e.g., enforcing that myapp_http_requests_total can’t be used unlabeled simultaneously to avoid incorrect aggregations as the sum should be computed in Prometheus).

For instance, also consider the following issue: #137 We currently don’t require a HELP text because doing so would break backwards compatibility. In fact, it wasn’t even supported before the 2.20 release. I’ve also noticed that the HELP line is declared as optional in some documentation places while it’s declared as MUST in others.

Perhaps we can get a definitive answer as we approach the Prometheus community. Let’s come back to this once we have a list of required refactors, if applicable. CC @ktoso.

Amendment September 23, 2025:

@fabianfett @FranzBusch @ktoso

These changes can remain as per the current guidance and OpenMetrics SPECS.

“Metrics with the same name for a given MetricFamily SHOULD have the same set of label names in their LabelSet.”

See prometheus/docs#2732 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🔨 semver/patch No public API change.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Crash if tags and not the same (allow different tags for the same metric name)

4 participants