Commit 7daa6e6

Replace the OpenMetrics parser by a more efficient one (#44216)
### What does this PR do?

Replace the OpenMetrics parser of `github.com/prometheus/common/expfmt` with the one from `github.com/prometheus/prometheus/model/textparse`.

### Motivation

While the former is easier to use, the latter is more efficient, and profiles on staging and prod suggest that the OpenMetrics parsing done in the `kubelet` and `containerd` checks is not negligible.

This PR adds basic benchmarks in `pkg/util/prometheus/parse_test.go`.

Here are the results of those benchmarks on the `main` commit this PR is based on:

```console
$ go test ./pkg/util/prometheus/... -bench=BenchmarkParseMetrics -benchmem -count=10 -benchtime=3s | ~/go/bin/benchstat -
goos: linux
goarch: amd64
pkg: github.com/DataDog/datadog-agent/pkg/util/prometheus
cpu: 12th Gen Intel(R) Core(TM) i9-12900H
                          │      -      │
                          │   sec/op    │
ParseMetrics-20             4.645m ± 2%
ParseMetricsWithFilter-20   1.844m ± 2%
ParseMetricsSmall-20        8.025µ ± 3%
geomean                     409.6µ

                          │      -       │
                          │     B/op     │
ParseMetrics-20             2.741Mi ± 0%
ParseMetricsWithFilter-20   1.127Mi ± 0%
ParseMetricsSmall-20        8.867Ki ± 0%
geomean                     306.2Ki

                          │      -      │
                          │  allocs/op  │
ParseMetrics-20             82.06k ± 0%
ParseMetricsWithFilter-20   30.34k ± 0%
ParseMetricsSmall-20         128.0 ± 0%
geomean                     6.830k
```

Here are the results of the same benchmarks on the latest commit of this PR:

```console
$ go test ./pkg/util/prometheus/... -bench=BenchmarkParseMetrics -benchmem -count=10 -benchtime=3s | ~/go/bin/benchstat -
goos: linux
goarch: amd64
pkg: github.com/DataDog/datadog-agent/pkg/util/prometheus
cpu: 12th Gen Intel(R) Core(TM) i9-12900H
                          │      -      │
                          │   sec/op    │
ParseMetrics-20             1.453m ± 2%
ParseMetricsWithFilter-20   649.6µ ± 1%
ParseMetricsSmall-20        2.545µ ± 3%
geomean                     133.9µ

                          │      -       │
                          │     B/op     │
ParseMetrics-20             1.536Mi ± 0%
ParseMetricsWithFilter-20   702.7Ki ± 0%
ParseMetricsSmall-20        3.445Ki ± 0%
geomean                     156.2Ki

                          │      -      │
                          │  allocs/op  │
ParseMetrics-20             8.032k ± 0%
ParseMetricsWithFilter-20   3.621k ± 0%
ParseMetricsSmall-20         26.00 ± 0%
geomean                     911.0
```

While **the CPU consumption is divided by 3, the number of memory allocations is divided by 10**! This should help relieve some pressure on the GC.

### Describe how you validated your changes

This change has been deployed on a staging cluster:

<img width="3619" height="1214" alt="image" src="https://github.com/user-attachments/assets/e5d34900-f226-4edb-9130-59e4a51f880f" />

<!-- <img width="3621" height="1455" alt="image" src="https://github.com/user-attachments/assets/f8e91910-7524-489c-bbeb-87dd8ed91b4f" /> -->

### Additional Notes

Co-authored-by: lenaic.huard <[email protected]>
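The "divided by 3" and "divided by 10" figures can be checked directly against the `ParseMetrics-20` rows of the two benchstat outputs above. The snippet below is only a worked calculation; the `ratio` helper is a name introduced here for illustration and is not part of the PR.

```go
package main

import "fmt"

// ratio reports how many times smaller `after` is than `before`.
// Hypothetical helper, used only for this worked calculation.
func ratio(before, after float64) float64 {
	return before / after
}

func main() {
	// Values copied from the ParseMetrics-20 rows (before, after).
	fmt.Printf("time:   %.1fx\n", ratio(4.645, 1.453)) // ms/op
	fmt.Printf("bytes:  %.1fx\n", ratio(2.741, 1.536)) // MiB/op
	fmt.Printf("allocs: %.1fx\n", ratio(82.06, 8.032)) // thousand allocs/op
}
```

This prints roughly 3.2x for time, 1.8x for bytes, and 10.2x for allocations, so the gain is largest in the number of allocations rather than in the bytes allocated.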
1 parent 78d4f54 commit 7daa6e6

15 files changed: +899 -251 lines changed

comp/otelcol/status/impl/go.mod

Lines changed: 16 additions & 1 deletion
@@ -51,6 +51,16 @@ require (
 	github.com/DataDog/datadog-agent/pkg/version v0.72.2 // indirect
 	github.com/DataDog/viper v1.14.1-0.20251117172501-5b5dc463bad3 // indirect
 	github.com/Microsoft/go-winio v0.6.2 // indirect
+	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.14 // indirect
+	github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.14 // indirect
+	github.com/aws/aws-sdk-go-v2/internal/ini v1.8.4 // indirect
+	github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.3 // indirect
+	github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.14 // indirect
+	github.com/aws/aws-sdk-go-v2/service/signin v1.0.1 // indirect
+	github.com/aws/aws-sdk-go-v2/service/sso v1.30.4 // indirect
+	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.9 // indirect
+	github.com/aws/smithy-go v1.23.2 // indirect
+	github.com/cespare/xxhash/v2 v2.3.0 // indirect
 	github.com/cihub/seelog v0.0.0-20170130134532-f561c5e57575 // indirect
 	github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
 	github.com/dustin/go-humanize v1.0.1 // indirect
@@ -60,6 +70,8 @@ require (
 	github.com/go-ole/go-ole v1.3.0 // indirect
 	github.com/go-viper/mapstructure/v2 v2.4.0 // indirect
 	github.com/gofrs/flock v0.13.0 // indirect
+	github.com/gogo/protobuf v1.3.2 // indirect
+	github.com/grafana/regexp v0.0.0-20250905093917-f7b3be9d1853 // indirect
 	github.com/hectane/go-acl v0.0.0-20230122075934-ca0b05cb1adb // indirect
 	github.com/inconshreveable/mousetrap v1.1.0 // indirect
 	github.com/lufia/plan9stats v0.0.0-20250317134145-8bc96cf8fc35 // indirect
@@ -69,12 +81,12 @@ require (
 	github.com/mdlayher/socket v0.5.1 // indirect
 	github.com/mdlayher/vsock v1.2.1 // indirect
 	github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826 // indirect
-	github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
 	github.com/pelletier/go-toml v1.9.5 // indirect
 	github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
 	github.com/power-devops/perfstat v0.0.0-20240221224432-82ca36839d55 // indirect
 	github.com/prometheus/client_model v0.6.2 // indirect
 	github.com/prometheus/common v0.67.4 // indirect
+	github.com/prometheus/prometheus v0.307.3 // indirect
 	github.com/shirou/gopsutil/v4 v4.25.11 // indirect
 	github.com/spf13/cast v1.10.0 // indirect
 	github.com/spf13/cobra v1.10.1 // indirect
@@ -83,6 +95,7 @@ require (
 	github.com/tklauser/go-sysconf v0.3.16 // indirect
 	github.com/tklauser/numcpus v0.11.0 // indirect
 	github.com/yusufpapurcu/wmi v1.2.4 // indirect
+	go.opentelemetry.io/auto/sdk v1.2.1 // indirect
 	go.uber.org/atomic v1.11.0 // indirect
 	go.uber.org/dig v1.19.0 // indirect
 	go.uber.org/fx v1.24.0 // indirect
@@ -94,8 +107,10 @@ require (
 	golang.org/x/sys v0.39.0 // indirect
 	golang.org/x/text v0.32.0 // indirect
 	golang.org/x/time v0.14.0 // indirect
+	google.golang.org/genproto/googleapis/rpc v0.0.0-20251022142026-3a174f9686a8 // indirect
 	google.golang.org/protobuf v1.36.10 // indirect
 	gopkg.in/yaml.v2 v2.4.0 // indirect
+	k8s.io/apimachinery v0.35.0-alpha.0 // indirect
 )
 
 // This section was automatically added by 'dda inv modules.add-all-replace' command, do not edit manually

comp/otelcol/status/impl/go.sum

Lines changed: 144 additions & 0 deletions

pkg/collector/corechecks/containers/containerd/check.go

Lines changed: 2 additions & 6 deletions
@@ -205,10 +205,6 @@ func (c *ContainerdCheck) scrapeOpenmetricsEndpoint(sender sender.Sender) error
 
 	for _, mf := range parsedMetrics {
 		for _, sample := range mf.Samples {
-			if sample == nil {
-				continue
-			}
-
 			metric := sample.Metric
 
 			metricName, ok := metric["__name__"]
@@ -217,10 +213,10 @@ func (c *ContainerdCheck) scrapeOpenmetricsEndpoint(sender sender.Sender) error
 				continue
 			}
 
-			transform, found := defaultContainerdOpenmetricsTransformers[string(metricName)]
+			transform, found := defaultContainerdOpenmetricsTransformers[metricName]
 
 			if found {
-				transform(sender, string(metricName), *sample)
+				transform(sender, metricName, sample)
 			}
 		}
 	}
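The hunk above drops the `nil` check and the `string(...)` conversions, which is consistent with samples now being plain values whose labels are ordinary strings. The following self-contained sketch illustrates that shape. `Metric`, `Sample`, `MetricFamily`, `transformer`, and `dispatch` are hypothetical stand-ins written for this example; the real definitions live in `pkg/util/prometheus` and are not shown in this part of the diff.

```go
package main

import "fmt"

// Hypothetical stand-ins for the types in pkg/util/prometheus.
// Assumption: labels are a map from string to string.
type Metric map[string]string

type Sample struct {
	Metric Metric
	Value  float64
}

// Assumption: samples are stored by value, so an element can never be nil.
type MetricFamily struct {
	Samples []Sample
}

type transformer func(name string, s Sample) string

var transformers = map[string]transformer{
	"grpc_server_handled_total": func(name string, s Sample) string {
		return fmt.Sprintf("%s=%g", name, s.Value)
	},
}

// dispatch applies the registered transformer to every sample that has one.
func dispatch(families []MetricFamily) []string {
	var out []string
	for _, mf := range families {
		for _, sample := range mf.Samples {
			name, ok := sample.Metric["__name__"]
			if !ok {
				continue // sample without a metric name
			}
			// The name is already a string, so it can key the map directly.
			if transform, found := transformers[name]; found {
				out = append(out, transform(name, sample))
			}
		}
	}
	return out
}

func main() {
	families := []MetricFamily{{Samples: []Sample{
		{Metric: Metric{"__name__": "grpc_server_handled_total"}, Value: 3},
		{Metric: Metric{"__name__": "unrelated_metric"}, Value: 1},
		{Metric: Metric{}, Value: 2},
	}}}
	fmt.Println(dispatch(families)) // [grpc_server_handled_total=3]
}
```

Only the first sample has a registered transformer; the second is skipped by the map lookup and the third by the missing `__name__` label.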

pkg/collector/corechecks/containers/containerd/containerd_transformers.go

Lines changed: 7 additions & 9 deletions
@@ -8,20 +8,18 @@
 package containerd
 
 import (
-	"fmt"
-
 	"github.com/DataDog/datadog-agent/pkg/aggregator/sender"
-	"github.com/prometheus/common/model"
+	"github.com/DataDog/datadog-agent/pkg/util/prometheus"
 )
 
 // metricTransformerFunc is used to tweak or generate new metrics from a given containerd metric
-type metricTransformerFunc = func(sender.Sender, string, model.Sample)
+type metricTransformerFunc = func(sender.Sender, string, prometheus.Sample)
 
 var defaultContainerdOpenmetricsTransformers = map[string]metricTransformerFunc{
 	"grpc_server_handled_total": grpcServerHandlerTransformer,
 }
 
-func grpcServerHandlerTransformer(s sender.Sender, name string, sample model.Sample) {
+func grpcServerHandlerTransformer(s sender.Sender, name string, sample prometheus.Sample) {
 	metric := sample.Metric
 
 	grpcMethod, ok := metric["grpc_method"]
@@ -35,7 +33,7 @@ func grpcServerHandlerTransformer(s sender.Sender, name string, sample model.Sam
 	}
 }
 
-func imagePullMetricTransformer(s sender.Sender, _ string, sample model.Sample) {
+func imagePullMetricTransformer(s sender.Sender, _ string, sample prometheus.Sample) {
 	metric := sample.Metric
 
 	grpcCode, ok := metric["grpc_code"]
@@ -45,9 +43,9 @@ func imagePullMetricTransformer(s sender.Sender, _ string, sample model.Sample)
 	}
 
 	metricTags := []string{
-		fmt.Sprintf("grpc_service:%s", metric["grpc_service"]),
-		"grpc_code:" + toSnakeCase(string(grpcCode)),
+		"grpc_service:" + metric["grpc_service"],
+		"grpc_code:" + toSnakeCase(grpcCode),
 	}
 
-	s.MonotonicCount("containerd.image.pull", float64(sample.Value), "", metricTags)
+	s.MonotonicCount("containerd.image.pull", sample.Value, "", metricTags)
 }
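With string-valued labels, the tags in the last hunk can be built by plain concatenation, which is why the `fmt` import is no longer needed. The sketch below reproduces that tag construction on its own. The `toSnakeCase` shown here is a simplified, hypothetical stand-in (the real helper is not part of this diff and may behave differently), and `imagePullTags` and the sample label values are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// toSnakeCase is a simplified, hypothetical stand-in for the helper used by
// the check: it lowercases each upper-case letter and prefixes it with an
// underscore unless it is the first character.
func toSnakeCase(s string) string {
	var b strings.Builder
	for i, r := range s {
		if unicode.IsUpper(r) {
			if i > 0 {
				b.WriteByte('_')
			}
			r = unicode.ToLower(r)
		}
		b.WriteRune(r)
	}
	return b.String()
}

// imagePullTags builds the two tags by concatenation, as in the new code.
// Hypothetical function name; labels are assumed to be map[string]string.
func imagePullTags(metric map[string]string) []string {
	return []string{
		"grpc_service:" + metric["grpc_service"],
		"grpc_code:" + toSnakeCase(metric["grpc_code"]),
	}
}

func main() {
	tags := imagePullTags(map[string]string{
		"grpc_service": "runtime.v1.ImageService",
		"grpc_code":    "NotFound",
	})
	fmt.Println(tags) // [grpc_service:runtime.v1.ImageService grpc_code:not_found]
}
```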

pkg/collector/corechecks/containers/kubelet/common/pod.go

Lines changed: 8 additions & 9 deletions
@@ -12,8 +12,6 @@ import (
 	"fmt"
 	"strings"
 
-	"github.com/prometheus/common/model"
-
 	tagger "github.com/DataDog/datadog-agent/comp/core/tagger/def"
 	"github.com/DataDog/datadog-agent/comp/core/tagger/tags"
 	"github.com/DataDog/datadog-agent/comp/core/tagger/types"
@@ -22,6 +20,7 @@ import (
 	workloadmetafilter "github.com/DataDog/datadog-agent/comp/core/workloadfilter/util/workloadmeta"
 	workloadmeta "github.com/DataDog/datadog-agent/comp/core/workloadmeta/def"
 	"github.com/DataDog/datadog-agent/pkg/util/log"
+	"github.com/DataDog/datadog-agent/pkg/util/prometheus"
 )
 
 var (
@@ -154,18 +153,18 @@ func (p *PodUtils) IsHostNetworkedPod(podUID string) bool {
 // GetContainerID returns the container ID from the workloadmeta.Component for a given set of metric labels.
 // It should only be called on a container-scoped metric. It returns an empty string if the container could not be
 // found, or if the container should be filtered out.
-func GetContainerID(store workloadmeta.Component, metric model.Metric, containerFilter workloadfilter.FilterBundle) (string, error) {
-	namespace := string(metric["namespace"])
-	podUID := string(metric["pod_uid"])
+func GetContainerID(store workloadmeta.Component, metric prometheus.Metric, containerFilter workloadfilter.FilterBundle) (string, error) {
+	namespace := metric["namespace"]
+	podUID := metric["pod_uid"]
 	// k8s >= 1.16
-	containerName := string(metric["container"])
-	podName := string(metric["pod"])
+	containerName := metric["container"]
+	podName := metric["pod"]
 	// k8s < 1.16
 	if containerName == "" {
-		containerName = string(metric["container_name"])
+		containerName = metric["container_name"]
 	}
 	if podName == "" {
-		podName = string(metric["pod_name"])
+		podName = metric["pod_name"]
 	}
 
 	pod, err := store.GetKubernetesPod(podUID)
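`GetContainerID` reads the newer label names first (`container`, `pod`, used by k8s >= 1.16) and falls back to the older ones (`container_name`, `pod_name`). The same fallback can be sketched in isolation as below. `firstNonEmpty` is a hypothetical helper written for this example and does not appear in the PR, and the labels are assumed to be a `map[string]string`.

```go
package main

import "fmt"

// firstNonEmpty returns the value of the first key that maps to a non-empty
// string, or "" if none does. Hypothetical helper, for illustration only.
func firstNonEmpty(metric map[string]string, keys ...string) string {
	for _, k := range keys {
		// A missing key yields "", the same as an empty label value.
		if v := metric[k]; v != "" {
			return v
		}
	}
	return ""
}

func main() {
	modern := map[string]string{"container": "app", "pod": "web-0"}
	legacy := map[string]string{"container_name": "app", "pod_name": "web-0"}

	for _, m := range []map[string]string{modern, legacy} {
		containerName := firstNonEmpty(m, "container", "container_name")
		podName := firstNonEmpty(m, "pod", "pod_name")
		fmt.Println(containerName, podName)
	}
}
```

Both label sets resolve to the same container and pod names, which is the behavior the two `if` blocks in the hunk implement.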
