
Memory regression in opentelemetry prometheus exporter v0.57.0 with Go 1.24 #6788

@ns-jvillarfernandez

Description

We've detected a memory leak / regression in one of our containers. The culprit appears to be go.opentelemetry.io/otel/exporters/prometheus v0.57.0, either on its own or in combination with the Go upgrade from 1.23 to 1.24.

  1. Our containers started to OOM after several hours of running.
  2. We verified that the traffic pattern hadn't changed.
  3. Attaching to a running pod and using go tool pprof, we saw that the retained memory was tied to go.opentelemetry.io/otel/exporters/prometheus v0.57.0, specifically math/rand.newSource. See the commands and screenshot below:
$ kubectl port-forward pod/XXXXXXX 8086:8085
Forwarding from 127.0.0.1:8086 -> 8085
Forwarding from [::1]:8086 -> 8085
Handling connection for 8086

$ go tool pprof http://localhost:8086/debug/pprof/heap
Fetching profile over HTTP from http://localhost:8086/debug/pprof/heap
Saved profile in /Users/XXXXXXX/pprof/pprof.forward.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz
File: XXXXXXX
Build ID: 944b0f39f5443eb2ef822291ecd1bb226a3c768b
Type: inuse_space
Time: 2025-05-16 10:49:15 CEST
Entering interactive mode (type "help" for commands, "o" for options)

(pprof) top
Showing nodes accounting for 286.11MB, 80.53% of 355.27MB total
Dropped 159 nodes (cum <= 1.78MB)
Showing top 10 nodes out of 151
      flat  flat%   sum%        cum   cum%
  123.63MB 34.80% 34.80%   123.63MB 34.80%  math/rand.newSource (inline)
   61.10MB 17.20% 52.00%    61.10MB 17.20%  go.opentelemetry.io/otel/sdk/metric/exemplar.newStorage (inline)
   31.04MB  8.74% 60.73%    31.04MB  8.74%  go.opentelemetry.io/otel/sdk/metric/internal/aggregate.reset[go.shape.struct { FilteredAttributes []go.opentelemetry.io/otel/attribute.KeyValue; Time time.Time; Value go.shape.int64; SpanID []uint8 "json:\",omitempty\""; TraceID []uint8 "json:\",omitempty\"" }]
   25.04MB  7.05% 67.78%    25.04MB  7.05%  go.opentelemetry.io/otel/sdk/metric/internal/aggregate.reset[go.shape.struct { FilteredAttributes []go.opentelemetry.io/otel/attribute.KeyValue; Time time.Time; Value go.shape.float64; SpanID []uint8 "json:\",omitempty\""; TraceID []uint8 "json:\",omitempty\"" }]
   14.56MB  4.10% 71.88%    14.56MB  4.10%  bufio.NewWriterSize
    7.03MB  1.98% 73.86%     7.03MB  1.98%  bufio.NewReaderSize
       7MB  1.97% 75.83%    17.50MB  4.93%  go.opentelemetry.io/otel/exporters/prometheus.addExemplars[go.shape.int64]
    6.19MB  1.74% 77.57%    12.71MB  3.58%  io.copyBuffer
    5.52MB  1.55% 79.12%     5.52MB  1.55%  bytes.growSlice
       5MB  1.41% 80.53%        5MB  1.41%  go.opentelemetry.io/otel/attribute.computeDistinctFixed
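
For scale: a legacy math/rand source retains roughly 4.9 KB of generator state (a 607-entry int64 vector), so the 123.63 MB attributed to math/rand.newSource above corresponds to on the order of 25,000 live sources. A standalone sketch of just that arithmetic (illustrative only, not the SDK's code):

package main

import (
	"fmt"
	"math/rand"
	"runtime"
)

func main() {
	// Each rand.NewSource call hits the same allocation site reported in
	// the profile (math/rand.newSource) and retains ~4.9 KB of state.
	const n = 25_000 // roughly what 123 MB of newSource allocations implies

	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	sources := make([]rand.Source, n)
	for i := range sources {
		sources[i] = rand.NewSource(int64(i))
	}

	runtime.GC()
	runtime.ReadMemStats(&after)
	fmt.Printf("~%.1f MB retained by %d sources\n",
		float64(after.HeapAlloc-before.HeapAlloc)/(1<<20), n)
	runtime.KeepAlive(sources)
}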

Using the web command in go tool pprof:

[screenshot: pprof web call graph]

Environment

  • OS: Linux
  • Architecture: x86_64
  • Go Version: 1.24
  • opentelemetry-go version: v0.57.0

Steps To Reproduce

  1. Use Go 1.24 and go.opentelemetry.io/otel/exporters/prometheus v0.57.0 (see the setup sketch below).
  2. Leave the container running for several hours with traffic while metrics are being scraped.
  3. The memory pattern shows a clear increase over time until the container reaches its limit and is OOM-killed.

[screenshot: container memory usage increasing steadily until hitting the limit]
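
For reference, a minimal sketch of the kind of setup affected, assuming the default exporter and SDK configuration; the module layout, port, instrument name, and traffic loop are illustrative, not our production code:

package main

import (
	"context"
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus/promhttp"
	"go.opentelemetry.io/otel/exporters/prometheus"
	"go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	// Prometheus exporter registered as a reader on the SDK meter provider.
	exporter, err := prometheus.New()
	if err != nil {
		log.Fatal(err)
	}
	provider := metric.NewMeterProvider(metric.WithReader(exporter))
	meter := provider.Meter("repro")

	counter, err := meter.Int64Counter("requests_total")
	if err != nil {
		log.Fatal(err)
	}

	// Simulated steady traffic, standing in for real request handling.
	go func() {
		for {
			counter.Add(context.Background(), 1)
			time.Sleep(10 * time.Millisecond)
		}
	}()

	// Expose /metrics; the regression shows up after hours of scrapes.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8085", nil))
}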

Expected behavior

No memory leak
