Closed as not planned
Labels: bug (Something isn't working)
Description
We've detected a memory leak / regression in one of our containers, and the culprit seems to be go.opentelemetry.io/otel/exporters/prometheus v0.57.0, or the combination of go.opentelemetry.io/otel/exporters/prometheus v0.57.0 and the Go version upgrade from 1.23 to 1.24.
- Our containers started to OOM after running for several hours.
- We verified that the traffic pattern hadn't changed.
- Attaching to a running pod and using go tool pprof, we saw that the retained memory is tied to go.opentelemetry.io/otel/exporters/prometheus v0.57.0, precisely math/rand.newSource. Please see the commands and screenshot below.
$ kubectl port-forward pod/XXXXXXX 8086:8085
Forwarding from 127.0.0.1:8086 -> 8085
Forwarding from [::1]:8086 -> 8085
Handling connection for 8086
$ go tool pprof http://localhost:8086/debug/pprof/heap
Fetching profile over HTTP from http://localhost:8086/debug/pprof/heap
Saved profile in /Users/XXXXXXX/pprof/pprof.forward.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz
File: XXXXXXX
Build ID: 944b0f39f5443eb2ef822291ecd1bb226a3c768b
Type: inuse_space
Time: 2025-05-16 10:49:15 CEST
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 286.11MB, 80.53% of 355.27MB total
Dropped 159 nodes (cum <= 1.78MB)
Showing top 10 nodes out of 151
flat flat% sum% cum cum%
123.63MB 34.80% 34.80% 123.63MB 34.80% math/rand.newSource (inline)
61.10MB 17.20% 52.00% 61.10MB 17.20% go.opentelemetry.io/otel/sdk/metric/exemplar.newStorage (inline)
31.04MB 8.74% 60.73% 31.04MB 8.74% go.opentelemetry.io/otel/sdk/metric/internal/aggregate.reset[go.shape.struct { FilteredAttributes []go.opentelemetry.io/otel/attribute.KeyValue; Time time.Time; Value go.shape.int64; SpanID []uint8 "json:\",omitempty\""; TraceID []uint8 "json:\",omitempty\"" }]
25.04MB 7.05% 67.78% 25.04MB 7.05% go.opentelemetry.io/otel/sdk/metric/internal/aggregate.reset[go.shape.struct { FilteredAttributes []go.opentelemetry.io/otel/attribute.KeyValue; Time time.Time; Value go.shape.float64; SpanID []uint8 "json:\",omitempty\""; TraceID []uint8 "json:\",omitempty\"" }]
14.56MB 4.10% 71.88% 14.56MB 4.10% bufio.NewWriterSize
7.03MB 1.98% 73.86% 7.03MB 1.98% bufio.NewReaderSize
7MB 1.97% 75.83% 17.50MB 4.93% go.opentelemetry.io/otel/exporters/prometheus.addExemplars[go.shape.int64]
6.19MB 1.74% 77.57% 12.71MB 3.58% io.copyBuffer
5.52MB 1.55% 79.12% 5.52MB 1.55% bytes.growSlice
5MB 1.41% 80.53% 5MB 1.41% go.opentelemetry.io/otel/attribute.computeDistinctFixed
Using the go tool pprof web view (screenshot):
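For completeness, interactive pprof commands like these (in the same session as above; output omitted here, the web view shows the same graph) can be used to see which callers retain the math/rand sources:
(pprof) top -cum
(pprof) peek newSource
(pprof) web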
Environment
- OS: Linux
- Architecture: x86_64
- Go Version: 1.24
- go.opentelemetry.io/otel/exporters/prometheus version: v0.57.0
Steps To Reproduce
- Use go 1.24 and go.opentelemetry.io/otel/exporters/prometheus v0.57.0 (a minimal setup sketch follows below).
- Leave the container running with traffic for several hours while metrics are being scraped.
- Memory usage shows a clear increase over time until it reaches the limit and the container gets OOMKilled.
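A minimal sketch of the kind of setup involved (names, ports, and the handler are placeholders, not our actual code): the Prometheus exporter wired into the SDK meter provider as a Reader, plus net/http/pprof to expose the heap profiles shown above.
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // exposes /debug/pprof/heap used for the profiles above

    "github.com/prometheus/client_golang/prometheus/promhttp"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/prometheus"
    sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
    // The Prometheus exporter acts as a metric Reader for the SDK.
    exporter, err := prometheus.New()
    if err != nil {
        log.Fatal(err)
    }
    provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))
    otel.SetMeterProvider(provider)

    meter := provider.Meter("repro")
    counter, err := meter.Int64Counter("requests_total")
    if err != nil {
        log.Fatal(err)
    }

    // Placeholder traffic handler: each request records a measurement.
    http.HandleFunc("/work", func(w http.ResponseWriter, r *http.Request) {
        counter.Add(r.Context(), 1)
    })
    // Scraped by Prometheus; served on the port that was port-forwarded above.
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":8085", nil))
}
With steady traffic and regular scrapes of /metrics, this matches the conditions under which we observe the growth.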
Expected behavior
No memory leak
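One additional check we haven't run yet: if I understand correctly, exemplar collection can be disabled through the standard OTEL_METRICS_EXEMPLAR_FILTER environment variable, so if the exemplar storage is what retains the math/rand sources, the growth should disappear with it off:
$ export OTEL_METRICS_EXEMPLAR_FILTER=always_off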