fix: prevent slice backing array corruption in docker metrics#151

Merged: skylenet merged 1 commit into main from fix-panic-disk on Jan 14, 2026

@skylenet skylenet commented Jan 14, 2026

Summary

  • Fix panic in Docker metrics collection: label name "device_major" missing in label map
  • Root cause: a slice aliasing bug in which multiple append() calls on the shared labelNames slice wrote into the same backing array and corrupted the label definitions
  • Create dedicated label slices for network and block I/O metrics using make() with explicit capacity

Details

The bug was latent and only became visible after the Go 1.25 upgrade. Go's slice growth algorithm changed, leaving labelNames with spare capacity after it was initially built. When a slice has spare capacity, append() writes into the existing backing array instead of allocating a new one, so the subsequent append(labelNames, "interface") and append(labelNames, "device_major", "device_minor") calls returned slices sharing one backing array, and the block I/O labels overwrote the network label.
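To illustrate the aliasing described above, here is a minimal, self-contained sketch. It is not the code from metrics.go: the base label names "container_name" and "container_id", the capacity of 4, and the function name buildLabelSets are assumptions chosen only to reproduce the effect.

```go
package main

import "fmt"

// buildLabelSets mimics the buggy pattern: two append() calls on one shared
// slice that still has spare capacity. All names here are hypothetical.
func buildLabelSets() (network, blockIO []string) {
	// Capacity 4 with only 2 elements leaves room for appends to reuse
	// the existing backing array.
	labelNames := make([]string, 0, 4)
	labelNames = append(labelNames, "container_name", "container_id")

	// Both results point at the same backing array as labelNames.
	network = append(labelNames, "interface")
	blockIO = append(labelNames, "device_major", "device_minor")

	return network, blockIO
}

func main() {
	network, blockIO := buildLabelSets()

	fmt.Println(network) // [container_name container_id device_major]
	fmt.Println(blockIO) // [container_name container_id device_major device_minor]
}
```

The second append overwrites index 2 of the shared array, so the network slice ends up holding "device_major" where "interface" was expected.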

The fix follows the same safe pattern already used for volume metrics in the same file.
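A rough sketch of what a dedicated-slice pattern can look like follows. The helper withExtraLabels and the base label names are hypothetical and only illustrate the make()-with-explicit-capacity idea; the actual change in metrics.go may be structured differently.

```go
package main

import "fmt"

// withExtraLabels copies base into a freshly allocated slice sized for the
// extra labels, so the result never shares a backing array with base.
// Hypothetical helper, not taken from the repository.
func withExtraLabels(base []string, extra ...string) []string {
	out := make([]string, 0, len(base)+len(extra))
	out = append(out, base...)
	out = append(out, extra...)

	return out
}

func main() {
	// Same setup as the buggy case: a base slice with spare capacity.
	labelNames := make([]string, 0, 4)
	labelNames = append(labelNames, "container_name", "container_id")

	network := withExtraLabels(labelNames, "interface")
	blockIO := withExtraLabels(labelNames, "device_major", "device_minor")

	fmt.Println(network) // [container_name container_id interface]
	fmt.Println(blockIO) // [container_name container_id device_major device_minor]
}
```

Because each result has its own backing array, later appends for one metric family cannot change the labels of another.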

Logs

2026-01-14T10:21:52.270991368Z panic: label name "device_major" missing in label map
2026-01-14T10:21:52.271006606Z
2026-01-14T10:21:52.271009973Z goroutine 108 [running]:
2026-01-14T10:21:52.271012578Z github.com/prometheus/client_golang/prometheus.(*GaugeVec).With(0xf200a0?, 0xc000481080?)
2026-01-14T10:21:52.271015263Z     /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.23.2/prometheus/gauge.go:250 +0x51
2026-01-14T10:21:52.271018008Z github.com/ethpandaops/ethereum-metrics-exporter/pkg/exporter/docker.(*metrics).updateContainerMetrics(0xc000146b40, {0x135ec38?, 0xc0002be374?}, 0xc00020e508, 0xc000480a50)
2026-01-14T10:21:52.271020743Z     /home/runner/work/ethereum-metrics-exporter/ethereum-metrics-exporter/pkg/exporter/docker/metrics.go:514 +0x949
2026-01-14T10:21:52.271023368Z github.com/ethpandaops/ethereum-metrics-exporter/pkg/exporter/docker.(*containerMetrics).collectContainerMetrics(0xc000392380, {0x135ec38, 0x1aac400}, {{0xc000222c80, 0x40}, {0xc0002be374, 0x9}, {0xc0000df3f0, 0x3, 0x3}, ...})
2026-01-14T10:21:52.271026163Z     /home/runner/work/ethereum-metrics-exporter/ethereum-metrics-exporter/pkg/exporter/docker/docker.go:138 +0x165
2026-01-14T10:21:52.271028748Z github.com/ethpandaops/ethereum-metrics-exporter/pkg/exporter/docker.(*containerMetrics).collectMetrics(0xc000392380, {0x135ec38, 0x1aac400})
2026-01-14T10:21:52.271031343Z     /home/runner/work/ethereum-metrics-exporter/ethereum-metrics-exporter/pkg/exporter/docker/docker.go:97 +0xf9
2026-01-14T10:21:52.271033888Z github.com/ethpandaops/ethereum-metrics-exporter/pkg/exporter/docker.(*containerMetrics).StartAsync(0xc000392380, {0x135ec38, 0x1aac400})
2026-01-14T10:21:52.271036512Z     /home/runner/work/ethereum-metrics-exporter/ethereum-metrics-exporter/pkg/exporter/docker/docker.go:77 +0x25
2026-01-14T10:21:52.271039047Z created by github.com/ethpandaops/ethereum-metrics-exporter/pkg/exporter.(*exporter).Serve in goroutine 10
2026-01-14T10:21:52.271051611Z     /home/runner/work/ethereum-metrics-exporter/ethereum-metrics-exporter/pkg/exporter/exporter.go:190 +0x67b

Test plan

  • Build passes (go build ./...)
  • Tests pass (go test ./...)
  • Manual verification with Docker metrics collection
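As a possible complement to the manual verification, the sketch below shows one way to check whether two label slices alias the same backing array. The helper sharesBackingArray and the label names are hypothetical; nothing like this is confirmed to exist in the repository's test suite.

```go
package main

import "fmt"

// sharesBackingArray reports whether two non-empty slices start at the same
// element of the same backing array. Hypothetical helper for illustration.
func sharesBackingArray(a, b []string) bool {
	return len(a) > 0 && len(b) > 0 && &a[0] == &b[0]
}

func main() {
	base := make([]string, 0, 4)
	base = append(base, "container_name", "container_id")

	// Buggy pattern: append reuses base's spare capacity.
	aliased := append(base, "interface")

	// Safe pattern: allocate a dedicated slice, then copy.
	copied := make([]string, 0, len(base)+1)
	copied = append(copied, base...)
	copied = append(copied, "interface")

	fmt.Println(sharesBackingArray(base, aliased)) // true
	fmt.Println(sharesBackingArray(base, copied))  // false
}
```

A check like this could be turned into a regression test that fails if a label slice is ever derived from the shared one without copying.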

Multiple append() calls on the shared labelNames slice could share the
same underlying backing array, causing label names to be overwritten.
This resulted in a panic: "label name 'device_major' missing in label map"
when Prometheus tried to use network metrics with corrupted labels.

Create dedicated label slices for network and block I/O metrics using
make() with explicit capacity, following the pattern already used for
volume metrics in the same file.
@skylenet skylenet merged commit fabd1db into main Jan 14, 2026
2 checks passed
@skylenet skylenet deleted the fix-panic-disk branch January 14, 2026 10:32