
Docker compose: add resource limits, restart policy, and fix OTLP metrics export #5806

@eldios

Problem

Docker compose validator deployments have no CPU/memory limits, no restart policy, and a broken OTLP metrics export configuration:

  1. No resource limits: containers can consume all available host resources. On the OVH test validator (8-core/30GB), ScyllaDB alone used 195% CPU and 14GB RAM, causing load averages of 14+ and degraded performance for all services.

  2. No restart policy: when shards are OOM-killed or crash, they stay down until manually restarted, leaving the validator partially unavailable.

  3. OTLP metrics export broken: the Alloy config sets an explicit Content-Type: application/x-protobuf header and compression = "gzip" in the Prometheus OTLP exporter, which causes HTTP 400 errors from the remote endpoint. The otelcol.exporter.otlphttp component already sets these automatically, so the explicit values conflict.

Solution

  • Add configurable CPU/memory limits via LIMIT_CPUS_* / LIMIT_MEM_* env vars with sane defaults for an 8-core/32GB machine
  • Add restart: unless-stopped to all long-running services
  • Remove explicit Content-Type header and gzip compression from alloy-config.river
  • Increase shard memory default to 2560M (1536M caused OOM kills under normal cross-chain message load)
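
As a rough sketch of what the first two bullets could look like in a compose file: the service names, image names, the concrete variable names (LIMIT_CPUS_SHARD, LIMIT_MEM_SHARD, LIMIT_CPUS_SCYLLA, LIMIT_MEM_SCYLLA), and every default except the 2560M shard memory value are assumptions made for illustration, not taken from the repository.

```yaml
# Hypothetical excerpt; service and image names are placeholders.
services:
  shard:
    image: example/validator-shard:latest   # placeholder image
    restart: unless-stopped                  # restart after OOM kills or crashes
    deploy:
      resources:
        limits:
          # ${VAR:-default} falls back to the default when VAR is unset or empty.
          cpus: "${LIMIT_CPUS_SHARD:-1.0}"      # assumed default
          memory: "${LIMIT_MEM_SHARD:-2560M}"   # default proposed in this issue

  scylla:
    image: example/scylladb:latest           # placeholder image
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "${LIMIT_CPUS_SCYLLA:-2.0}"     # assumed default
          memory: "${LIMIT_MEM_SCYLLA:-8G}"     # assumed default
```

With this shape, an operator on a smaller or larger host could override any limit from a `.env` file next to the compose file (for example `LIMIT_MEM_SHARD=4G`) without editing the compose file itself.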
