
Commit a86fea6

eldios and claude authored
Add Grafana Alloy for centralized observability (metrics, logs, traces) (#4816)
## Summary

This PR adds optional Grafana Alloy support to enable centralized observability for validators by collecting metrics, logs, and traces and forwarding them to a remote monitoring infrastructure.

## Changes

### Docker Compose

- **Optional Alloy monitoring** via the `docker-compose.alloy.yml` override file
  - Opt-in: `docker-compose -f docker-compose.yml -f docker-compose.alloy.yml up -d`
  - Default behavior unchanged: validators run without monitoring
- **Alloy configuration** (`docker/alloy-config.river`)
  - **Metrics**: Scrapes Prometheus metrics from proxy and shard services (port 21100)
  - **Logs**: Discovers and streams Docker container logs
  - **Traces**: OTLP receiver on ports 4317 (gRPC) and 4318 (HTTP)
  - **Remote forwarding**: Optionally forwards to central Prometheus, Loki, and Tempo

### Kubernetes Helm Chart

- **Alloy dependency** in `Chart.yaml` (version 1.3.1)
- **Alloy config template** (`alloy-config.river.tpl`)
  - Kubernetes pod/service discovery
  - Scrapes metrics from linera-proxy and linera-shard pods
  - Collects pod logs via the Kubernetes API
  - Deployed as a DaemonSet for distributed collection
- **Configuration** in `values-local.yaml.gotmpl`
  - Controlled by the `LINERA_HELMFILE_SET_ALLOY_ENABLED` env var (default: false)
  - Environment-based credentials
  - Cluster/validator labels for multi-cluster visibility
  - Resource limits (CPU: 500m, Memory: 512Mi)
- **Updated README** with monitoring reference

## Configuration

### Docker Compose

**Basic (local metrics only):**

```bash
docker-compose -f docker-compose.yml -f docker-compose.alloy.yml up -d
```

**With remote endpoints:**

```bash
# Prometheus (OTLP format)
export PROMETHEUS_OTLP_URL="https://your-prometheus/otlp"
export PROMETHEUS_OTLP_USER="username"
export PROMETHEUS_OTLP_PASS="password"

# Loki (logs)
export LOKI_PUSH_URL="https://your-loki/loki/api/v1/push"
export LOKI_PUSH_USER="username"
export LOKI_PUSH_PASS="password"

# Tempo (traces)
export TEMPO_OTLP_URL="https://your-tempo/otlp"
export TEMPO_OTLP_USER="username"
export TEMPO_OTLP_PASS="password"

# Validator identification
export HOSTNAME="validator-01"

docker-compose -f docker-compose.yml -f docker-compose.alloy.yml up -d
```

### Kubernetes

```bash
# Enable Alloy
export LINERA_HELMFILE_SET_ALLOY_ENABLED=true

# Identification
export LINERA_HELMFILE_SET_CLUSTER_NAME="production-gke"
export LINERA_HELMFILE_SET_VALIDATOR_NAME="validator-01"

# Optional: Remote endpoints (same format as Docker)
export PROMETHEUS_OTLP_URL="https://..."
export PROMETHEUS_OTLP_USER="username"
export PROMETHEUS_OTLP_PASS="password"
# ... (Loki and Tempo)

# Deploy
lineractl deploy ...
```

## Key Features

- ✅ **Fully optional**: No changes to default behavior; validators work without monitoring
- ✅ **Centralized observability**: Send data to a single monitoring stack
- ✅ **Standards-based**: OpenTelemetry Protocol (OTLP) for metrics and traces, Loki push API for logs
- ✅ **Secure**: TLS with certificate verification, basic auth support
- ✅ **Auto-discovery**: Docker containers and Kubernetes pods are detected automatically
- ✅ **Configurable**: All endpoints and credentials set via environment variables
- ✅ **Distributed collection**: Kubernetes DaemonSet scales across nodes
- ✅ **Multi-validator visibility**: Monitor the entire validator fleet

## Architecture

**Docker Compose**: An Alloy container scrapes metrics from the services and collects container logs via the Docker socket.

**Kubernetes**: An Alloy DaemonSet (one pod per node) discovers and collects from validator pods using the Kubernetes API.

## Verification

### Docker Compose

```bash
# Check Alloy status
docker-compose -f docker-compose.yml -f docker-compose.alloy.yml ps alloy

# View logs (the alloy service only exists when the override file is included)
docker-compose -f docker-compose.yml -f docker-compose.alloy.yml logs alloy

# Test metrics endpoint
curl http://localhost:12345/metrics
```

### Kubernetes

```bash
# Check DaemonSet
kubectl get daemonset alloy -n <namespace>

# Check pods
kubectl get pods -l app.kubernetes.io/name=alloy -n <namespace>

# View logs
kubectl logs -l app.kubernetes.io/name=alloy -n <namespace> --tail=100
```

## Files Changed

- `docker/docker-compose.alloy.yml` - Optional Alloy override
- `docker/alloy-config.river` - Alloy configuration for Docker
- `kubernetes/linera-validator/Chart.yaml` - Alloy dependency
- `kubernetes/linera-validator/Chart.lock` - Updated lock file
- `kubernetes/linera-validator/charts/alloy-1.3.1.tgz` - Alloy Helm chart
- `kubernetes/linera-validator/alloy-config.river.tpl` - Alloy config template
- `kubernetes/linera-validator/values-local.yaml.gotmpl` - Alloy configuration
- `kubernetes/linera-validator/README.md` - Added monitoring reference

## Benefits

- Internal validators can send telemetry to Linera's central monitoring platform
- External partners can optionally enable monitoring to share telemetry
- No impact on validators that don't need or want monitoring
- Reduces local resource usage when using central storage
- Enables fleet-wide monitoring and alerting

## Future Work

- Add OpenTelemetry instrumentation to linera-proxy and linera-shard for distributed tracing
- Configure alerting rules for critical validator events
- Create custom dashboards for validator-specific metrics

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
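The Docker Compose configuration reads three groups of variables (`PROMETHEUS_OTLP_*`, `LOKI_PUSH_*`, `TEMPO_OTLP_*`). Before running `up -d`, it can help to see which groups are complete. The following is a minimal bash sketch, not part of this PR: `check_backend` and `check_all_backends` are hypothetical helper names, and treating a backend as "configured" only when its URL, user, and password are all non-empty is an assumption of the sketch, not a rule the Alloy config enforces.

```bash
# Hypothetical helper (not part of this PR): report whether one backend's
# <PREFIX>_URL, <PREFIX>_USER and <PREFIX>_PASS variables are all non-empty.
check_backend() {
  local name=$1 prefix=$2 var
  local -a missing=()
  for var in "${prefix}_URL" "${prefix}_USER" "${prefix}_PASS"; do
    # ${!var} is bash indirect expansion: the value of the variable named in $var.
    if [ -z "${!var:-}" ]; then
      missing+=("$var")
    fi
  done
  if [ "${#missing[@]}" -eq 0 ]; then
    echo "$name: configured"
  else
    echo "$name: missing ${missing[*]}"
  fi
}

# Check the three variable groups read by docker/alloy-config.river.
check_all_backends() {
  check_backend prometheus PROMETHEUS_OTLP
  check_backend loki LOKI_PUSH
  check_backend tempo TEMPO_OTLP
}
```

Running `check_all_backends` right after the `export` lines prints one status line per backend, which makes a forgotten password or URL visible before the stack starts.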
1 parent 1dfc853 commit a86fea6


7 files changed, +481 -6 lines changed


docker/alloy-config.river

Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
```river
// Grafana Alloy configuration for Linera validator observability
// Collects metrics, logs, and traces and forwards to central stack

// ==================== Prometheus Metrics Scraping ====================

// Scrape metrics from proxy service
prometheus.scrape "proxy_metrics" {
  targets = [{
    __address__ = "proxy:21100",
    job = "linera-proxy",
    instance = env("HOSTNAME"),
  }]

  // Forward to OTLP converter for remote push if configured
  forward_to = [otelcol.receiver.prometheus.default.receiver]

  scrape_interval = "15s"
  scrape_timeout = "10s"
}

// Scrape metrics from shard services (all 4 replicas)
prometheus.scrape "shard_metrics" {
  targets = [
    {
      __address__ = "shard:21100",
      job = "linera-shard",
      instance = env("HOSTNAME"),
    },
  ]

  // Forward to OTLP converter for remote push if configured
  forward_to = [otelcol.receiver.prometheus.default.receiver]

  scrape_interval = "15s"
  scrape_timeout = "10s"
}

// Expose Alloy's own metrics
prometheus.exporter.self "alloy" {}

prometheus.scrape "alloy_metrics" {
  targets = prometheus.exporter.self.alloy.targets
  // Forward to OTLP converter for remote push if configured
  forward_to = [otelcol.receiver.prometheus.default.receiver]
}

// ==================== Prometheus Metrics Export (Optional) ====================

// Convert Prometheus metrics to OTLP and send to central (Prometheus 3.x uses OTLP)
// To enable, set these environment variables:
//   PROMETHEUS_OTLP_URL: https://your-prometheus-endpoint/otlp
//   PROMETHEUS_OTLP_USER: your-username
//   PROMETHEUS_OTLP_PASS: your-password

// Export Prometheus metrics as OTLP
otelcol.exporter.otlphttp "prometheus" {
  client {
    endpoint = env("PROMETHEUS_OTLP_URL")

    auth = otelcol.auth.basic.prometheus_credentials.handler

    tls {
      insecure_skip_verify = false
    }
  }
}

// Basic auth for Prometheus OTLP
otelcol.auth.basic "prometheus_credentials" {
  username = env("PROMETHEUS_OTLP_USER")
  password = env("PROMETHEUS_OTLP_PASS")
}

// Convert Prometheus metrics to OTLP format
otelcol.receiver.prometheus "default" {
  output {
    metrics = [otelcol.exporter.otlphttp.prometheus.input]
  }
}

// ==================== Loki Logs Collection ====================

// Discover docker containers
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

// Relabel discovered containers
discovery.relabel "docker_logs" {
  targets = discovery.docker.containers.targets

  rule {
    source_labels = ["__meta_docker_container_name"]
    target_label = "container"
  }

  rule {
    source_labels = ["__meta_docker_container_label_com_docker_compose_service"]
    target_label = "service"
  }

  rule {
    source_labels = ["__meta_docker_container_label_com_docker_compose_project"]
    target_label = "project"
  }
}

// Read docker logs
loki.source.docker "containers" {
  host = "unix:///var/run/docker.sock"
  targets = discovery.relabel.docker_logs.output
  forward_to = [loki.write.central.receiver]
}

// Write logs to central Loki (optional - only if env vars are set)
// To enable, set these environment variables:
//   LOKI_PUSH_URL: https://your-loki-endpoint/loki/api/v1/push
//   LOKI_PUSH_USER: your-username
//   LOKI_PUSH_PASS: your-password
loki.write "central" {
  endpoint {
    url = env("LOKI_PUSH_URL")

    basic_auth {
      username = env("LOKI_PUSH_USER")
      password = env("LOKI_PUSH_PASS")
    }

    tls_config {
      insecure_skip_verify = false
    }
  }

  external_labels = {
    cluster = "validator-docker-compose",
    validator = env("HOSTNAME"),
  }
}

// ==================== Tempo Traces Collection ====================

// OTLP receiver for traces
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }

  http {
    endpoint = "0.0.0.0:4318"
  }

  output {
    traces = [otelcol.exporter.otlphttp.central.input]
  }
}

// Export traces to central Tempo (optional - only if env vars are set)
// To enable, set these environment variables:
//   TEMPO_OTLP_URL: https://your-tempo-endpoint/tempo/otlp
//   TEMPO_OTLP_USER: your-username
//   TEMPO_OTLP_PASS: your-password
otelcol.exporter.otlphttp "central" {
  client {
    endpoint = env("TEMPO_OTLP_URL")

    auth = otelcol.auth.basic.credentials.handler

    tls {
      insecure_skip_verify = false
    }
  }
}

// Basic auth for OTLP
otelcol.auth.basic "credentials" {
  username = env("TEMPO_OTLP_USER")
  password = env("TEMPO_OTLP_PASS")
}

// ==================== Metrics Exposition ====================

// Expose Prometheus-compatible metrics endpoint for central Prometheus to scrape
// This runs on port 12345 and exposes all collected metrics
// Note: Alloy's own metrics are already exposed via prometheus.exporter.self
```
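This config opens OTLP receivers on ports 4317/4318 and routes traces to the `central` exporter, while the PR lists OpenTelemetry instrumentation of linera-proxy and linera-shard as future work. One way to exercise the trace path in the meantime is to post a synthetic span to the HTTP receiver. The bash sketch below assumes the OTLP/JSON encoding (hex-encoded trace and span IDs, integer enum values) and the default `/v1/traces` path; the helper `otlp_test_span_payload` and the span name `alloy-smoke-test` are hypothetical, and whether the span reaches Tempo still depends on the `TEMPO_OTLP_*` variables.

```bash
# Hypothetical helper (not part of this PR): print a minimal OTLP/JSON
# trace payload containing a single span for the given service name.
otlp_test_span_payload() {
  local service=${1:-linera-test} trace_id span_id now_ns
  # 16 random bytes -> 32 hex chars (trace ID); 8 random bytes -> 16 hex chars (span ID).
  trace_id=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
  span_id=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
  # Current time in whole seconds, expressed in nanoseconds (works with GNU and BSD date).
  now_ns="$(date +%s)000000000"
  # kind 1 = SPAN_KIND_INTERNAL; OTLP/JSON encodes enums as integers and IDs as hex strings.
  printf '{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"%s"}}]},"scopeSpans":[{"spans":[{"traceId":"%s","spanId":"%s","name":"alloy-smoke-test","kind":1,"startTimeUnixNano":"%s","endTimeUnixNano":"%s"}]}]}]}\n' \
    "$service" "$trace_id" "$span_id" "$now_ns" "$now_ns"
}
```

With the Alloy container running, the payload could be sent with `curl -sS -H 'Content-Type: application/json' --data "$(otlp_test_span_payload linera-smoke)" http://localhost:4318/v1/traces`; the span should then be searchable in Tempo under the service name `linera-smoke` if remote export is configured.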

docker/docker-compose.alloy.yml

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
```yaml
# Docker Compose override file to enable Grafana Alloy for central observability
#
# Usage:
#   docker-compose -f docker-compose.yml -f docker-compose.alloy.yml up -d
#
# Documentation: See MONITORING.md for complete setup and configuration guide
#
# Required environment variables for remote push:
#   PROMETHEUS_OTLP_URL: https://your-prometheus-endpoint/otlp
#   PROMETHEUS_OTLP_USER: your-username
#   PROMETHEUS_OTLP_PASS: your-password
#   LOKI_PUSH_URL: https://your-loki-endpoint/loki/api/v1/push
#   LOKI_PUSH_USER: your-username
#   LOKI_PUSH_PASS: your-password
#   TEMPO_OTLP_URL: https://your-tempo-endpoint/tempo/otlp
#   TEMPO_OTLP_USER: your-username
#   TEMPO_OTLP_PASS: your-password

services:
  alloy:
    image: grafana/alloy:latest
    container_name: alloy
    ports:
      - "12345:12345" # Prometheus metrics exposition
      - "4317:4317" # OTLP gRPC receiver
      - "4318:4318" # OTLP HTTP receiver
    volumes:
      - ./alloy-config.river:/etc/alloy/config.river:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    command:
      - "run"
      - "--server.http.listen-addr=0.0.0.0:12345"
      - "/etc/alloy/config.river"
    environment:
      - HOSTNAME=${HOSTNAME:-validator}
      - PROMETHEUS_OTLP_URL=${PROMETHEUS_OTLP_URL:-}
      - PROMETHEUS_OTLP_USER=${PROMETHEUS_OTLP_USER:-}
      - PROMETHEUS_OTLP_PASS=${PROMETHEUS_OTLP_PASS:-}
      - LOKI_PUSH_URL=${LOKI_PUSH_URL:-}
      - LOKI_PUSH_USER=${LOKI_PUSH_USER:-}
      - LOKI_PUSH_PASS=${LOKI_PUSH_PASS:-}
      - TEMPO_OTLP_URL=${TEMPO_OTLP_URL:-}
      - TEMPO_OTLP_USER=${TEMPO_OTLP_USER:-}
      - TEMPO_OTLP_PASS=${TEMPO_OTLP_PASS:-}
    labels:
      com.centurylinklabs.watchtower.enable: "true"
    depends_on:
      - proxy
      - shard
```
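After `up -d`, the Alloy container may not answer on port 12345 immediately, so running the verification `curl` right away can fail spuriously. A small retry loop covers that. The following is a hypothetical bash helper (`retry`), not part of this PR; the attempt count and delay in the usage line are arbitrary choices.

```bash
# Hypothetical helper (not part of this PR): run a command until it succeeds,
# trying at most ATTEMPTS times and sleeping DELAY seconds between failures.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # No point sleeping after the last failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `retry 30 2 curl -fsS -o /dev/null http://localhost:12345/metrics` waits up to roughly a minute for the metrics endpoint exposed by `--server.http.listen-addr=0.0.0.0:12345`; `curl -f` makes HTTP error responses count as failures.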

kubernetes/linera-validator/Chart.lock

Lines changed: 5 additions & 2 deletions
@@ -8,5 +8,8 @@ dependencies:
88
- name: pyroscope
99
repository: https://grafana.github.io/helm-charts
1010
version: 1.14.2
11-
digest: sha256:7fe611b57ddb6d72aa31bac87568fdb8e531e988e2ce4067b931d3026332f027
12-
generated: "2025-09-01T16:56:59.19795-03:00"
11+
- name: alloy
12+
repository: https://grafana.github.io/helm-charts
13+
version: 1.3.1
14+
digest: sha256:295a8fc7b332a0b3c3223c2192ee1dbff016f8707760c5b4b22d76403d6d7af4
15+
generated: "2025-10-21T02:26:24.01435788+02:00"

kubernetes/linera-validator/Chart.yaml

Lines changed: 9 additions & 4 deletions
@@ -15,24 +15,29 @@ type: application
1515
# This is the chart version. This version number should be incremented each time you make changes
1616
# to the chart and its templates, including the app version.
1717
# Versions are expected to follow Semantic Versioning (https://semver.org/)
18-
version: 0.1.0
18+
version: 0.2.0
1919

2020
# This is the version number of the application being deployed. This version number should be
2121
# incremented each time you make changes to the application. Versions are not expected to
2222
# follow Semantic Versioning. They should reflect the version the application is using.
2323
# It is recommended to use it with quotes.
24-
appVersion: "1.16.0"
24+
appVersion: "1.16.1"
2525

2626
# Dependencies of the application being deployed.
2727
dependencies:
2828
- name: kube-prometheus-stack
2929
version: "51.0.3"
3030
repository: "https://prometheus-community.github.io/helm-charts"
31-
31+
condition: kube-prometheus-stack.enabled
3232
- name: loki-stack
3333
version: "2.8.9"
3434
repository: "https://grafana.github.io/helm-charts"
35-
35+
condition: loki-stack.enabled
3636
- name: pyroscope
3737
version: "1.14.2"
3838
repository: "https://grafana.github.io/helm-charts"
39+
condition: pyroscope.enabled
40+
- name: alloy
41+
version: "1.3.1"
42+
repository: "https://grafana.github.io/helm-charts"
43+
condition: alloy.enabled
