This guide covers all configuration options for the Rancher Centralized Monitoring Relay.
| Variable | Required | Default | Description |
|---|---|---|---|
| RANCHER_API_ENDPOINT | ✅ | - | Rancher server API endpoint URL |
| RANCHER_API_ACCESS_KEY | ✅ | - | Rancher API access key (token-xxxxx) |
| RANCHER_API_SECRET_KEY | ✅ | - | Rancher API secret key |
| CLUSTER_ID | ✅ | - | Target remote cluster ID (c-xxxxxxx) |
| CLUSTER_NAME | ❌ | "" | Human-readable cluster name for logging |
| DEBUG | ❌ | false | Enable debug logging |
| METRICS_PORT | ❌ | 9000 | HTTP server port for metrics/health endpoints |
| Variable | Required | Default | Description |
|---|---|---|---|
| PROMETHEUS_NAMESPACE | ❌ | cattle-monitoring-system | Namespace containing Prometheus service |
| PROMETHEUS_SERVICE | ❌ | rancher-monitoring-prometheus | Prometheus service name |
| PROMETHEUS_PORT | ❌ | 9090 | Prometheus service port |
| Variable | Required | Default | Description |
|---|---|---|---|
| LOKI_NAMESPACE | ❌ | cattle-logging-system | Namespace containing Loki service |
| LOKI_SERVICE | ❌ | rancher-logging-loki | Loki service name |
| LOKI_PORT | ❌ | 3100 | Loki service port |
| Variable | Required | Default | Description |
|---|---|---|---|
| REMOTE_NAMESPACE | ❌ | "" | Namespace for custom service |
| REMOTE_SERVICE | ❌ | "" | Custom service name |
| REMOTE_PORT | ❌ | "" | Custom service port |
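Before starting the relay, it can help to confirm that every required variable from the tables above is set. A minimal sketch (the example values are placeholders; the check itself is illustrative, not part of the relay):

```shell
#!/usr/bin/env bash
# Example values shown; substitute your own.
export RANCHER_API_ENDPOINT="https://rancher.example.com"
export RANCHER_API_ACCESS_KEY="token-abc123"
export RANCHER_API_SECRET_KEY="your-secret-key"
export CLUSTER_ID="c-m-xyz789"

# Fail fast if any required variable is unset or empty.
for v in RANCHER_API_ENDPOINT RANCHER_API_ACCESS_KEY RANCHER_API_SECRET_KEY CLUSTER_ID; do
  if [ -z "${!v:-}" ]; then
    echo "missing required variable: $v" >&2
    exit 1
  fi
done
echo "all required variables set"
```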
For a standard Rancher setup with the default monitoring stack:

```bash
export RANCHER_API_ENDPOINT="https://rancher.example.com"
export RANCHER_API_ACCESS_KEY="token-abc123"
export RANCHER_API_SECRET_KEY="your-secret-key"
export CLUSTER_ID="c-m-xyz789"
export CLUSTER_NAME="production-cluster"
```

For clusters with custom Prometheus/Loki deployments:
```bash
# Core configuration
export RANCHER_API_ENDPOINT="https://rancher.example.com"
export RANCHER_API_ACCESS_KEY="token-abc123"
export RANCHER_API_SECRET_KEY="your-secret-key"
export CLUSTER_ID="c-m-xyz789"

# Custom Prometheus
export PROMETHEUS_NAMESPACE="monitoring"
export PROMETHEUS_SERVICE="prometheus-server"
export PROMETHEUS_PORT="9090"

# Custom Loki
export LOKI_NAMESPACE="logging"
export LOKI_SERVICE="loki"
export LOKI_PORT="3100"
```

To relay a custom service (not Prometheus/Loki):
```bash
# Core configuration
export RANCHER_API_ENDPOINT="https://rancher.example.com"
export RANCHER_API_ACCESS_KEY="token-abc123"
export RANCHER_API_SECRET_KEY="your-secret-key"
export CLUSTER_ID="c-m-xyz789"

# Custom service
export REMOTE_NAMESPACE="monitoring"
export REMOTE_SERVICE="alertmanager"
export REMOTE_PORT="9093"
```

For troubleshooting and development:
```bash
# Enable debug logging
export DEBUG=true

# Custom metrics port
export METRICS_PORT=8080

# All other config...
export RANCHER_API_ENDPOINT="https://rancher.example.com"
# ... etc
```

The Helm chart exposes the same options through its values file:

```yaml
# Rancher connection settings
rancher:
  apiEndpoint: "https://rancher.example.com"
  clusterId: "c-m-abc123xyz"
  clusterName: "production-cluster-1"
  auth:
    # Option 1: Use existing Kubernetes secret (recommended)
    existingSecret: "rancher-api-credentials"
    accessKeySecretKey: "access-key"
    secretKeySecretKey: "secret-key"
    # Option 2: Inline credentials (not recommended for production)
    accessKey: "token-abc123"
    secretKey: "your-secret-key"

# Service monitoring configuration
monitoring:
  prometheus:
    namespace: "cattle-monitoring-system"
    service: "rancher-monitoring-prometheus"
    port: "9090"
  loki:
    namespace: "cattle-logging-system"
    service: "rancher-logging-loki"
    port: "3100"
  remote:
    namespace: ""
    service: ""
    port: ""

# Application settings
app:
  debug: false
  metricsPort: 9000

# Kubernetes deployment settings
image:
  repository: supporttools/rancher-monitoring-relay
  tag: "latest"
  pullPolicy: IfNotPresent

replicaCount: 1

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 200m
    memory: 256Mi

# Service account
serviceAccount:
  create: true
  annotations: {}
  name: ""

# Pod security
podSecurityContext:
  fsGroup: 1001
  runAsGroup: 1001
  runAsNonRoot: true
  runAsUser: 1001

securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1001

# Service
service:
  type: ClusterIP
  port: 9000
  targetPort: 9000

# Health checks
healthCheck:
  enabled: true
  path: /health
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3

# Monitoring integration
monitoring:
  serviceMonitor:
    enabled: false
    interval: 30s
    scrapeTimeout: 10s
    labels: {}
    annotations: {}

# Autoscaling (if needed)
autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 80

# Node selection
nodeSelector: {}
tolerations: []
affinity: {}

# Ingress (if external access needed)
ingress:
  enabled: false
  className: ""
  annotations: {}
  hosts:
    - host: monitoring-relay.example.com
      paths:
        - path: /
          pathType: Prefix
  tls: []
```

An example production values file:

```yaml
# production-values.yaml
rancher:
  apiEndpoint: "https://rancher.company.com"
  clusterId: "c-m-prod1234"
  clusterName: "production-cluster-east"
  auth:
    existingSecret: "rancher-prod-credentials"

monitoring:
  prometheus:
    namespace: "cattle-monitoring-system"
    service: "rancher-monitoring-prometheus"
    port: "9090"
  loki:
    namespace: "cattle-logging-system"
    service: "rancher-logging-loki"
    port: "3100"

app:
  debug: false
  metricsPort: 9000

image:
  tag: "0.3.5"
  pullPolicy: Always

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

# Enable Prometheus monitoring
monitoring:
  serviceMonitor:
    enabled: true
    interval: 30s
    labels:
      monitoring: "prometheus"

# Pod anti-affinity for HA
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: rancher-monitoring-relay
          topologyKey: kubernetes.io/hostname

# Multiple replicas for HA
replicaCount: 2

# Enable autoscaling
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 70
```

The relay constructs service proxy URLs following this pattern:
```
{RANCHER_API_ENDPOINT}/k8s/clusters/{CLUSTER_ID}/api/v1/namespaces/{NAMESPACE}/services/{SERVICE}:{PORT}/proxy/
```

For a cluster c-m-abc123 with Rancher at https://rancher.example.com:

- Prometheus: `https://rancher.example.com/k8s/clusters/c-m-abc123/api/v1/namespaces/cattle-monitoring-system/services/rancher-monitoring-prometheus:9090/proxy/`
- Loki: `https://rancher.example.com/k8s/clusters/c-m-abc123/api/v1/namespaces/cattle-logging-system/services/rancher-logging-loki:3100/proxy/`
- Custom service: `https://rancher.example.com/k8s/clusters/c-m-abc123/api/v1/namespaces/monitoring/services/alertmanager:9093/proxy/`
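The pattern above can be reproduced in a small shell helper. This is a sketch for illustration only; `build_proxy_url` is a hypothetical function, not part of the relay:

```shell
#!/usr/bin/env bash
# Compose a Rancher service-proxy URL from the same variables the relay reads.
RANCHER_API_ENDPOINT="https://rancher.example.com"
CLUSTER_ID="c-m-abc123"

# $1 = namespace, $2 = service, $3 = port
build_proxy_url() {
  echo "${RANCHER_API_ENDPOINT}/k8s/clusters/${CLUSTER_ID}/api/v1/namespaces/${1}/services/${2}:${3}/proxy/"
}

# Prints the Prometheus proxy URL shown above.
build_proxy_url cattle-monitoring-system rancher-monitoring-prometheus 9090
```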
For monitoring multiple clusters, deploy separate instances with unique configurations:
```yaml
# cluster1-values.yaml
rancher:
  clusterId: "c-m-cluster1"
  clusterName: "production-east"
fullnameOverride: "cluster1-monitoring-relay"
```

```yaml
# cluster2-values.yaml
rancher:
  clusterId: "c-m-cluster2"
  clusterName: "production-west"
fullnameOverride: "cluster2-monitoring-relay"
```

Deploy each:

```bash
helm install cluster1 supporttools/rancher-monitoring-relay -f cluster1-values.yaml
helm install cluster2 supporttools/rancher-monitoring-relay -f cluster2-values.yaml
```

The relay provides health endpoints that can be customized:
| Endpoint | Purpose | HTTP Method |
|---|---|---|
| /health | Basic Rancher API connectivity | GET |
| /ready | Service connectivity via proxy | GET |
| /version | Build and version information | GET |
| /metrics | Prometheus metrics | GET |
Enable automatic Prometheus scraping:

```yaml
monitoring:
  serviceMonitor:
    enabled: true
    interval: 30s
    scrapeTimeout: 10s
    labels:
      monitoring: "prometheus"
      team: "platform"
    annotations:
      prometheus.io/scrape: "true"
```

Adjust resources based on your cluster size and monitoring load:
```yaml
resources:
  requests:
    cpu: 100m      # Minimum CPU required
    memory: 128Mi  # Minimum memory required
  limits:
    cpu: 500m      # Maximum CPU allowed
    memory: 512Mi  # Maximum memory allowed
```

The relay runs with minimal privileges:
```yaml
securityContext:
  allowPrivilegeEscalation: false  # Prevent privilege escalation
  capabilities:
    drop:
      - ALL                        # Drop all capabilities
  readOnlyRootFilesystem: true     # Read-only root filesystem
  runAsNonRoot: true               # Run as non-root user
  runAsUser: 1001                  # Specific user ID
```

For a development-style deployment (debug logging on, minimal resources, no ServiceMonitor):

```yaml
app:
  debug: true
  metricsPort: 9000

resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 256Mi

replicaCount: 1

monitoring:
  serviceMonitor:
    enabled: false
```

For a staging-style deployment (moderate resources, less frequent scraping):

```yaml
app:
  debug: false
  metricsPort: 9000

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 300m
    memory: 384Mi

replicaCount: 1

monitoring:
  serviceMonitor:
    enabled: true
    interval: 60s
```

For a production-style deployment (multiple replicas, autoscaling, anti-affinity for HA):

```yaml
app:
  debug: false
  metricsPort: 9000

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

replicaCount: 2

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 5

monitoring:
  serviceMonitor:
    enabled: true
    interval: 30s

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: rancher-monitoring-relay
          topologyKey: kubernetes.io/hostname
```

After configuration, validate your setup:
```bash
# Check configuration values
helm get values my-monitoring-relay

# Test endpoints
kubectl port-forward svc/my-monitoring-relay 9000:9000
curl http://localhost:9000/health
curl http://localhost:9000/ready
curl http://localhost:9000/version

# Check logs
kubectl logs -f deployment/my-monitoring-relay
```

- Usage Examples - Real-world usage scenarios
- Troubleshooting - Common issues and solutions