A sophisticated, production-ready Go application that provides secure multi-tenant access to Prometheus metrics in Kubernetes environments. This proxy implements dynamic service discovery, tenant-based access control, intelligent request routing, and automated metric collection with remote write capabilities.
- Kubernetes API Integration: Automatically discovers Prometheus instances using Kubernetes service discovery
- Label-based Filtering: Filter services based on labels like app.kubernetes.io/name=prometheus
- Annotation Support: Additional filtering using Kubernetes annotations
- Multi-Resource Support: Discover from Services, Pods, or Endpoints
- Continuous Refresh: Automatically updates the list of available backends with health checking
- Custom Resource Definitions: Uses MetricAccess CRDs to define tenant access rules
- Flexible Metric Patterns: Support for exact matches, regex patterns, and PromQL-style selectors
- Namespace Isolation: Tenant isolation based on Kubernetes namespaces
- Per-Tenant Metric Isolation: Optional namespace-level filtering at collection time for enhanced security
- Dynamic Configuration: Real-time updates when tenant configurations change
- Metric Filtering: Real-time filtering of query results based on tenant permissions
- Automated Metric Collection: Pulls metrics from infrastructure Prometheus and pushes to tenant instances
- Multiple Target Types: Support for Prometheus, Pushgateway, and external remote write endpoints
- Configurable Intervals: Per-tenant collection intervals (5s to hours)
- Extra Labels: Automatic addition of tenant and management labels
- Error Handling: Comprehensive retry logic and error reporting
- Load Balancing: Round-robin load balancing across healthy backends
- Health Checking: Automatic health monitoring of Prometheus backends
- Request Filtering: Real-time filtering of metrics based on tenant access rules
- Metric Aggregation: Aggregates results from multiple Prometheus instances
- Authentication: Header-based tenant authentication with extensible auth plugins
- Prometheus Metrics: Built-in metrics for monitoring proxy performance
- Structured Logging: JSON-formatted logs with configurable levels
- Health Endpoints: Health check and debug endpoints
- Request Tracing: Detailed request logging for troubleshooting
- Debug Endpoints: Real-time view of targets, tenants, and collected metrics
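The round-robin load balancing and health checking listed above could be combined along these lines. This is a minimal sketch, not the project's implementation: the Backend and RoundRobin types, the Pick method, and the backend URLs are hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Backend is a hypothetical record for one discovered Prometheus target.
type Backend struct {
	URL     string
	Healthy bool
}

// RoundRobin hands out healthy backends in rotation.
type RoundRobin struct {
	mu       sync.Mutex
	backends []Backend
	next     int
}

// ErrNoHealthyBackend is returned when every backend is marked unhealthy.
var ErrNoHealthyBackend = errors.New("no healthy backend available")

// Pick returns the next healthy backend, skipping unhealthy ones.
func (rr *RoundRobin) Pick() (Backend, error) {
	rr.mu.Lock()
	defer rr.mu.Unlock()
	for i := 0; i < len(rr.backends); i++ {
		b := rr.backends[(rr.next+i)%len(rr.backends)]
		if b.Healthy {
			// Resume the rotation just after the backend we return.
			rr.next = (rr.next + i + 1) % len(rr.backends)
			return b, nil
		}
	}
	return Backend{}, ErrNoHealthyBackend
}

func main() {
	rr := &RoundRobin{backends: []Backend{
		{URL: "http://prometheus-a:9090", Healthy: true},
		{URL: "http://prometheus-b:9090", Healthy: false},
		{URL: "http://prometheus-c:9090", Healthy: true},
	}}
	for i := 0; i < 4; i++ {
		b, err := rr.Pick()
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(b.URL) // alternates between prometheus-a and prometheus-c
	}
}
```

A health checker would flip the Healthy flag under the same mutex; that part is left out to keep the sketch short.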
┌─────────────────────────────────────────────────────────────────┐
│                      Infrastructure Layer                       │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │   Prometheus    │  │   Prometheus    │  │   Prometheus    │  │
│  │   Instance A    │  │   Instance B    │  │   Instance C    │  │
│  │  (monitoring)   │  │  (monitoring)   │  │  (monitoring)   │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│               Multi-Tenant Proxy (monitoring ns)                │
│                                                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │     Service     │  │     Tenant      │  │  Remote Write   │  │
│  │    Discovery    │  │     Manager     │  │   Controller    │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
│                                                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │      Proxy      │  │      Load       │  │     Metrics     │  │
│  │     Handler     │  │    Balancer     │  │    Collector    │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                          Tenant Layer                           │
│                                                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │    Tenant A     │  │    Tenant B     │  │    Tenant C     │  │
│  │    Namespace    │  │    Namespace    │  │    Namespace    │  │
│  │                 │  │                 │  │                 │  │
│  │  MetricAccess   │  │  MetricAccess   │  │  MetricAccess   │  │
│  │   Prometheus    │  │   Prometheus    │  │   Prometheus    │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
- Kubernetes cluster (1.19+)
- Go 1.21+ (for building from source)
- kubectl configured to access your cluster
# Deploy the Custom Resource Definition
kubectl apply -f deploy/kubernetes/crd.yaml
# Deploy the proxy configuration
kubectl apply -f deploy/kubernetes/configmap.yaml
# Deploy the proxy service
kubectl apply -f deploy/kubernetes/deployment.yaml
apiVersion: observability.ethos.io/v1alpha1
kind: MetricAccess
metadata:
name: my-app-metrics
namespace: my-app-namespace
spec:
# Source identifier for metrics
source: my-app-namespace
# Enable namespace isolation (optional - only collect metrics from this namespace)
metricIsolation: true
# Metrics this tenant can access
metrics:
- "http_requests_total"
- "http_request_duration_seconds"
- '{job="my-app"}'
- "up{job=\"my-app\"}"
# Optional: Remote write to tenant Prometheus
remoteWrite:
enabled: true
interval: "30s"
target:
type: "prometheus"
prometheus:
serviceName: "prometheus"
servicePort: 9090
extraLabels:
tenant: "my-app-namespace"
managed_by: "multi-tenant-proxy"
kubectl apply -f my-tenant-config.yaml
# Port forward to the proxy
kubectl port-forward svc/prometheus-multi-tenant-proxy 8080:8080 -n monitoring
# Query metrics with tenant authentication
curl -H "X-Tenant-Namespace: my-app-namespace" \
"http://localhost:8080/api/v1/query?query=up"
The proxy is configured via a YAML file mounted as a ConfigMap:
# Service Discovery Configuration
discovery:
kubernetes:
# Namespaces to watch for Prometheus services
namespaces:
- monitoring
# Label selectors for discovering Prometheus services
label_selectors:
app.kubernetes.io/name: prometheus
# Port name or number to use for Prometheus services
port: "9090"
# Resource types to discover
resource_types:
- Pod
# How often to refresh the list of targets
refresh_interval: 30s
# Tenant Management Configuration
tenants:
# Watch all namespaces for MetricAccess resources
watch_all_namespaces: true
# Proxy Configuration
proxy:
# Enable query result caching
enable_caching: true
cache_ttl: 5m
# Enable Prometheus metrics collection
enable_metrics: true
# Enable request logging
enable_request_logging: true
# Maximum concurrent requests to backends
max_concurrent_requests: 100
# Timeout for backend requests
backend_timeout: 30s
# Remote Write Configuration
remote_write:
# Enable remote write functionality
enabled: true
# How often to collect and forward metrics
collection_interval: 30s
# Batch size for remote write
batch_size: 1000
The MetricAccess CRD defines tenant access rules and remote write configuration:
apiVersion: observability.ethos.io/v1alpha1
kind: MetricAccess
metadata:
name: tenant-metrics-access
namespace: tenant-namespace
spec:
# Source namespace/identifier
source: tenant-namespace
# Enable namespace-level metric isolation (optional)
# When true, only metrics from this tenant's namespace are collected
metricIsolation: true
# Metric patterns (supports multiple formats)
metrics:
- "http_requests_total" # Exact match
- "http_.*" # Regex pattern
- '{job="my-app"}' # PromQL selector
- '{__name__="up",instance=~".*"}' # Complex selector (=~ makes instance a regex match)
- "node_cpu_seconds_total{job=\"node-exporter\"}" # With labels
# Optional: Additional label selectors
labelSelectors:
environment: "production"
team: "webapp"
# Remote write configuration
remoteWrite:
enabled: true
interval: "30s"
target:
type: "prometheus" # or "pushgateway", "remote_write"
prometheus:
serviceName: "prometheus"
servicePort: 9090
extraLabels:
tenant: "tenant-namespace"
managed_by: "multi-tenant-proxy"
environment: "production"
honorLabels: true
The remote write functionality automatically collects metrics from infrastructure Prometheus instances and forwards them to tenant-specific targets.
- Metric Collection: The proxy queries infrastructure Prometheus instances for metrics matching tenant patterns
- Filtering: Applies tenant-specific filtering rules to collected metrics
- Enrichment: Adds extra labels for tenant identification and management
- Forwarding: Sends metrics to the tenant's Prometheus instance using the standard /api/v1/write endpoint
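The enrichment step above can be illustrated with a small label-merging function. This is a sketch under assumptions: the Sample type and Enrich function are hypothetical, and the conflict rule for honorLabels (existing labels win when it is true) is borrowed from the Prometheus honor_labels scrape option rather than confirmed for this proxy.

```go
package main

import "fmt"

// Sample is a hypothetical, simplified stand-in for one collected sample.
type Sample struct {
	Labels map[string]string
	Value  float64
}

// Enrich returns a copy of s with the extra labels added. When honorLabels
// is true, labels already present on the sample win on conflict; otherwise
// the extra labels overwrite them. (Assumed semantics, see the lead-in.)
func Enrich(s Sample, extra map[string]string, honorLabels bool) Sample {
	out := Sample{
		Labels: make(map[string]string, len(s.Labels)+len(extra)),
		Value:  s.Value,
	}
	for k, v := range s.Labels {
		out.Labels[k] = v
	}
	for k, v := range extra {
		if _, exists := out.Labels[k]; exists && honorLabels {
			continue // keep the label that came with the sample
		}
		out.Labels[k] = v
	}
	return out
}

func main() {
	s := Sample{
		Labels: map[string]string{"__name__": "up", "environment": "staging"},
		Value:  1,
	}
	extra := map[string]string{
		"tenant":      "tenant-namespace",
		"managed_by":  "multi-tenant-proxy",
		"environment": "production",
	}
	fmt.Println(Enrich(s, extra, true).Labels["environment"])  // staging
	fmt.Println(Enrich(s, extra, false).Labels["environment"]) // production
	fmt.Println(Enrich(s, extra, true).Labels["tenant"])       // tenant-namespace
}
```

Copying the label map keeps the original sample untouched, which matters if the same sample is forwarded to more than one tenant.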
# Prometheus target
remoteWrite:
enabled: true
interval: "30s"
target:
type: "prometheus"
prometheus:
serviceName: "prometheus"
servicePort: 9090
extraLabels:
tenant: "my-team"
# Pushgateway target
remoteWrite:
enabled: true
interval: "60s"
target:
type: "pushgateway"
pushgateway:
serviceName: "pushgateway"
servicePort: 9091
jobName: "remote-write-metrics"
# External remote write endpoint
remoteWrite:
enabled: true
interval: "15s"
target:
type: "remote_write"
remoteWrite:
url: "https://external-prometheus.example.com/api/v1/write"
basicAuth:
username: "monitoring-user"
passwordSecret:
name: "prometheus-auth"
key: "password"
headers:
X-Tenant: "my-team"
For remote write to work, the tenant Prometheus instance must be configured to accept remote write requests:
# Tenant Prometheus configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: tenant-namespace
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: tenant-namespace
spec:
  serviceName: prometheus
  template:
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.45.0
          args:
            - '--config.file=/etc/prometheus/prometheus.yml'
            - '--storage.tsdb.path=/prometheus'
            - '--web.enable-remote-write-receiver' # Enable remote write
            - '--web.console.libraries=/etc/prometheus/console_libraries'
            - '--web.console.templates=/etc/prometheus/consoles'
          ports:
            - containerPort: 9090
              name: web
The metricIsolation feature provides true namespace-level isolation by filtering metrics at the collection stage, ensuring that tenant Prometheus instances only receive metrics from their own namespace.
When metricIsolation: true is enabled in a MetricAccess resource:
- Filtered Collection: Metrics are collected through prom-label-proxy with automatic namespace filtering
- Namespace Injection: The prom-label-proxy automatically injects {namespace="tenant-namespace"} into all queries
- Storage Efficiency: Only namespace-specific metrics are stored in the tenant Prometheus instance
- Enhanced Security: No cross-namespace data leakage even at the storage level
apiVersion: observability.ethos.io/v1alpha1
kind: MetricAccess
metadata:
name: secure-tenant-metrics
namespace: secure-tenant
spec:
source: secure-tenant
metricIsolation: true # ✅ Only namespace-specific metrics
metrics:
- 'kube_pod_info' # Only pods in secure-tenant
- 'container_cpu_usage_seconds_total' # Only containers in secure-tenant
- 'http_requests_total' # Only app metrics in secure-tenant
remoteWrite:
enabled: true
interval: "30s"
target:
type: "prometheus"
prometheus:
serviceName: "prometheus"
servicePort: 9090
---
apiVersion: observability.ethos.io/v1alpha1
kind: MetricAccess
metadata:
name: debug-tenant-metrics
namespace: debug-tenant
spec:
source: debug-tenant
metricIsolation: false # ❌ All cluster metrics for debugging
# OR omit the field entirely (defaults to false)
metrics:
- 'kube_pod_info' # Pods from ALL namespaces
- 'kube_node_info' # All cluster nodes
- 'up' # All service health metrics
remoteWrite:
enabled: true
interval: "30s"
target:
type: "prometheus"
prometheus:
serviceName: "prometheus"
servicePort: 9090
Configuration | Metrics Collected | Storage Usage | Query Performance | Security Level |
---|---|---|---|---|
metricIsolation: false | ~10,000+ (all namespaces) | High | Slower (large dataset) | Query-time filtering |
metricIsolation: true | ~300 (tenant namespace only) | 97% reduction | Faster (focused dataset) | Collection + Query filtering |
- 🛡️ True Isolation: Tenant Prometheus only contains relevant namespace data
- 💾 Storage Efficiency: Dramatic reduction in storage requirements (up to 97% savings)
- ⚡ Better Performance: Faster queries on smaller, focused datasets
- 🔒 Enhanced Security: No cross-tenant data access even at storage level
- 🎛️ Per-Tenant Control: Each tenant can individually choose their isolation level
- 🔄 Backward Compatible: Existing tenants continue to work without changes
┌─────────────────────────────────────────────────────────────────┐
│                    Infrastructure Prometheus                    │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │ All Namespaces  │  │ All Namespaces  │  │ All Namespaces  │  │
│  │   Prometheus    │  │   Prometheus    │  │   Prometheus    │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│              Multi-Tenant Proxy + prom-label-proxy              │
│                                                                 │
│     metricIsolation: false     │     metricIsolation: true      │
│    ┌─────────────────────┐     │    ┌─────────────────────┐     │
│    │    Direct Query     │     │    │  prom-label-proxy   │     │
│    │    (All Metrics)    │     │    │ (Filtered Metrics)  │     │
│    └─────────────────────┘     │    └─────────────────────┘     │
└─────────────────────────────────────────────────────────────────┘
                │                                │
                ▼                                ▼
     ┌─────────────────────┐          ┌─────────────────────┐
     │    Debug Tenant     │          │    Secure Tenant    │
     │     Prometheus      │          │     Prometheus      │
     │                     │          │                     │
     │ • 10,000+ metrics   │          │ • 300 metrics       │
     │ • All namespaces    │          │ • Single namespace  │
     │ • Large storage     │          │ • Efficient storage │
     └─────────────────────┘          └─────────────────────┘
# Exact metric names
metrics:
- "http_requests_total"
- "up"
- "prometheus_build_info"
# Regex patterns
metrics:
- "http_.*" # All metrics starting with "http_"
- ".*_duration_.*" # All metrics containing "_duration_"
- "node_.*" # All node exporter metrics
# PromQL-style selectors
metrics:
- '{job="my-app"}' # All metrics from specific job
- '{__name__="up",instance=~".*"}' # Up metric from any instance (=~ is a regex match)
- "http_requests_total{method=\"GET\"}" # Specific metric with label
- '{__name__=~"node_.*",job="node-exporter"}' # Regex with job filter
# Complex selector combining a name regex with several label matchers
metrics:
- '{__name__=~"node_cpu_seconds_total|node_memory_MemAvailable_bytes",job="node-exporter",container="kube-rbac-proxy",namespace="monitoring"}'
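The three pattern forms above can be told apart with simple string checks. The sketch below shows one possible approach and is not the proxy's matcher: Classify and MatchName are hypothetical, regex patterns are assumed to be fully anchored as in PromQL, and selector patterns are only detected, since evaluating them properly needs a PromQL parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// PatternKind names the documented pattern forms.
type PatternKind int

const (
	Exact PatternKind = iota
	Regex
	Selector
)

// Classify guesses the form of a pattern. Metric names cannot contain
// regex metacharacters or braces, so their presence is a reliable hint.
func Classify(p string) PatternKind {
	switch {
	case strings.Contains(p, "{"):
		return Selector
	case strings.ContainsAny(p, `.*+?()[]|^$\`):
		return Regex
	default:
		return Exact
	}
}

// MatchName reports whether a metric name satisfies an exact or regex
// pattern. Regex patterns must match the whole name. Selector patterns
// are rejected with an error in this sketch.
func MatchName(pattern, name string) (bool, error) {
	switch Classify(pattern) {
	case Exact:
		return pattern == name, nil
	case Regex:
		re, err := regexp.Compile("^(?:" + pattern + ")$")
		if err != nil {
			return false, err
		}
		return re.MatchString(name), nil
	default:
		return false, fmt.Errorf("selector pattern %q is not supported in this sketch", pattern)
	}
}

func main() {
	fmt.Println(MatchName("http_.*", "http_requests_total"))                  // true <nil>
	fmt.Println(MatchName(".*_duration_.*", "http_request_duration_seconds")) // true <nil>
	fmt.Println(MatchName("up", "uptime"))                                    // false <nil>
	fmt.Println(MatchName("node_.*", "kube_node_info"))                       // false <nil>
}
```

Anchoring matters for access control: without it, a pattern such as node_.* would also admit kube_node_info.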
# Clone the repository
git clone https://github.com/your-org/prometheus-multi-tenant-proxy
cd prometheus-multi-tenant-proxy
# Build the binary
go build -o prometheus-multi-tenant-proxy ./cmd/proxy
# Run tests
go test ./...
# Build Docker image
docker build -t prometheus-multi-tenant-proxy:latest .
# Run locally with debug logging
./prometheus-multi-tenant-proxy \
--config=examples/config.yaml \
--port=8080 \
--log-level=debug
├── api/v1alpha1/              # Custom Resource Definitions
│   ├── metricaccess_types.go
│   └── zz_generated.deepcopy.go
├── cmd/proxy/                 # Main application entry point
│   └── main.go
├── internal/
│   ├── config/                # Configuration management
│   ├── discovery/             # Kubernetes service discovery
│   ├── proxy/                 # HTTP proxy and load balancing
│   ├── tenant/                # Tenant management and access control
│   └── remote_write/          # Remote write functionality
├── deploy/kubernetes/         # Kubernetes manifests
├── examples/                  # Example configurations
├── docs/                      # Documentation
├── Dockerfile                 # Container image definition
├── Makefile                   # Build automation
└── README.md                  # This file
Deploy using the provided Kubernetes manifests:
# Deploy everything
kubectl apply -f deploy/kubernetes/
# Or step by step
kubectl apply -f deploy/kubernetes/crd.yaml
kubectl apply -f deploy/kubernetes/configmap.yaml
kubectl apply -f deploy/kubernetes/deployment.yaml
Run as a Docker container:
docker run -p 8080:8080 \
-v /path/to/config.yaml:/etc/prometheus-proxy/config.yaml \
bnkarthik6/prometheus-multi-tenant-proxy:latest \
--config=/etc/prometheus-proxy/config.yaml
Run the compiled binary directly:
./prometheus-multi-tenant-proxy \
--config=config.yaml \
--port=8080 \
--log-level=info
- GET /api/v1/query: Query endpoint with tenant filtering
- GET /api/v1/query_range: Range query endpoint with tenant filtering
- GET /api/v1/series: Series endpoint (proxied)
- GET /api/v1/labels: Labels endpoint (proxied)
- GET /health: Health check with tenant and target statistics
- GET /metrics: Prometheus metrics (if enabled)
- GET /debug/targets: Show discovered Prometheus targets
- GET /debug/tenants: Show tenant information and access rules
- GET /collected-metrics: Show metrics collected by remote write controller
Requests must include tenant identification via:
- X-Tenant-Namespace header (primary method)
- namespace query parameter (fallback)
Example:
curl -H "X-Tenant-Namespace: my-app-namespace" \
"http://localhost:8080/api/v1/query?query=up"
- Header-based Authentication: Uses X-Tenant-Namespace header for tenant identification
- Namespace Isolation: Tenants can only access their own namespace configurations
- Metric-level Access Control: Fine-grained control over which metrics tenants can access
- Label-based Filtering: Additional filtering based on metric labels
- Service Account: Runs with minimal RBAC permissions
- Non-root User: Container runs as non-root user (65534)
- Read-only Filesystem: Container filesystem is read-only
- Security Context: Drops all capabilities and prevents privilege escalation
- Namespace Boundaries: Each tenant operates within their own namespace
- Resource Quotas: Kubernetes resource quotas can limit tenant resource usage
- Network Policies: Can be used to restrict network access between tenants
The proxy exposes the following metrics:
- prometheus_proxy_requests_total: Total requests processed
- prometheus_proxy_request_duration_seconds: Request duration histogram
- prometheus_proxy_backend_requests_total: Backend request counters
- prometheus_proxy_targets_discovered: Number of discovered targets
- prometheus_proxy_tenants_active: Number of active tenants
Structured JSON logs with configurable levels:
{
"level": "info",
"time": "2024-01-15T10:30:00Z",
"msg": "Aggregated and filtered results from all targets",
"tenant_id": "my-app-namespace/my-app-metrics",
"total_metrics": 1363,
"filtered_metrics": 1363,
"successful_targets": 3
}
- /health: Health status and statistics
- /debug/targets: View discovered Prometheus targets with health status
- /debug/tenants: View active tenant configurations and access rules
- /collected-metrics: View metrics collected by the remote write controller
- No metrics returned: Check tenant authentication header and MetricAccess configuration
- Metrics not filtered: Verify MetricAccess patterns and tenant namespace
- Remote write not working: Check tenant Prometheus configuration and network connectivity
- Discovery issues: Verify service discovery configuration and RBAC permissions
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Run the test suite: go test ./...
- Submit a pull request
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- ✅ Multi-tenant metric filtering
- ✅ Remote write functionality
- ✅ Dynamic service discovery
- ✅ Kubernetes CRD support
- 🔄 Advanced query rewriting
- 🔄 Multi-cluster support
- 🔄 Grafana integration
- 🔄 Advanced authentication providers (OIDC, LDAP)
- 🔄 Query result caching with Redis
- 🔄 Horizontal pod autoscaling support
- 🔄 Custom metrics and alerting rules