KubeLedger is the System of Record that tracks the full picture of Kubernetes costs — revealing the 30% hidden in non-allocatable overhead for precise, per-namespace accounting.
Note: KubeLedger was formerly known as Kubernetes Opex Analytics (kube-opex-analytics). Read more about this change in our announcement blog post. To make the transition straightforward, we have provided a migration procedure.
- Overview
- Key Features
- Quick Start
- Architecture
- Documentation
- Configuration
- Troubleshooting
- License
- Support & Contributions
KubeLedger is a usage accounting tool that helps organizations track, analyze, and optimize CPU, Memory, and GPU resources on Kubernetes clusters over time (hourly, daily, monthly).
It acts as a System of Record for your cluster resources, providing insightful usage analytics and charts that engineering and financial teams can use as key indicators for cost optimization decisions.
- CPU - Core usage and requests per namespace
- Memory - RAM consumption and requests per namespace
- GPU - NVIDIA GPU utilization via DCGM integration
Multi-cluster Integration: KubeLedger tracks usage for a single Kubernetes cluster. For centralized multi-cluster analytics, see Krossboard Kubernetes Operator (demo video).
| Feature | Description |
|---|---|
| Hourly/Daily/Monthly Trends | Tracks actual usage and requested capacities per namespace, collected every 5 minutes and consolidated hourly |
| Non-allocatable Capacity Tracking | Highlights system overhead (OS, kubelets) vs. usable application capacity at node and cluster levels |
| Cluster Capacity Planning | Visualize consumed capacity globally, instantly, and over time |
| Usage Efficiency Analysis | Compare resource requests against actual usage to identify over/under-provisioning |
| Cost Allocation & Chargeback | Automatic resource usage accounting per namespace for billing and showback |
| Prometheus Integration | Native exporter at /metrics for Grafana dashboards and alerting |
- Kubernetes cluster v1.19+ (or OpenShift 4.x+)
- `kubectl` configured with cluster access
- Helm 3.x (fine-tuned installation) or `kubectl` for a basic opinionated deployment
- Cluster permissions: read access to pods, nodes, and namespaces
- Kubernetes Metrics Server deployed in your cluster (required for CPU and memory metrics)
- NVIDIA DCGM Exporter deployed in your cluster (required for GPU metrics, optional if no GPUs)
Before installing, ensure metrics-server is running in your cluster:
# Check if metrics-server is deployed
kubectl -n kube-system get deploy | grep metrics-server
# Verify it's working
kubectl top nodes
# If not installed, deploy with kubectl
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

If your cluster has NVIDIA GPUs and you want GPU metrics, ensure DCGM Exporter is running:
# Check if DCGM Exporter is deployed
kubectl get daemonset -A | grep dcgm
# If not installed, deploy with Helm (requires NVIDIA GPU Operator or drivers)
helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
helm install dcgm-exporter gpu-helm-charts/dcgm-exporter \
--namespace gpu-operator \
  --create-namespace

Clone the repository:

git clone https://github.com/realopslabs/kubeledger.git --depth=1
cd kubeledger

OpenShift users: Skip this section and use the Helm installation with OpenShift-specific settings.
# Create namespace
kubectl create namespace kubeledger
# Deploy using Kustomize
kubectl apply -k ./manifests/kubeledger/kustomize -n kubeledger
# Watch pod status
kubectl get pods -n kubeledger -w

The Helm installation covers the following advanced customization scenarios (see manifests/kubeledger/helm/values.yaml for more options; a sample override file is sketched after this list):
- OpenShift: Set `securityContext.openshift: true`
- Custom storage: Set `dataVolume.storageClass` and `dataVolume.capacity`
- DCGM Integration: Set `dcgm.enable: true` and `dcgm.endpoint`
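If you prefer a values file over repeated --set flags, the same options can be collected in an override file. This is a minimal sketch assuming the key layout implied by the option names above (check manifests/kubeledger/helm/values.yaml for the authoritative structure); the file name and the storage class/capacity values are illustrative:

# Write a values override file (file name and example values are illustrative)
cat > my-values.yaml <<'EOF'
securityContext:
  openshift: true                  # OpenShift-specific security context
dataVolume:
  storageClass: managed-premium    # example storage class, adjust to your cluster
  capacity: 5Gi                    # example capacity for the data volume
dcgm:
  enable: true
  endpoint: "dcgm-exporter.monitoring.svc.cluster.local:9400"
EOF

# Install or upgrade using the override file
helm upgrade --install kubeledger ./manifests/kubeledger/helm -n kubeledger -f my-values.yaml

The commands below achieve the same result with --set flags.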
# Create namespace
kubectl create namespace kubeledger
# Install with Helm on Kubernetes
helm upgrade --install kubeledger ./manifests/kubeledger/helm -n kubeledger
# Install with Helm on Kubernetes with GPU support
helm upgrade --install kubeledger ./manifests/kubeledger/helm -n kubeledger \
--set dcgm.enable=true \
  --set dcgm.endpoint="dcgm-exporter.monitoring.svc.cluster.local:9400"
# Install with Helm on OpenShift
helm upgrade --install kubeledger ./manifests/kubeledger/helm -n kubeledger --set securityContext.openshift=true
# Install with Helm on OpenShift with GPU support
helm upgrade --install kubeledger ./manifests/kubeledger/helm -n kubeledger \
--set securityContext.openshift=true \
--set dcgm.enable=true \
  --set dcgm.endpoint="dcgm-exporter.monitoring.svc.cluster.local:9400"
# Watch pod status
kubectl get pods -n kubeledger -w

# Port-forward to access the UI
kubectl port-forward svc/kubeledger 5483:80 -n kubeledger
# Open http://localhost:5483 in your browser

Running KubeLedger as a Docker container requires kubectl proxy running locally to provide API access:
# Start kubectl proxy in background
kubectl proxy &
# Run KubeLedger
docker run -d \
--net="host" \
--name kubeledger \
-v /var/lib/kubeledger:/data \
-e KL_DB_LOCATION=/data/db \
-e KL_K8S_API_ENDPOINT=http://127.0.0.1:8001 \
  ghcr.io/realopslabs/kubeledger

The dashboard is available at http://localhost:5483.
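To confirm the container started cleanly before opening the dashboard, you can tail its logs:

# Follow the KubeLedger container logs (Ctrl+C to stop)
docker logs -f kubeledger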
┌───────────────────┐
│ Metrics Server │──┐
│ (CPU/Memory) │ │ ┌───────────────────────────────────────┐
└───────────────────┘ ├───>│ KubeLedger │
┌───────────────────┐ │ │ ┌─────────┐ ┌────────┐ ┌─────────┐ │
│ DCGM Exporter │──┘ │ │ Poller │─>│RRD DBs │─>│ API │ │
│ (GPU metrics) │ │ │ (5 min) │ │ │ │ │ │
└───────────────────┘ │ └─────────┘ └────────┘ └────┬────┘ │
└────────────────────────────────┼──────┘
│
┌────────────────────────────────┼───────┐
│ v │
│ ┌────────────┐ ┌──────────────┐ │
│ │ Web UI │ │ /metrics │ │
│ │ (D3.js) │ │ (Prometheus) │ │
│ └────────────┘ └──────────────┘ │
└────────────────────────────────────────┘
│ │
v v
Built-in Dashboards Grafana/Alerting
Data Flow:
- Metrics polled every 5 minutes (configurable):
- CPU/Memory from Kubernetes Metrics Server
- GPU from NVIDIA DCGM Exporter
- Metrics are processed and stored in internal lightweight time-series databases (round-robin DBs)
- Data is consolidated into hourly, daily, and monthly aggregates
- API serves data to the built-in web UI and Prometheus scraper
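Once KubeLedger is reachable (for example through the Quick Start port-forward), a quick curl confirms the Prometheus endpoint is serving data. This sketch assumes /metrics is exposed on the same service and port as the web UI:

# Peek at the first exported metrics (assumes the port-forward on localhost:5483)
curl -s http://localhost:5483/metrics | head -n 20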
| Topic | Link |
|---|---|
| Installation on Kubernetes and OpenShift | https://kubeledger.io/docs/installation-on-kubernetes-and-openshift/ |
| Installation on Docker | https://kubeledger.io/docs/installation-on-docker/ |
| Built-in Dashboards and Charts of KubeLedger | https://kubeledger.io/docs/built-in-dashboards-and-charts/ |
| Prometheus Exporter and Grafana dashboards | https://kubeledger.io/docs/prometheus-exporter-grafana-dashboard/ |
| KubeLedger Configuration Settings | https://kubeledger.io/docs/configuration-settings/ |
| Design Fundamentals | https://kubeledger.io/docs/design-fundamentals/ |
Migration Note: All environment variables now use the `KL_` prefix. Old `KOA_` variables are deprecated but will be supported for backward compatibility for 6 months.
Key environment variables:
| Variable | Description | Default |
|---|---|---|
| `KL_K8S_API_ENDPOINT` | Kubernetes API server URL | Required |
| `KL_K8S_AUTH_TOKEN` | Service account token | Auto-detected in-cluster |
| `KL_DB_LOCATION` | Path for RRDtool databases | `/data` |
| `KL_POLLING_INTERVAL_SEC` | Metrics collection interval (seconds) | `300` |
| `KL_COST_MODEL` | Billing model (`CUMULATIVE_RATIO`, `RATIO`, `CHARGE_BACK`) | `CUMULATIVE_RATIO` |
| `KL_BILLING_HOURLY_RATE` | Hourly cost for chargeback model | `-1.0` |
| `KL_BILLING_CURRENCY_SYMBOL` | Currency symbol for cost display | `$` |
| `KL_NVIDIA_DCGM_ENDPOINT` | NVIDIA DCGM Exporter endpoint for GPU metrics | Not set (GPU disabled) |
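For example, to switch to the chargeback model, the relevant variables from the table can be set before starting KubeLedger (the rate and currency below are illustrative):

# Example chargeback configuration (values are illustrative)
export KL_COST_MODEL=CHARGE_BACK
export KL_BILLING_HOURLY_RATE=7.95      # hourly cost used by the chargeback model
export KL_BILLING_CURRENCY_SYMBOL='€'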
To enable GPU metrics collection, set the DCGM Exporter endpoint:
# Environment variable
export KL_NVIDIA_DCGM_ENDPOINT=http://dcgm-exporter.gpu-operator:9400/metrics
# Or with Helm
helm upgrade --install kubeledger ./manifests/kubeledger/helm \
  --set dcgm.enable=true \
  --set dcgm.endpoint=http://dcgm-exporter.gpu-operator:9400/metrics

See Configuration Settings for the complete reference.
Pod stuck in CrashLoopBackOff
- Check logs: `kubectl logs -f deployment/kubeledger -n kubeledger`
- Verify RBAC permissions are correctly applied
- Ensure the service account has read access to pods and nodes
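A quick way to check the RBAC side is to impersonate the service account with kubectl auth can-i; the namespace and service account name below are assumptions, adjust them to whatever your deployment actually uses:

# Check that the service account can read pods and nodes
# (namespace and service account name are assumptions)
kubectl auth can-i list pods --as=system:serviceaccount:kubeledger:kubeledger
kubectl auth can-i list nodes --as=system:serviceaccount:kubeledger:kubeledger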
No data appearing in dashboard
- Wait at least 5-10 minutes for initial data collection
- Verify the pod can reach the Kubernetes API: check for connection errors in logs
- Confirm `KL_K8S_API_ENDPOINT` is correctly set
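One way to inspect the environment actually configured on the running deployment (the deployment name kubeledger is assumed, as in the log command above):

# Print the environment variables configured on the KubeLedger container
kubectl -n kubeledger get deploy kubeledger -o jsonpath='{.spec.template.spec.containers[0].env}'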
Metrics not appearing in Prometheus
- Ensure the `/metrics` endpoint is accessible
- Check ServiceMonitor/PodMonitor configuration if using Prometheus Operator
- Verify network policies allow Prometheus to scrape the pod
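If you use the Prometheus Operator, a ServiceMonitor along the following lines can point Prometheus at the exporter. The label selector, port name, and scrape interval are assumptions to adapt to your installation, and the object must live in a namespace your Prometheus instance is configured to watch:

# Minimal ServiceMonitor sketch (label selector and port name are assumptions)
kubectl apply -n kubeledger -f - <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubeledger
spec:
  selector:
    matchLabels:
      app: kubeledger          # must match the labels on the kubeledger Service
  endpoints:
    - port: http               # must match the port name exposed by the Service
      path: /metrics
      interval: 60s
EOF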
Polling interval
- By default, the polling interval to collect raw metrics from the Kubernetes API or NVIDIA DCGM is 300 seconds (5 minutes).
- You can change this interval with the variable `KL_POLLING_INTERVAL_SEC`. Always use a multiple of 300 seconds, as the backend RRD database is based on a 5-minute resolution.
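For example, to poll every 10 minutes (a multiple of the 5-minute RRD resolution):

# Collect metrics every 600 seconds (10 minutes)
export KL_POLLING_INTERVAL_SEC=600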
We welcome feedback and contributions!
- Submit an issue: GitHub Issues
- Contribute Code: Pull Requests
All contributions must be released under Apache 2.0 License terms.
KubeLedger is licensed under the Business Source License 1.1.
Permitted: Non-commercial use, internal business use, development, testing, and personal projects.
Not Permitted: Offering KubeLedger as a commercial hosted service or managed offering.
The license converts to Apache 2.0 on [DATE + 4 years].
