
Commit f89c6ff

feat: monitoring - CNPG dashboard deps, teardown, tests
Follow-up to #38: add the dependencies of the CNPG dashboard (kube-state-metrics, node-exporter, and the default Prometheus recording rules), add a teardown script, and add tests.

Signed-off-by: Jeremy Schneider <schneider@ardentperf.com>
1 parent f365164 commit f89c6ff

File tree

15 files changed: +707, −6 lines

.github/workflows/test.yml

Lines changed: 60 additions & 0 deletions
```yaml
name: End-to-End Test

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  workflow_dispatch:

jobs:
  end-to-end-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 45 # Increased to accommodate recording rules wait time

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Move Docker data to larger disk
        run: |
          echo "=== Disk space ==="
          df -h / /mnt | grep -E '(Filesystem|/dev)'
          echo ""
          echo "=== Stopping Docker ==="
          sudo systemctl stop docker.socket docker.service containerd.service
          echo ""
          echo "=== Preparing /mnt for Docker data ==="
          sudo mkdir -p /mnt/docker-data/docker /mnt/docker-data/containerd
          echo ""
          echo "=== Bind mounting to /mnt ==="
          sudo mount --bind /mnt/docker-data/docker /var/lib/docker
          sudo mount --bind /mnt/docker-data/containerd /var/lib/containerd
          echo ""
          echo "=== Starting Docker ==="
          sudo systemctl start containerd.service docker.service
          echo ""
          echo "=== Verifying Docker works ==="
          docker ps

      - name: Install Nix
        uses: cachix/install-nix-action@v27
        with:
          nix_path: nixpkgs=channel:nixos-unstable

      - name: Show environment info
        run: |
          echo "=== System Info ==="
          uname -a
          echo "=== Docker Info ==="
          docker version
          echo "=== Available space ==="
          df -h | head -5
          echo "=== Nix version ==="
          nix --version

      - name: Run test-1-setup.sh (15-20 min)
        run: nix develop --command ./test-1-setup.sh

      - name: Run test-2-teardown.sh (10-15 min)
        run: nix develop --command ./test-2-teardown.sh
```
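
A note on the `Move Docker data to larger disk` step: the runner's root volume is small, so the workflow bind-mounts directories from the larger `/mnt` disk over `/var/lib/docker` and `/var/lib/containerd` before restarting Docker. If you adapt this trick elsewhere, a hedged sanity check (not part of the workflow above, just standard `findmnt`/`docker` CLI usage) could look like this:

```bash
# Confirm Docker's data root is now backed by the larger /mnt volume.
findmnt --target /var/lib/docker
findmnt --target /var/lib/containerd
docker info --format '{{ .DockerRootDir }}'   # still reports /var/lib/docker
sudo du -sh /mnt/docker-data/docker           # the actual data lands under /mnt
```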

monitoring/README.md

Lines changed: 15 additions & 1 deletion

```diff
@@ -60,9 +60,23 @@ You can find the dashboard under `Home > Dashboards > grafana > CloudNativePG`.
 
 ![dashboard](image.png)
 
+## CloudNativePG Grafana Dashboard
+
+[CloudNativePG provides a default dashboard](https://cloudnative-pg.io/docs/devel/quickstart#grafana-dashboard) for Grafana in the dedicated [`grafana-dashboards` repository](https://github.com/cloudnative-pg/grafana-dashboards). The CNPG Playground monitoring `setup.sh` automatically installs the CNPG dashboard into Grafana. You can also download the file [grafana-dashboard.json](https://github.com/cloudnative-pg/grafana-dashboards/blob/main/charts/cluster/grafana-dashboard.json) and manually import it via the GUI (menu: Dashboards > New > Import).
+
+### Dependencies
+
+The CNPG Playground monitoring `setup.sh` also installs and configures the dependencies of this dashboard:
+
+1. `node-exporter`: node-level metrics (CPU, memory, disk, and network at the host level)
+2. `kube-state-metrics`: Kubernetes object metrics (pods, deployments, resource requests/limits)
+3. Kubelet/cAdvisor metrics (via `/metrics/cadvisor`): container-level metrics (CPU, memory, network, disk I/O)
+4. Canonical **Kubernetes recording rules from [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus)**, which pre-compute common aggregations used by the CloudNativePG dashboard, such as `node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate`, `node_namespace_pod_container:container_memory_working_set_bytes`, and `namespace_cpu:kube_pod_container_resource_requests:sum`
+
 ## PodMonitor
 
 To enable Prometheus to scrape metrics from your PostgreSQL pods, you must
 create a `PodMonitor` resource as described in the
 [documentation](https://cloudnative-pg.io/documentation/current/monitoring/#creating-a-podmonitor).
-
+
+If a monitoring stack is running, then `demo/setup.sh` will automatically create PodMonitors.
```
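
For orientation, the `PodMonitor` described in the linked CNPG documentation is quite small. The sketch below is a hedged example only: the cluster name `cluster-example` is a placeholder, and in the playground `demo/setup.sh` creates the real PodMonitors for you when the monitoring stack is running.

```bash
# Hedged sketch of a minimal PodMonitor per the CNPG monitoring docs.
# "cluster-example" is a placeholder cluster name, not created by this commit.
kubectl apply -f - <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cluster-example
spec:
  selector:
    matchLabels:
      cnpg.io/cluster: cluster-example
  podMetricsEndpoints:
    - port: metrics
EOF
```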

Lines changed: 92 additions & 0 deletions

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
  - kind: ServiceAccount
    name: kube-state-metrics
    namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-state-metrics
spec:
  clusterIP: None
  ports:
    - name: http-metrics
      port: 8080
      targetPort: http-metrics
    - name: telemetry
      port: 8081
      targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-state-metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      containers:
        - name: kube-state-metrics
          image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
          ports:
            - name: http-metrics
              containerPort: 8080
            - name: telemetry
              containerPort: 8081
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-state-metrics
spec:
  endpoints:
    - port: http-metrics
      interval: 30s
    - port: telemetry
      interval: 30s
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
```
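
A quick, hedged way to confirm these manifests are serving metrics (names and ports are taken from the manifests above; `kube_pod_info` is a standard kube-state-metrics series):

```bash
# Wait for the Deployment, then spot-check the metrics endpoint through a
# port-forward to the headless Service defined above.
kubectl -n kube-system rollout status deployment/kube-state-metrics
kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080 &
sleep 2
curl -s http://localhost:8080/metrics | grep -c '^kube_pod_info'
```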

Lines changed: 6 additions & 0 deletions

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kube-system
resources:
  - deployment.yaml
```

Lines changed: 7 additions & 0 deletions

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - node-exporter.yaml
  - servicemonitor.yaml
```

Lines changed: 73 additions & 0 deletions

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-exporter
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    app.kubernetes.io/name: node-exporter
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
  template:
    metadata:
      labels:
        app.kubernetes.io/name: node-exporter
    spec:
      serviceAccountName: node-exporter
      hostNetwork: true
      hostPID: true
      containers:
        - name: node-exporter
          image: quay.io/prometheus/node-exporter:v1.8.2
          args:
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            - --path.rootfs=/host/root
          ports:
            - containerPort: 9100
              name: metrics
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
            - name: root
              mountPath: /host/root
              readOnly: true
      tolerations:
        - operator: Exists
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    app.kubernetes.io/name: node-exporter
spec:
  clusterIP: None
  ports:
    - name: metrics
      port: 9100
      targetPort: metrics
  selector:
    app.kubernetes.io/name: node-exporter
```

Lines changed: 14 additions & 0 deletions

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    app.kubernetes.io/name: node-exporter
spec:
  endpoints:
    - port: metrics
      interval: 30s
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
```
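
And the equivalent hedged spot-check for node-exporter (again, the names and the 9100 port come from the manifests above; `node_cpu_seconds_total` is a standard node-exporter series):

```bash
# node-exporter runs as a hostNetwork DaemonSet, so port 9100 is bound on
# each node; the headless Service still works for a quick port-forward.
kubectl -n kube-system rollout status daemonset/node-exporter
kubectl -n kube-system port-forward svc/node-exporter 9100:9100 &
sleep 2
curl -s http://localhost:9100/metrics | grep -c '^node_cpu_seconds_total'
```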

monitoring/prometheus-instance/deploy_prometheus.yaml

Lines changed: 6 additions & 1 deletion

```diff
@@ -12,6 +12,7 @@ rules:
     resources:
       - nodes
       - nodes/metrics
+      - nodes/proxy
       - services
       - endpoints
       - pods
@@ -44,7 +45,7 @@ roleRef:
 subjects:
   - kind: ServiceAccount
     name: prometheus
-    namespace: default
+    namespace: prometheus-operator
 ---
 
 apiVersion: monitoring.coreos.com/v1
@@ -55,5 +56,9 @@ spec:
   serviceAccountName: prometheus
   podMonitorSelector: {}
   podMonitorNamespaceSelector: {}
+  serviceMonitorSelector: {}
+  serviceMonitorNamespaceSelector: {}
+  ruleSelector: {}
+  ruleNamespaceSelector: {}
   nodeSelector:
     node-role.kubernetes.io/infra: ""
```
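
With these empty selectors, the Prometheus instance now selects ServiceMonitors and PrometheusRules from every namespace, not only PodMonitors as before, which is what lets the kube-state-metrics, node-exporter, kubelet, and recording-rule objects in this commit get picked up. The `nodes/proxy` rule widens what the `prometheus` service account may read from kubelets (kubelet paths not covered by `nodes/metrics` are authorized against the `proxy` subresource). A hedged way to see what is now in scope:

```bash
# With serviceMonitorSelector/ruleSelector set to {}, every ServiceMonitor
# and PrometheusRule in any namespace is a candidate for this Prometheus.
kubectl get servicemonitors --all-namespaces
kubectl get prometheusrules --all-namespaces
```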

monitoring/prometheus-instance/kustomization.yaml

Lines changed: 2 additions & 0 deletions

```diff
@@ -2,4 +2,6 @@ apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 resources:
   - deploy_prometheus.yaml
+  # Fetch upstream recording rules from kube-prometheus
+  - https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/v0.16.0/manifests/kubernetesControlPlane-prometheusRule.yaml
 namespace: prometheus-operator
```
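
A hedged way to confirm the upstream rules made it in and are being evaluated: the `prometheus-operated` service name is the Prometheus Operator default and an assumption here, as is having `jq` available, while the rule name queried is one of those listed in the README section above.

```bash
# The fetched PrometheusRule should land in the prometheus-operator
# namespace because of the kustomization's namespace override.
kubectl -n prometheus-operator get prometheusrules

# Query one of the kube-prometheus recording rules through the Prometheus
# HTTP API; an empty result usually just means the rules need a few more
# minutes of scrape data before they produce samples.
kubectl -n prometheus-operator port-forward svc/prometheus-operated 9090:9090 &
sleep 2
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate' \
  | jq '.data.result | length'
```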

Lines changed: 54 additions & 0 deletions

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kubelet
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kubelet
spec:
  clusterIP: None
  ports:
    - name: https-metrics
      port: 10250
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kubelet
spec:
  endpoints:
    - port: https-metrics
      interval: 30s
      scheme: https
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true
      relabelings:
        - targetLabel: metrics_path
          replacement: /metrics
    - port: https-metrics
      interval: 30s
      path: /metrics/cadvisor
      scheme: https
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true
      relabelings:
        - targetLabel: metrics_path
          replacement: /metrics/cadvisor
    - port: https-metrics
      interval: 30s
      path: /metrics/probes
      scheme: https
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true
      relabelings:
        - targetLabel: metrics_path
          replacement: /metrics/probes
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
```
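
Because each endpoint above relabels `metrics_path` onto its targets, the scraped series can be told apart in PromQL. A hedged spot-check for the cAdvisor endpoint, reusing the same Prometheus port-forward and `jq` assumptions as the earlier example:

```bash
# Count cAdvisor container CPU series, which should carry the
# metrics_path="/metrics/cadvisor" target label set by the relabeling above.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=count(container_cpu_usage_seconds_total{metrics_path="/metrics/cadvisor"})' \
  | jq -r '.data.result[0].value[1]'
```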
