60 changes: 60 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,60 @@
name: End-to-End Test

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  workflow_dispatch:

jobs:
  end-to-end-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 45 # Increased to accommodate recording rules wait time

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Move Docker data to larger disk
        run: |
          echo "=== Disk space ==="
          df -h / /mnt | grep -E '(Filesystem|/dev)'
          echo ""
          echo "=== Stopping Docker ==="
          sudo systemctl stop docker.socket docker.service containerd.service
          echo ""
          echo "=== Preparing /mnt for Docker data ==="
          sudo mkdir -p /mnt/docker-data/docker /mnt/docker-data/containerd
          echo ""
          echo "=== Bind mounting to /mnt ==="
          sudo mount --bind /mnt/docker-data/docker /var/lib/docker
          sudo mount --bind /mnt/docker-data/containerd /var/lib/containerd
          echo ""
          echo "=== Starting Docker ==="
          sudo systemctl start containerd.service docker.service
          echo ""
          echo "=== Verifying Docker works ==="
          docker ps

      - name: Install Nix
        uses: cachix/install-nix-action@v27
        with:
          nix_path: nixpkgs=channel:nixos-unstable

      - name: Show environment info
        run: |
          echo "=== System Info ==="
          uname -a
          echo "=== Docker Info ==="
          docker version
          echo "=== Available space ==="
          df -h | head -5
          echo "=== Nix version ==="
          nix --version

      - name: Run test-1-setup.sh (15-20 min)
        run: nix develop --command ./test-1-setup.sh

      - name: Run test-2-teardown.sh (10-15 min)
        run: nix develop --command ./test-2-teardown.sh
16 changes: 15 additions & 1 deletion monitoring/README.md
@@ -66,9 +66,23 @@ You can find the dashboard under `Home > Dashboards > grafana > CloudNativePG`.
> streaming is unavailable, but all other Grafana features work normally
> when accessed via port-forward.

## CloudNativePG Grafana Dashboard

[CloudNativePG provides a default dashboard](https://cloudnative-pg.io/docs/devel/quickstart#grafana-dashboard) for Grafana in the dedicated [`grafana-dashboards` repository](https://github.com/cloudnative-pg/grafana-dashboards). The CNPG Playground monitoring `setup.sh` automatically installs the CNPG dashboard into Grafana. You can also download the file [grafana-dashboard.json](https://github.com/cloudnative-pg/grafana-dashboards/blob/main/charts/cluster/grafana-dashboard.json) and import it manually via the GUI (menu: Dashboards > New > Import).

### Dependencies

The CNPG Playground monitoring `setup.sh` also installs and configures the dependencies of this dashboard:

1. `node-exporter`: Node-level metrics (CPU, memory, disk, network at the host level)
2. `kube-state-metrics`: Kubernetes object metrics (pods, deployments, resource requests/limits)
3. Kubelet/cAdvisor metrics (via `/metrics/cadvisor`): Container-level metrics (CPU, memory, network, disk I/O)
4. Canonical **Kubernetes recording rules from [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus)**, which pre-compute common aggregations used by the CloudNativePG dashboard, such as `node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate`, `node_namespace_pod_container:container_memory_working_set_bytes`, and `namespace_cpu:kube_pod_container_resource_requests:sum`
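
One way to confirm that the recording rules are being evaluated is to query one of the pre-computed series through the Prometheus HTTP API. The sketch below only builds and prints the `curl` command. The base URL `http://localhost:9090` assumes a port-forward to the Prometheus instance, and the helper name `rule_query_cmd` is hypothetical, not part of the playground scripts.

```shell
#!/usr/bin/env bash
# Sketch: print a curl command that queries a recording rule through the
# Prometheus HTTP API (GET /api/v1/query).
# Assumption: Prometheus is reachable at PROM_URL, e.g. via a port-forward.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# rule_query_cmd is a hypothetical helper name used for illustration.
rule_query_cmd() {
  local rule="$1"
  printf "curl -sG '%s/api/v1/query' --data-urlencode 'query=%s'\n" "$PROM_URL" "$rule"
}

# Print the command for one of the rules named in the list above.
rule_query_cmd 'node_namespace_pod_container:container_memory_working_set_bytes'
```

Running the printed command against a live Prometheus returns JSON. An empty `result` array would suggest that the rule has not been evaluated yet or that its input metrics are missing.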

## PodMonitor

To enable Prometheus to scrape metrics from your PostgreSQL pods, you must
create a `PodMonitor` resource as described in the
[documentation](https://cloudnative-pg.io/documentation/current/monitoring/#creating-a-podmonitor).


If a monitoring stack is running, then `demo/setup.sh` will automatically create PodMonitors.
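
The components described in this README can be spot-checked with `kubectl`. The following is a minimal sketch that prints the checks rather than running them. Resource names and namespaces are taken from the manifests in this change, while the helper name `dependency_checks` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print kubectl commands that spot-check the monitoring dependencies.
# Names and namespaces come from the manifests in this change; the helper
# name dependency_checks is hypothetical.
dependency_checks() {
  cat <<'EOF'
kubectl -n kube-system get daemonset node-exporter
kubectl -n kube-system get deployment kube-state-metrics
kubectl -n kube-system get servicemonitors
kubectl -n prometheus-operator get prometheusrules
kubectl get podmonitors --all-namespaces
EOF
}

# Print the commands only; nothing is executed against the cluster here.
dependency_checks
```

Piping the output to `bash` would run the checks against whichever cluster the current kubeconfig context points to.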
93 changes: 93 additions & 0 deletions monitoring/kube-state-metrics/deployment.yaml
@@ -0,0 +1,93 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
  - kind: ServiceAccount
    name: kube-state-metrics
    namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-state-metrics
spec:
  clusterIP: None
  ports:
    - name: http-metrics
      port: 8080
      targetPort: http-metrics
    - name: telemetry
      port: 8081
      targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-state-metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      containers:
        - name: kube-state-metrics
          image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
          ports:
            - name: http-metrics
              containerPort: 8080
            - name: telemetry
              containerPort: 8081
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-state-metrics
spec:
  endpoints:
    - port: http-metrics
      interval: 30s
      honorLabels: true
    - port: telemetry
      interval: 30s
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
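
The ClusterRole above grants `list` and `watch` on every resource to the `kube-state-metrics` ServiceAccount. A hedged way to check the effective permissions is `kubectl auth can-i` with impersonation. The sketch below only prints the commands; the helper names `sa_subject` and `can_i_cmd` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build `kubectl auth can-i` checks for the kube-state-metrics
# ServiceAccount. Helper names are hypothetical.

# Kubernetes identifies a ServiceAccount user as
# system:serviceaccount:<namespace>:<name>.
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Print one can-i check for the given verb and resource.
can_i_cmd() {
  local verb="$1" resource="$2"
  printf 'kubectl auth can-i %s %s --as=%s\n' \
    "$verb" "$resource" "$(sa_subject kube-system kube-state-metrics)"
}

can_i_cmd list pods
can_i_cmd watch deployments
can_i_cmd delete pods
```

Because the role contains only `list` and `watch`, the `delete` check would be expected to answer `no` unless some other binding grants it. Running the commands requires that your own user is allowed to impersonate ServiceAccounts.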
6 changes: 6 additions & 0 deletions monitoring/kube-state-metrics/kustomization.yaml
@@ -0,0 +1,6 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kube-system
resources:
  - deployment.yaml

7 changes: 7 additions & 0 deletions monitoring/node-exporter/kustomization.yaml
@@ -0,0 +1,7 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - node-exporter.yaml
  - servicemonitor.yaml


73 changes: 73 additions & 0 deletions monitoring/node-exporter/node-exporter.yaml
@@ -0,0 +1,73 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-exporter
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    app.kubernetes.io/name: node-exporter
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
  template:
    metadata:
      labels:
        app.kubernetes.io/name: node-exporter
    spec:
      serviceAccountName: node-exporter
      hostNetwork: true
      hostPID: true
      containers:
        - name: node-exporter
          image: quay.io/prometheus/node-exporter:v1.8.2
          args:
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            - --path.rootfs=/host/root
          ports:
            - containerPort: 9100
              name: metrics
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
            - name: root
              mountPath: /host/root
              readOnly: true
      tolerations:
        - operator: Exists
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    app.kubernetes.io/name: node-exporter
spec:
  clusterIP: None
  ports:
    - name: metrics
      port: 9100
      targetPort: metrics
  selector:
    app.kubernetes.io/name: node-exporter
14 changes: 14 additions & 0 deletions monitoring/node-exporter/servicemonitor.yaml
@@ -0,0 +1,14 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    app.kubernetes.io/name: node-exporter
spec:
  endpoints:
    - port: metrics
      interval: 30s
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
7 changes: 6 additions & 1 deletion monitoring/prometheus-instance/deploy_prometheus.yaml
@@ -12,6 +12,7 @@ rules:
     resources:
       - nodes
       - nodes/metrics
+      - nodes/proxy
       - services
       - endpoints
       - pods
@@ -44,7 +45,7 @@ roleRef:
 subjects:
   - kind: ServiceAccount
     name: prometheus
-    namespace: default
+    namespace: prometheus-operator
 ---

 apiVersion: monitoring.coreos.com/v1
@@ -55,5 +56,9 @@ spec:
   serviceAccountName: prometheus
   podMonitorSelector: {}
   podMonitorNamespaceSelector: {}
+  serviceMonitorSelector: {}
+  serviceMonitorNamespaceSelector: {}
+  ruleSelector: {}
+  ruleNamespaceSelector: {}
   nodeSelector:
     node-role.kubernetes.io/infra: ""
2 changes: 2 additions & 0 deletions monitoring/prometheus-instance/kustomization.yaml
@@ -2,4 +2,6 @@ apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 resources:
   - deploy_prometheus.yaml
+  # Fetch upstream recording rules from kube-prometheus
+  - https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/v0.16.0/manifests/kubernetesControlPlane-prometheusRule.yaml
 namespace: prometheus-operator
55 changes: 55 additions & 0 deletions monitoring/prometheus-instance/servicemonitor-kubelet.yaml
@@ -0,0 +1,55 @@
apiVersion: v1
kind: Service
metadata:
  name: kubelet
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kubelet
spec:
  clusterIP: None
  ports:
    - name: https-metrics
      port: 10250
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kubelet
spec:
  endpoints:
    - port: https-metrics
      interval: 30s
      scheme: https
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true
      relabelings:
        - targetLabel: metrics_path
          replacement: /metrics
    - port: https-metrics
      interval: 30s
      path: /metrics/cadvisor
      scheme: https
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true
      honorLabels: true
      relabelings:
        - targetLabel: metrics_path
          replacement: /metrics/cadvisor
    - port: https-metrics
      interval: 30s
      path: /metrics/probes
      scheme: https
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true
      relabelings:
        - targetLabel: metrics_path
          replacement: /metrics/probes
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
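
For a manual spot-check of the kubelet endpoints scraped above, the same metrics can be fetched through the API server's node proxy using your own `kubectl` credentials (not the Prometheus token). The sketch below builds the proxy path; the helper name `kubelet_metrics_path` is hypothetical and `my-node` is a placeholder node name.

```shell
#!/usr/bin/env bash
# Sketch: build the API-server proxy path for a node's kubelet metrics.
# kubelet_metrics_path is a hypothetical helper; "my-node" is a placeholder.
kubelet_metrics_path() {
  local node="$1" endpoint="${2:-/metrics}"
  printf '/api/v1/nodes/%s/proxy%s\n' "$node" "$endpoint"
}

# Print the path for the cAdvisor endpoint of the placeholder node.
kubelet_metrics_path my-node /metrics/cadvisor
```

With a real node name from `kubectl get nodes`, the path can be passed to `kubectl get --raw` to print the raw cAdvisor metrics.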
2 changes: 1 addition & 1 deletion monitoring/prometheus-operator/kustomization.yaml
@@ -1,5 +1,5 @@
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 resources:
-  - https://github.com/prometheus-operator/prometheus-operator/
+  - https://github.com/prometheus-operator/prometheus-operator/releases/download/v0.87.1/bundle.yaml
 namespace: prometheus-operator