
Commit 817660a

feat(prometheus): Add prometheus for metrics collection
Initial commit of Prometheus, which collects and displays metrics, stores them in Thanos, and includes dashboards in Grafana. The installation uses the kube-prometheus project (https://github.com/prometheus-operator/kube-prometheus): the `Makefile` generates the manifests, and kustomize applies them. Because the upstream project uses jsonnet for configuration, most configuration options live in the settings.jsonnet file. After changing the settings, rerun `make manifests` to regenerate the manifests, then commit the results to the metacpan-k8s repository.
1 parent a3514e3 commit 817660a


110 files changed (+89843, -0 lines)


.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -1,2 +1,3 @@
 **/charts/
 platform/kube-thanos/vendor
+platform/kube-prometheus/vendor
```

platform/kube-prometheus/Makefile

Lines changed: 47 additions & 0 deletions
```makefile
.DEFAULT_GOAL := help

SHELL := /bin/bash
PATH := $(PWD)/tmp/bin:${PATH}
JSONNET_FILE := settings.jsonnet

.PHONY: all clean manifests check-tools

all: check-tools clean manifests

.PHONY: init
init:
	jb init
	jb install github.com/prometheus-operator/kube-prometheus/jsonnet/kube-prometheus@main

.PHONY: manifests
manifests:
	mkdir -p base/setup
	jsonnet -J vendor -m base $(JSONNET_FILE) | xargs -I{} sh -c 'cat {} | gojsontoyaml > {}.yaml' -- {}
	find base -type f ! -name '*.yaml' -delete
	rm -f kustomization
	rm base/grafana-config.yaml

.PHONY: clean
clean:
	rm -rf base

.PHONY: upgrade
upgrade:
	jb update github.com/prometheus-operator/kube-prometheus/jsonnet/kube-prometheus@main

.PHONY: check-tools
check-tools:
	@command -v jb >/dev/null 2>&1 || { echo >&2 "jb is not installed. Aborting."; exit 1; }
	@command -v jsonnet >/dev/null 2>&1 || { echo >&2 "jsonnet is not installed. Aborting."; exit 1; }
	@command -v gojsontoyaml >/dev/null 2>&1 || { echo >&2 "gojsontoyaml is not installed. Aborting."; exit 1; }

.PHONY: help
help:
	@echo "Makefile for pulling kube-prometheus YAML"
	@echo ""
	@echo "Targets:"
	@echo " manifests : extract kube-prometheus YAML from jsonnet with specified settings.jsonnet"
	@echo " upgrade : update the vendored kube-prometheus jsonnet library"
	@echo " init : initialize jsonnet vendor packages"
	@echo " check-tools : ensure required tools are installed"
	@echo " help : Display this help message"
```
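The `manifests` recipe chains several shell steps. The sketch below is a hypothetical Python mirror of that post-processing, assuming `jsonnet -m base` has already evaluated `settings.jsonnet` into a mapping of relative file names to JSON objects. It writes each file, adds a `.yaml` twin, prunes everything else, and deletes `grafana-config.yaml` as the recipe does. It substitutes pretty-printed JSON for `gojsontoyaml` output (JSON documents like these also parse as YAML) and skips `rm -f kustomization`, which acts on the working directory rather than `base`. The function names and sample file names are illustrative only.

```python
"""Hypothetical Python mirror of the Makefile's `manifests` post-processing."""
import json
import tempfile
from pathlib import Path


def to_yaml_text(obj) -> str:
    # Stand-in for gojsontoyaml: pretty-printed JSON also parses as YAML,
    # which avoids a third-party YAML dependency in this sketch.
    return json.dumps(obj, indent=2, sort_keys=True) + "\n"


def render_manifests(outputs: dict, base: Path) -> list:
    """Mimic `make manifests` for a dict of {relative name: JSON object}."""
    (base / "setup").mkdir(parents=True, exist_ok=True)  # mkdir -p base/setup

    # jsonnet -J vendor -m base settings.jsonnet: one JSON file per output key.
    written = []
    for name, obj in outputs.items():
        path = base / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(obj))
        written.append(path)

    # xargs -I{} sh -c 'cat {} | gojsontoyaml > {}.yaml': add a .yaml twin.
    for path in written:
        yaml_path = path.with_name(path.name + ".yaml")
        yaml_path.write_text(to_yaml_text(json.loads(path.read_text())))

    # find base -type f ! -name '*.yaml' -delete
    for path in list(base.rglob("*")):
        if path.is_file() and not path.name.endswith(".yaml"):
            path.unlink()

    # rm base/grafana-config.yaml (raises FileNotFoundError if absent, as rm would fail)
    (base / "grafana-config.yaml").unlink()

    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = render_manifests(
            {
                "setup/0namespace-namespace": {"kind": "Namespace"},
                "grafana-config": {"kind": "Secret"},
                "alertmanager-podDisruptionBudget": {"kind": "PodDisruptionBudget"},
            },
            Path(tmp) / "base",
        )
        print(files)
```

Running `clean` first matters for the same reason it does in `all`: the pruning step only removes non-YAML files, so stale `.yaml` files from an earlier run would otherwise survive.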

platform/kube-prometheus/README.md

Lines changed: 104 additions & 0 deletions
````markdown
# kube-prometheus

The [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus)
project provides a set of Kubernetes manifests, Grafana dashboards, and
Prometheus rules to deploy a full monitoring stack for Kubernetes. It includes
Prometheus, Alertmanager, and Grafana, along with various exporters and
dashboards to monitor the health and performance of a Kubernetes cluster.

## Makefile Overview

This project pulls the kube-prometheus jsonnet library from the upstream
project and uses it to generate and manage the kube-prometheus YAML
manifests. The Makefile automates initializing dependencies, generating
manifests, cleaning up, and checking that the required tools are installed.

## Prerequisites

Before using the Makefile, ensure you have the following tools installed:

- `jb` (jsonnet-bundler)
- `jsonnet`
- `gojsontoyaml`

## Makefile Targets

The Makefile includes the following targets:

### `all`

Runs the `check-tools`, `clean`, and `manifests` targets in sequence. Note
that `all` is not the default target: the Makefile sets
`.DEFAULT_GOAL := help`, so running `make` with no target prints the help
message.

### `init`

Initializes the jsonnet vendor packages by running `jb init`, then installs
the kube-prometheus jsonnet library from the upstream `main` branch.

### `manifests`

Generates the kube-prometheus YAML files from `settings.jsonnet`. It converts
the jsonnet output to YAML and places the files in the `base` directory. It
then removes the non-YAML files from `base`, any `kustomization` file in the
current directory, and `base/grafana-config.yaml`.

### `clean`

Removes the `base` directory and its contents.

### `upgrade`

Updates the vendored kube-prometheus jsonnet library to the latest commit on
the upstream `main` branch. Run `make manifests` afterwards to regenerate the
YAML.

### `check-tools`

Checks that the required tools (`jb`, `jsonnet`, and `gojsontoyaml`) are
installed and on `PATH`. If any tool is missing, the target aborts with an
error message.

### `help`

Displays a help message with a brief description of the main targets.

## Usage

To use the Makefile, run the following commands in your terminal:

1. **Initialize the project:**

   ```sh
   make init
   ```

2. **Generate the manifests:**

   ```sh
   make manifests
   ```

3. **Clean up the generated files:**

   ```sh
   make clean
   ```

4. **Upgrade the kube-prometheus library:**

   ```sh
   make upgrade
   ```

5. **Check if required tools are installed:**

   ```sh
   make check-tools
   ```

6. **Display the help message:**

   ```sh
   make help
   ```

By following these steps, you can manage the kube-prometheus YAML
configurations efficiently using the provided Makefile.
````
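The `check-tools` target boils down to one `PATH` lookup per tool, stopping at the first miss. As a hedged sketch, the same check can be written with Python's `shutil.which`; the function names are hypothetical, and the `which` parameter exists only so the lookup can be stubbed out in tests.

```python
"""Hypothetical Python equivalent of the Makefile's `check-tools` target."""
import shutil
import sys

REQUIRED_TOOLS = ("jb", "jsonnet", "gojsontoyaml")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order checked."""
    return [tool for tool in tools if which(tool) is None]


def check_tools(tools=REQUIRED_TOOLS, which=shutil.which) -> int:
    """Return a shell-style exit status: 0 if every tool exists, else 1."""
    missing = missing_tools(tools, which)
    if missing:
        # Like the Makefile, report only the first missing tool and stop.
        print(f"{missing[0]} is not installed. Aborting.", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(check_tools())
```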
Lines changed: 7 additions & 0 deletions
```yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    pod-security.kubernetes.io/warn: privileged
    pod-security.kubernetes.io/warn-version: latest
  name: monitoring
```
Lines changed: 83 additions & 0 deletions
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-prometheus
    app.kubernetes.io/part-of: kube-prometheus
    prometheus: k8s
    role: alert-rules
  name: kube-prometheus-rules
  namespace: monitoring
spec:
  groups:
  - name: general.rules
    rules:
    - alert: TargetDown
      annotations:
        description: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.service }} targets in {{ $labels.namespace }} namespace are down.'
        runbook_url: https://runbooks.prometheus-operator.dev/runbooks/general/targetdown
        summary: One or more targets are unreachable.
      expr: 100 * (count(up == 0) BY (cluster, job, namespace, service) / count(up) BY (cluster, job, namespace, service)) > 10
      for: 10m
      labels:
        severity: warning
    - alert: Watchdog
      annotations:
        description: |
          This is an alert meant to ensure that the entire alerting pipeline is functional.
          This alert is always firing, therefore it should always be firing in Alertmanager
          and always fire against a receiver. There are integrations with various notification
          mechanisms that send a notification when this alert is not firing. For example the
          "DeadMansSnitch" integration in PagerDuty.
        runbook_url: https://runbooks.prometheus-operator.dev/runbooks/general/watchdog
        summary: An alert that should always be firing to certify that Alertmanager is working properly.
      expr: vector(1)
      labels:
        severity: none
    - alert: InfoInhibitor
      annotations:
        description: |
          This is an alert that is used to inhibit info alerts.
          By themselves, the info-level alerts are sometimes very noisy, but they are relevant when combined with
          other alerts.
          This alert fires whenever there's a severity="info" alert, and stops firing when another alert with a
          severity of 'warning' or 'critical' starts firing on the same namespace.
          This alert should be routed to a null receiver and configured to inhibit alerts with severity="info".
        runbook_url: https://runbooks.prometheus-operator.dev/runbooks/general/infoinhibitor
        summary: Info-level alert inhibition.
      expr: ALERTS{severity = "info"} == 1 unless on(namespace) ALERTS{alertname != "InfoInhibitor", severity =~ "warning|critical", alertstate="firing"} == 1
      labels:
        severity: none
  - name: node-network
    rules:
    - alert: NodeNetworkInterfaceFlapping
      annotations:
        description: Network interface "{{ $labels.device }}" changing its up status often on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}
        runbook_url: https://runbooks.prometheus-operator.dev/runbooks/general/nodenetworkinterfaceflapping
        summary: Network interface is often changing its status
      expr: |
        changes(node_network_up{job="node-exporter",device!~"veth.+"}[2m]) > 2
      for: 2m
      labels:
        severity: warning
  - name: kube-prometheus-node-recording.rules
    rules:
    - expr: sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[3m])) BY (instance)
      record: instance:node_cpu:rate:sum
    - expr: sum(rate(node_network_receive_bytes_total[3m])) BY (instance)
      record: instance:node_network_receive_bytes:rate:sum
    - expr: sum(rate(node_network_transmit_bytes_total[3m])) BY (instance)
      record: instance:node_network_transmit_bytes:rate:sum
    - expr: sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[5m])) WITHOUT (cpu, mode) / ON(instance) GROUP_LEFT() count(sum(node_cpu_seconds_total) BY (instance, cpu)) BY (instance)
      record: instance:node_cpu:ratio
    - expr: sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[5m]))
      record: cluster:node_cpu:sum_rate5m
    - expr: cluster:node_cpu:sum_rate5m / count(sum(node_cpu_seconds_total) BY (instance, cpu))
      record: cluster:node_cpu:ratio
  - name: kube-prometheus-general.rules
    rules:
    - expr: count without(instance, pod, node) (up == 1)
      record: count:up1
    - expr: count without(instance, pod, node) (up == 0)
      record: count:up0
```
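Two of these rules are easier to reason about with a concrete evaluation. The sketch below is a simplified, hypothetical plain-Python model of the `TargetDown` and `InfoInhibitor` expressions over hand-written samples. It assumes each sample is a dict of labels (plus a value for `up`), treats a missing label as the empty string as PromQL grouping does, and ignores time ranges, `for:` durations, and staleness. The function names are illustrative only.

```python
"""Simplified, hypothetical model of two expressions from kube-prometheus-rules."""
from collections import defaultdict

TARGET_GROUP = ("cluster", "job", "namespace", "service")


def target_down(up_samples, threshold=10.0):
    """Model: 100 * (count(up == 0) BY (...) / count(up) BY (...)) > threshold.

    `up_samples` is a list of (labels, value) pairs for the `up` metric.
    Returns {group: percent_down} for the groups that would alert.
    """
    total = defaultdict(int)
    down = defaultdict(int)
    for labels, value in up_samples:
        group = tuple(labels.get(name, "") for name in TARGET_GROUP)
        total[group] += 1
        if value == 0:
            down[group] += 1
    # Groups with no down targets produce no `up == 0` series, so they never alert.
    return {
        group: 100 * n_down / total[group]
        for group, n_down in down.items()
        if 100 * n_down / total[group] > threshold
    }


def info_inhibitor(alerts):
    """Model: ALERTS{severity="info"} == 1 unless on(namespace) ALERTS{...} == 1.

    `alerts` is a list of label dicts for active ALERTS series (value 1).
    Returns the info alerts that keep InfoInhibitor firing.
    """
    # Right-hand side: namespaces with a firing warning or critical alert
    # other than InfoInhibitor itself.
    blocked_namespaces = {
        alert.get("namespace", "")
        for alert in alerts
        if alert.get("alertname") != "InfoInhibitor"
        and alert.get("severity") in ("warning", "critical")
        and alert.get("alertstate") == "firing"
    }
    # `unless on(namespace)` drops left-hand series whose namespace appears on the right.
    return [
        alert
        for alert in alerts
        if alert.get("severity") == "info"
        and alert.get("namespace", "") not in blocked_namespaces
    ]


if __name__ == "__main__":
    up = [
        ({"job": "node-exporter", "namespace": "monitoring", "service": "node-exporter"}, 1),
        ({"job": "node-exporter", "namespace": "monitoring", "service": "node-exporter"}, 0),
    ]
    print(target_down(up))
```

Note the strict `> 10`: exactly one target down out of ten (10%) does not fire `TargetDown`.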
Lines changed: 35 additions & 0 deletions
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.27.0
  name: main
  namespace: monitoring
spec:
  image: quay.io/prometheus/alertmanager:v0.27.0
  nodeSelector:
    kubernetes.io/os: linux
  podMetadata:
    labels:
      app.kubernetes.io/component: alert-router
      app.kubernetes.io/instance: main
      app.kubernetes.io/name: alertmanager
      app.kubernetes.io/part-of: kube-prometheus
      app.kubernetes.io/version: 0.27.0
  replicas: 3
  resources:
    limits: {}
    requests:
      cpu: 4m
      memory: 100Mi
  secrets: []
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: alertmanager-main
  version: 0.27.0
```
Lines changed: 42 additions & 0 deletions
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.27.0
  name: alertmanager-main
  namespace: monitoring
spec:
  egress:
  - {}
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus
    ports:
    - port: 9093
      protocol: TCP
    - port: 8080
      protocol: TCP
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: alertmanager
    ports:
    - port: 9094
      protocol: TCP
    - port: 9094
      protocol: UDP
  podSelector:
    matchLabels:
      app.kubernetes.io/component: alert-router
      app.kubernetes.io/instance: main
      app.kubernetes.io/name: alertmanager
      app.kubernetes.io/part-of: kube-prometheus
  policyTypes:
  - Egress
  - Ingress
```
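This NetworkPolicy allows all egress (`- {}`) but only two kinds of ingress: Prometheus pods on TCP 9093 and 8080, and other Alertmanager pods on port 9094 over TCP and UDP. A minimal, hypothetical model of the ingress side follows. It copies the selectors and ports from the manifest, assumes the source pod runs in the same `monitoring` namespace (a `podSelector` without a `namespaceSelector` only matches pods in the policy's own namespace), and ignores any other policies that might also select these pods.

```python
"""Hypothetical model of the alertmanager-main NetworkPolicy ingress rules."""

# (source pod matchLabels, allowed (port, protocol) pairs), copied from the manifest.
INGRESS_RULES = [
    ({"app.kubernetes.io/name": "prometheus"}, {(9093, "TCP"), (8080, "TCP")}),
    ({"app.kubernetes.io/name": "alertmanager"}, {(9094, "TCP"), (9094, "UDP")}),
]


def matches(selector: dict, labels: dict) -> bool:
    """A matchLabels selector matches when every key/value pair is present."""
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(source_labels: dict, port: int, protocol: str = "TCP") -> bool:
    """True if any ingress rule admits a same-namespace pod with `source_labels`."""
    return any(
        matches(selector, source_labels) and (port, protocol) in ports
        for selector, ports in INGRESS_RULES
    )


if __name__ == "__main__":
    print(ingress_allowed({"app.kubernetes.io/name": "prometheus"}, 9093))
    print(ingress_allowed({"app.kubernetes.io/name": "grafana"}, 9093))
```

Because the rules are evaluated as a union, a pod only needs to match one `from` entry and one of that entry's ports.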
Lines changed: 19 additions & 0 deletions
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.27.0
  name: alertmanager-main
  namespace: monitoring
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: alert-router
      app.kubernetes.io/instance: main
      app.kubernetes.io/name: alertmanager
      app.kubernetes.io/part-of: kube-prometheus
```
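With `replicas: 3` on the Alertmanager and `maxUnavailable: 1` here, at most one Alertmanager pod can be voluntarily evicted (for example, during a node drain) at a time. The hedged sketch below approximates how the Kubernetes disruption controller turns an integer `maxUnavailable` into an allowed-disruptions count. Percentage values and unhealthy-pod eviction policies are deliberately left out, and the function name is hypothetical.

```python
"""Simplified, hypothetical model of PodDisruptionBudget maxUnavailable."""


def disruptions_allowed(expected_pods: int, healthy_pods: int, max_unavailable: int = 1) -> int:
    """How many more voluntary evictions the budget permits right now."""
    # The controller derives a floor of pods that must stay healthy...
    desired_healthy = max(0, expected_pods - max_unavailable)
    # ...and allows evictions only while the healthy count stays above it.
    return max(0, healthy_pods - desired_healthy)


if __name__ == "__main__":
    # alertmanager-main: 3 replicas, maxUnavailable: 1
    for healthy in (3, 2, 1):
        print(healthy, disruptions_allowed(3, healthy))
```

Once one replica is down for any reason, further voluntary evictions are blocked until it recovers, which keeps at least two Alertmanager peers in the cluster during routine maintenance.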
