Commit eb39c03

Add Prometheus configMap for k3s (#659)
Linear issue: [REL-809: Investigate why Grafana is not showing Prometheus data](https://linear.app/sourcegraph/issue/REL-809/investigate-why-grafana-is-not-showing-prometheus-data)

### Checklist

- [x] Follow the [manual testing process](https://github.com/sourcegraph/deploy-sourcegraph-helm/blob/main/TEST.md)
- [ ] Update [changelog](https://github.com/sourcegraph/deploy-sourcegraph-helm/blob/main/charts/sourcegraph/CHANGELOG.md)
- [ ] Update [Kubernetes update doc](https://docs.sourcegraph.com/admin/updates/kubernetes)

### Test plan

Tested with a customer self-hosting on k3s, and on our AMI running k3s.
1 parent 68c976f commit eb39c03

2 files changed: +220 -0 lines changed
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
# Prometheus ConfigMap Override

## Why

- Some self-hosted customers run their instances on non-standard Kubernetes distributions, such as k3s, which expose metrics under different names and labels
- Our Grafana dashboards expect the metrics in our Prometheus container to have specific names and labels
- With the default configMap, some Grafana graphs show no data, even though the underlying metrics may exist in Prometheus
- Use this configMap to relabel k3s' metrics so that they match our Grafana dashboard queries
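
One way to see whether the relabeling has taken effect is to ask Prometheus which label names it currently knows about. The sketch below is only an illustration: `check_dashboard_labels` is a hypothetical helper, and the Prometheus URL you pass in is an assumption about how you reach Prometheus (for example through a port-forward). It checks for the three labels that the k3s override writes (`name`, `container_label_io_kubernetes_container_name`, `container_label_io_kubernetes_pod_name`) using the Prometheus HTTP API's label-names endpoint.

```shell
# Hypothetical helper (not part of this repo): report whether the labels
# written by the k3s override are present in Prometheus.
# $1 is the base URL of Prometheus; how you reach it is an assumption.
check_dashboard_labels() {
  prom_url="${1:?usage: check_dashboard_labels PROMETHEUS_URL}"

  # /api/v1/labels returns a JSON document listing all known label names.
  labels="$(curl -fsS "${prom_url}/api/v1/labels")" || return 1

  for label in \
    name \
    container_label_io_kubernetes_container_name \
    container_label_io_kubernetes_pod_name
  do
    # Match the label as a whole JSON string, so "name" does not match "__name__".
    if printf '%s' "$labels" | grep -q "\"${label}\""; then
      echo "present: ${label}"
    else
      echo "missing: ${label}"
    fi
  done
}
```

For example, after forwarding Prometheus to a local port of your choice, `check_dashboard_labels http://127.0.0.1:9090` prints one `present:` or `missing:` line per label. Labels reported as missing on a k3s cluster suggest the override is not loaded yet.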

## How to Use

- Apply the override configMap with `kubectl apply -f prometheus-override-k3s.ConfigMap.yaml`
- Add the new configMap's name to your Helm values override file, for example:
  ```yaml
  prometheus:
    existingConfig: prometheus-override-k3s
  ```
- Re-apply your Helm values override file. This may restart the Prometheus pod, but should not restart other services
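
The steps above could be scripted roughly as follows. This is a minimal sketch, not the documented procedure: `apply_prometheus_override` is a hypothetical helper, and the namespace, release name, values file, and chart reference (`sourcegraph/sourcegraph` by default) are all assumptions that depend on how your instance was installed.

```shell
# Hypothetical helper: apply the override configMap, confirm it exists,
# then re-apply the Helm values file that references it.
# Arguments (all assumptions about your install):
#   $1 namespace, $2 Helm release name, $3 values override file,
#   $4 chart reference (optional, defaults to sourcegraph/sourcegraph)
apply_prometheus_override() {
  namespace="${1:?usage: apply_prometheus_override NAMESPACE RELEASE VALUES_FILE [CHART]}"
  release="${2:?missing RELEASE}"
  values_file="${3:?missing VALUES_FILE}"
  chart="${4:-sourcegraph/sourcegraph}"

  # Step 1: create or update the configMap from this directory.
  kubectl apply -n "$namespace" -f prometheus-override-k3s.ConfigMap.yaml || return 1

  # Sanity check: the name must match prometheus.existingConfig in the values file.
  kubectl get configmap prometheus-override-k3s -n "$namespace" || return 1

  # Step 3: re-apply the values override file.
  helm upgrade --install "$release" "$chart" -n "$namespace" -f "$values_file"
}
```

A call might look like `apply_prometheus_override default sourcegraph override.yaml`, where each argument is a placeholder for your own values.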

## Notes

- Copied from https://github.com/sourcegraph/deploy/blob/main/install/prometheus-override.ConfigMap.yaml
- If the same symptoms and root cause are found on other types of Kubernetes clusters, new Prometheus override configMaps could be created for them

## Troubleshooting Empty Grafana Dashboards

- There are a handful of steps in the metrics pipeline where data could be getting lost:
  - Are the cAdvisor, node-exporter, Prometheus, and Grafana containers all running and healthy?
  - Are any of these pods reporting issues in their Kubernetes events or container logs?
  - Is network connectivity open from Prometheus to each of the cAdvisor / node-exporter containers?
  - Is network connectivity open from Grafana to Prometheus?
  - Does Prometheus have the Kubernetes RBAC permissions it needs to use Service Discovery to find the IP addresses of the cAdvisor and node-exporter pods?
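
Some of the questions above can be answered with a few `kubectl` commands. The sketch below covers only pod health, recent events, and the RBAC check; it does not test network connectivity. `check_metrics_pipeline` is a hypothetical helper, and the namespace and service account name you pass in are assumptions about your install.

```shell
# Hypothetical helper: quick checks for the first, second, and last
# troubleshooting questions. $1 is the namespace Sourcegraph runs in and
# $2 is the service account Prometheus runs as (both are assumptions).
check_metrics_pipeline() {
  namespace="${1:?usage: check_metrics_pipeline NAMESPACE SERVICE_ACCOUNT}"
  service_account="${2:?missing SERVICE_ACCOUNT}"
  subject="system:serviceaccount:${namespace}:${service_account}"

  # Are the pods running and healthy? Look at STATUS and RESTARTS.
  kubectl get pods -n "$namespace" -o wide

  # Are any pods reporting issues in their events?
  kubectl get events -n "$namespace" --sort-by=.lastTimestamp

  # Can the Prometheus service account list what service discovery needs?
  # Nodes are cluster-scoped, so that check takes no namespace.
  printf 'list nodes: '
  kubectl auth can-i list nodes --as="$subject"
  for resource in pods endpoints services; do
    printf 'list %s: ' "$resource"
    kubectl auth can-i list "$resource" -n "$namespace" --as="$subject"
  done
}
```

For example, `check_metrics_pipeline default prometheus` prints the pod list, recent events, and a `yes` or `no` answer per resource. A `no` for any resource points at missing RBAC rules rather than at the scrape configuration.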
Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    deploy: sourcegraph
  name: prometheus-override-k3s
data:
  node_rules.yml: ''
  extra_rules.yml: ''
  prometheus.yml: |
    global:
      scrape_interval: 30s
      evaluation_interval: 30s

    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['127.0.0.1:9093']
          path_prefix: /alertmanager

    rule_files:
      - '*_rules.yml'
      - "/sg_config_prometheus/*_rules.yml"
      - "/sg_prometheus_add_ons/*_rules.yml"

    scrape_configs:

      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https

      - job_name: 'kubernetes-nodes'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443

          ############################################################################################################
          # k3s and cAdvisor-specific customization
          # name container metrics after their container name labels
          # Note that 'io.kubernetes.container.name' and 'io.kubernetes.pod.name' must be provided in cAdvisor
          ############################################################################################################
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
        metric_relabel_configs:
          - source_labels: [container, pod]
            regex: (.+)
            action: replace
            target_label: name
            separator: '-'
          - source_labels: [container]
            regex: (.+)
            action: replace
            target_label: container_label_io_kubernetes_container_name
          - source_labels: [pod]
            regex: (.+)
            action: replace
            target_label: container_label_io_kubernetes_pod_name
          ############################################################################################################

      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_sourcegraph_prometheus_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: (.+)(?::\d+);(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: ns
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
          # Sourcegraph specific customization. We want a nicer name for job
          - source_labels: [app]
            action: replace
            target_label: job
          # Sourcegraph specific customization. We want a nicer name for instance
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: instance

      - job_name: 'kubernetes-services'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
          - role: service
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__address__]
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_service_namespace]
            target_label: ns
          - source_labels: [__meta_kubernetes_service_name]
            target_label: kubernetes_name

      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Sourcegraph specific customization, only scrape pods with our annotation
          - source_labels: [__meta_kubernetes_pod_annotation_sourcegraph_prometheus_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: (.+):(?:\d+);(\d+)
            replacement: ${1}:${2}
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name

          ############################################################################################################
          # k3s and cAdvisor-specific customization
          ############################################################################################################
          - source_labels: [namespace]
            action: replace
            target_label: ns
        metric_relabel_configs:
          - source_labels: [kubernetes_io_hostname]
            regex: sourcegraph-0
            action: keep
          - source_labels: [namespace]
            regex: default
            action: keep
          ############################################################################################################

      # Scrape prometheus itself for metrics.
      - job_name: 'builtin-prometheus'
        static_configs:
          - targets: ['127.0.0.1:9092']
            labels:
              app: prometheus
      - job_name: 'builtin-alertmanager'
        metrics_path: /alertmanager/metrics
        static_configs:
          - targets: ['127.0.0.1:9093']
            labels:
              app: alertmanager
