Commit 9adff9b

add node exporter daemonset (#194)
1 parent 319c5bd commit 9adff9b

13 files changed: +551 additions, -1 deletion

TEST.md

Lines changed: 13 additions & 1 deletion
@@ -12,7 +12,19 @@ helm lint charts/sourcegraph/.
 
 ## Unit testing
 
-We utilize [helm-unittest], a BDD styled unit test framework, to validate our helm chart.
+We use [helm-unittest](https://github.com/quintush/helm-unittest/), a BDD-style unit test framework, to validate our Helm chart.
+
+helm-unittest can be installed with:
+
+```bash
+helm plugin install https://github.com/quintush/helm-unittest
+```
+
+Once the plugin is installed, run the unit tests with:
+
+```bash
+helm unittest --helm3 ./charts/sourcegraph/.
+```
 
 We currently do not have testing best practices or require unit tests for new changes, so use your best judgement when adding test cases.
 
charts/sourcegraph/CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -13,6 +13,14 @@ Use `**BREAKING**:` to denote a breaking change
 Sourcegraph 4.1.0 is now available!
 - [Changelog](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/CHANGELOG.md#4-1-0)
 - Added `allowedTopologies` support to storageclass [#188](https://github.com/sourcegraph/deploy-sourcegraph-helm/pull/188). This is useful to restrict provisioning of PV in specific zones or regions. In some cloud providers (e.g. GCP), this can be used to provision regional disks with only one worker node present.
+- Added a node-exporter DaemonSet, which collects crucial machine-level metrics that help Sourcegraph scale your deployment. See [#194](https://github.com/sourcegraph/deploy-sourcegraph-helm/pull/194) for more information.
+
+🚨 **WARNING**: Like cadvisor, `node-exporter`:
+- runs as a DaemonSet
+- needs to mount various read-only directories from the host machine (`/`, `/proc`, and `/sys`)
+- ideally shares the machine's PID and network namespaces
+
+If necessary, node-exporter can be disabled by setting `nodeExporter.enabled: false` in your `override.yaml` configuration file.
 
 ## 4.0.1
 
charts/sourcegraph/README.md

Lines changed: 13 additions & 0 deletions
@@ -181,6 +181,19 @@ In addition to the documented values, all services also support the following va
 | minio.serviceAccount.create | bool | `false` | Enable creation of ServiceAccount for `minio` |
 | minio.serviceAccount.name | string | `""` | Name of the ServiceAccount to be created or an existing ServiceAccount |
 | minio.storageSize | string | `"100Gi"` | PVC Storage Request for `minio` data volume |
+| nodeExporter.containerSecurityContext | object | `{"allowPrivilegeEscalation":false,"readOnlyRootFilesystem":true,"runAsGroup":65534,"runAsUser":65534}` | Security context for the `node-exporter` container, learn more from the [Kubernetes documentation](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-container) |
+| nodeExporter.enabled | bool | `true` | Enable `node-exporter` |
+| nodeExporter.extraArgs | list | `[]` | Additional command-line arguments appended to the `node-exporter` container's `args` |
+| nodeExporter.hostNetwork | bool | `true` | Run `node-exporter` pods in the host machine's network namespace |
+| nodeExporter.hostPID | bool | `true` | Run `node-exporter` pods in the host machine's PID namespace |
+| nodeExporter.image.defaultTag | string | `"179720_2022-10-25_4d925e87cfb8@sha256:2d9dcdf0b2226f0c3d550a64d2667710265462350a3ba9ebe37d0302bc64af0f"` | Docker image tag for the `node-exporter` image |
+| nodeExporter.image.name | string | `"node-exporter"` | Docker image name for the `node-exporter` image |
+| nodeExporter.name | string | `"node-exporter"` | Name used by resources. Does not affect service names or PVCs. |
+| nodeExporter.podSecurityContext | object | `{"fsGroup":65534,"runAsGroup":65534,"runAsNonRoot":true,"runAsUser":65534}` | Security context for the `node-exporter` pod, learn more from the [Kubernetes documentation](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod) |
+| nodeExporter.podSecurityPolicy.enabled | bool | `false` | Enable [PodSecurityPolicy](https://kubernetes.io/docs/concepts/policy/pod-security-policy/) for `node-exporter` pods |
+| nodeExporter.resources | object | `{"limits":{"cpu":"1","memory":"1Gi"},"requests":{"cpu":".2","memory":"100Mi"}}` | Resource requests & limits for the `node-exporter` container, learn more from the [Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) |
+| nodeExporter.serviceAccount.create | bool | `true` | Enable creation of ServiceAccount for `node-exporter` |
+| nodeExporter.serviceAccount.name | string | `"node-exporter"` | Name of the ServiceAccount to be created or an existing ServiceAccount |
 | openTelemetry.agent.name | string | `"otel-agent"` | Name used by resources. Does not affect service names or PVCs. |
 | openTelemetry.agent.resources | object | `{"limits":{"cpu":"500m","memory":"500Mi"},"requests":{"cpu":"100m","memory":"100Mi"}}` | Resource requests & limits for the `otel-agent` container, learn more from the [Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) |
 | openTelemetry.enabled | bool | `true` | |
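
The `resources` defaults above use Kubernetes quantity notation. A CPU value of `.2` is 0.2 cores (200m), and `Mi`/`Gi` are binary (power-of-two) suffixes. The Python below is an illustrative sketch and is not part of the chart. `parse_quantity` is a hypothetical helper: it handles plain numbers and the common SI and binary suffixes, and it does not apply Kubernetes' full validation rules.

```python
from decimal import Decimal

# Binary (power-of-two) suffixes, e.g. "100Mi" = 100 * 2**20 bytes.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
# Decimal SI suffixes, e.g. "500m" = 0.5 (CPU cores).
_DECIMAL = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
    "P": Decimal(10) ** 15,
    "E": Decimal(10) ** 18,
}


def parse_quantity(q: str) -> Decimal:
    """Convert a Kubernetes quantity string (e.g. ".2", "500m", "100Mi") to a Decimal.

    Minimal sketch: malformed input is not validated.
    """
    # Check two-letter binary suffixes first so "Ei" is not mistaken for "E".
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    if q and q[-1] in _DECIMAL:
        return Decimal(q[:-1]) * _DECIMAL[q[-1]]
    return Decimal(q)
```

For example, `parse_quantity("1") / parse_quantity(".2")` is 5, so the default CPU limit is five times the request. `Decimal` avoids floating-point rounding in values such as `.2`.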

charts/sourcegraph/templates/NOTES.txt

Lines changed: 34 additions & 0 deletions
@@ -28,4 +28,38 @@ Such metric provides critical information to help you scale the Sourcegraph depl
 If you would like to bring your own infrastructure monitoring & alerting solution,
 you may want to disable the `cadvisor` DaemonSet completely by setting `cadvisor.enabled=false` in your override file.
 
+{{- end }}
+
+{{- if not .Values.nodeExporter.enabled }}
+
+🚧 Warning 🚧
+
+You have set 'nodeExporter.enabled' to 'false', which completely disables node-exporter. It provides
+critical machine-level metrics that help you scale your Sourcegraph deployment. Without node-exporter, you might have
+to rely on your cloud provider's (possibly limited) tooling for insight into your machines.
+
+{{- end }}
+
+{{- if not .Values.nodeExporter.hostPID }}
+
+🚧 Warning 🚧
+
+You have set 'nodeExporter.hostPID' to 'false', which greatly limits the metrics that node-exporter is able to provide. Many of the
+metrics that Sourcegraph uses to help you scale your deployment might be broken as a result.
+
+If you would like to bring your own infrastructure monitoring & alerting solution,
+you may want to disable the `node-exporter` DaemonSet completely by setting `nodeExporter.enabled=false` in your override file.
+
+{{- end }}
+
+{{- if not .Values.nodeExporter.hostNetwork }}
+
+🚧 Warning 🚧
+
+You have set 'nodeExporter.hostNetwork' to 'false', which greatly limits the metrics that node-exporter is able to provide. Many of the
+metrics that Sourcegraph uses to help you scale your deployment might be broken as a result.
+
+If you would like to bring your own infrastructure monitoring & alerting solution,
+you may want to disable the `node-exporter` DaemonSet completely by setting `nodeExporter.enabled=false` in your override file.
+
 {{- end }}
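
The three new NOTES.txt blocks are independent `if not` checks, so each disabled setting prints its own warning. Below is a hypothetical Python model of that logic. A plain dict stands in for Helm's `.Values`, and the sketch assumes the user's overrides have already been merged with the chart defaults. The `.get` fallbacks mirror the README defaults of `true`.

```python
def notes_warnings(values: dict) -> list:
    """Return the node-exporter settings that would trigger a NOTES.txt warning.

    Each check mirrors one `{{- if not .Values.nodeExporter.<key> }}` block;
    the checks are not nested, so they fire independently.
    """
    ne = values.get("nodeExporter", {})
    warnings = []
    for key in ("enabled", "hostPID", "hostNetwork"):
        if not ne.get(key, True):  # README default for all three is `true`
            warnings.append(f"nodeExporter.{key}")
    return warnings
```

Because the `hostPID` and `hostNetwork` checks are not nested under `nodeExporter.enabled`, setting `enabled: false` on its own prints only the first warning. Turning off either host namespace prints its warning even though node-exporter keeps running.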
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+{{- if .Values.nodeExporter.enabled -}}
+kind: ClusterRole
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  labels:
+    app: node-exporter
+    category: rbac
+    deploy: sourcegraph
+    app.kubernetes.io/component: node-exporter
+  name: {{ .Values.nodeExporter.name }}
+rules:
+- apiGroups: ['policy']
+  resources: ['podsecuritypolicies']
+  verbs: ['use']
+  resourceNames:
+  - {{ .Values.nodeExporter.name }}
+{{- end }}
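
This ClusterRole grants exactly one permission: the `use` verb on the PodSecurityPolicy whose name matches `nodeExporter.name`. The ClusterRoleBinding below attaches it to the node-exporter ServiceAccount. The sketch is a simplified Python model of how an RBAC rule is matched. It treats `*` as a wildcard for API groups, resources and verbs, and an empty `resourceNames` as "any name". It ignores subresources and non-resource URLs. `PolicyRule` and `allows` are hypothetical names, not a Kubernetes client API.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyRule:
    """Simplified RBAC PolicyRule."""
    api_groups: list
    resources: list
    verbs: list
    resource_names: list = field(default_factory=list)  # empty means "any name"


def _match(allowed: list, value: str) -> bool:
    return "*" in allowed or value in allowed


def allows(rule: PolicyRule, api_group: str, resource: str, verb: str, name: str) -> bool:
    """True if `rule` grants `verb` on the `resource` object called `name`."""
    return (
        _match(rule.api_groups, api_group)
        and _match(rule.resources, resource)
        and _match(rule.verbs, verb)
        and (not rule.resource_names or name in rule.resource_names)
    )


# The single rule from the ClusterRole above, assuming nodeExporter.name = "node-exporter".
node_exporter_rule = PolicyRule(
    api_groups=["policy"],
    resources=["podsecuritypolicies"],
    verbs=["use"],
    resource_names=["node-exporter"],
)
```

The ClusterRole and ClusterRoleBinding are rendered whenever `nodeExporter.enabled` is true, even if `nodeExporter.podSecurityPolicy.enabled` is false. In that case the rule names a policy that does not exist and effectively grants nothing.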
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+{{- if .Values.nodeExporter.enabled -}}
+kind: ClusterRoleBinding
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  labels:
+    app: node-exporter
+    category: rbac
+    deploy: sourcegraph
+    app.kubernetes.io/component: node-exporter
+  name: {{ .Values.nodeExporter.name }}
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: {{ .Values.nodeExporter.name }}
+subjects:
+- kind: ServiceAccount
+  name: {{ include "sourcegraph.serviceAccountName" (list . "nodeExporter") }}
+  namespace: {{ .Release.Namespace }}
+{{- end }}
Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
+{{- if .Values.nodeExporter.enabled -}}
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  annotations:
+    description: DaemonSet to ensure all nodes run a node-exporter pod.
+    seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
+  labels:
+    {{- include "sourcegraph.labels" . | nindent 4 }}
+    {{- if .Values.nodeExporter.labels }}
+    {{- toYaml .Values.nodeExporter.labels | nindent 4 }}
+    {{- end }}
+    deploy: sourcegraph
+    app: node-exporter
+    app.kubernetes.io/component: node-exporter
+  name: {{ .Values.nodeExporter.name }}
+spec:
+  selector:
+    matchLabels:
+      {{- include "sourcegraph.selectorLabels" . | nindent 6 }}
+      app: node-exporter
+  template:
+    metadata:
+      annotations:
+        description: Collects and exports machine metrics.
+        kubectl.kubernetes.io/default-container: node-exporter
+        {{- if .Values.sourcegraph.podAnnotations }}
+        {{- toYaml .Values.sourcegraph.podAnnotations | nindent 8 }}
+        {{- end }}
+        {{- if .Values.nodeExporter.podAnnotations }}
+        {{- toYaml .Values.nodeExporter.podAnnotations | nindent 8 }}
+        {{- end }}
+      labels:
+        {{- include "sourcegraph.selectorLabels" . | nindent 8 }}
+        {{- if .Values.sourcegraph.podLabels }}
+        {{- toYaml .Values.sourcegraph.podLabels | nindent 8 }}
+        {{- end }}
+        {{- if .Values.nodeExporter.podLabels }}
+        {{- toYaml .Values.nodeExporter.podLabels | nindent 8 }}
+        {{- end }}
+        deploy: sourcegraph
+        app: node-exporter
+    spec:
+      {{- include "sourcegraph.renderServiceAccountName" (list . "nodeExporter") | trim | nindent 6 }}
+      containers:
+      - name: node-exporter
+        image: {{ include "sourcegraph.image" (list . "nodeExporter" ) }}
+        imagePullPolicy: {{ .Values.sourcegraph.image.pullPolicy }}
+        args:
+          - --web.listen-address=:9100
+          - --path.sysfs=/host/sys
+          - --path.rootfs=/host/root
+          - --path.procfs=/host/proc
+          - --no-collector.wifi
+          - --no-collector.hwmon
+          - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
+          - --collector.netclass.ignored-devices=^(veth.*)$
+          - --collector.netdev.device-exclude=^(veth.*)$
+          {{- if .Values.nodeExporter.extraArgs }}
+{{ toYaml .Values.nodeExporter.extraArgs | indent 10 }}
+          {{- end }}
+        env:
+        {{- range $name, $item := .Values.nodeExporter.env}}
+        - name: {{ $name }}
+          {{- $item | toYaml | nindent 10 }}
+        {{- end }}
+        {{- if not .Values.sourcegraph.localDevMode }}
+        resources:
+          {{- toYaml .Values.nodeExporter.resources | nindent 10 }}
+        {{- end }}
+        securityContext:
+          {{- toYaml .Values.nodeExporter.containerSecurityContext | nindent 10 }}
+        volumeMounts:
+        - name: rootfs
+          mountPath: /host/root
+          mountPropagation: HostToContainer
+          readOnly: true
+        - name: sys
+          mountPath: /host/sys
+          mountPropagation: HostToContainer
+          readOnly: true
+        - name: proc
+          mountPath: /host/proc
+          mountPropagation: HostToContainer
+          readOnly: true
+        {{- if .Values.nodeExporter.extraVolumeMounts }}
+        {{- toYaml .Values.nodeExporter.extraVolumeMounts | nindent 8 }}
+        {{- end }}
+        ports:
+        - name: metrics
+          containerPort: 9100
+          protocol: TCP
+        readinessProbe:
+          failureThreshold: 3
+          httpGet:
+            scheme: HTTP
+            port: metrics
+          initialDelaySeconds: 0
+          periodSeconds: 10
+          successThreshold: 1
+          timeoutSeconds: 1
+        livenessProbe:
+          failureThreshold: 3
+          httpGet:
+            scheme: HTTP
+            port: metrics
+          initialDelaySeconds: 0
+          periodSeconds: 10
+          successThreshold: 1
+          timeoutSeconds: 1
+        terminationMessagePolicy: FallbackToLogsOnError
+      {{- if .Values.nodeExporter.extraContainers }}
+      {{- toYaml .Values.nodeExporter.extraContainers | nindent 6 }}
+      {{- end }}
+      automountServiceAccountToken: false
+      terminationGracePeriodSeconds: 30
+      securityContext:
+        {{- toYaml .Values.nodeExporter.podSecurityContext | nindent 8 }}
+      {{- include "sourcegraph.nodeSelector" (list . "nodeExporter" ) | trim | nindent 6 }}
+      {{- include "sourcegraph.affinity" (list . "nodeExporter" ) | trim | nindent 6 }}
+      {{- include "sourcegraph.tolerations" (list . "nodeExporter" ) | trim | nindent 6 }}
+      {{- with .Values.sourcegraph.imagePullSecrets }}
+      imagePullSecrets:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+      hostNetwork: {{ .Values.nodeExporter.hostNetwork }}
+      hostPID: {{ .Values.nodeExporter.hostPID }}
+      volumes:
+      - name: rootfs
+        hostPath:
+          path: /
+      - name: sys
+        hostPath:
+          path: /sys
+      - name: proc
+        hostPath:
+          path: /proc
+      {{- if .Values.nodeExporter.extraVolumes }}
+      {{- toYaml .Values.nodeExporter.extraVolumes | nindent 6 }}
+      {{- end }}
+{{- end }}
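
The filesystem and network collectors drop mount points and devices that match the regular expressions passed in `args` above. node-exporter is written in Go, but these particular patterns mean the same thing under Python's `re`. The sketch below uses hypothetical helper names to show which paths and interfaces would be filtered out.

```python
import re

# Patterns copied verbatim from the DaemonSet args above.
IGNORED_MOUNT_POINTS = re.compile(
    r"^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)"
)
EXCLUDED_NET_DEVICES = re.compile(r"^(veth.*)$")  # used for netclass and netdev


def filesystem_reported(mount_point: str) -> bool:
    """True if the filesystem collector would keep this mount point."""
    return IGNORED_MOUNT_POINTS.search(mount_point) is None


def netdev_reported(device: str) -> bool:
    """True if the netdev/netclass collectors would keep this network device."""
    return EXCLUDED_NET_DEVICES.search(device) is None
```

`/devices` is still reported because the `($|/)` group requires the match to end at a path boundary. `/var/lib/docker` itself is also still reported, because `var/lib/docker/.+` needs at least one character after the trailing slash. The `^(veth.*)$` pattern hides the virtual Ethernet interfaces that container networking typically creates per pod.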
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+{{- if and .Values.nodeExporter.enabled .Values.nodeExporter.podSecurityPolicy.enabled -}}
+apiVersion: policy/v1beta1
+kind: PodSecurityPolicy
+metadata:
+  labels:
+    app: node-exporter
+    deploy: sourcegraph
+    app.kubernetes.io/component: node-exporter
+  name: {{ .Values.nodeExporter.name }}
+spec:
+  privileged: false
+  hostIPC: false
+  hostNetwork: {{ .Values.nodeExporter.hostNetwork }}
+  hostPID: {{ .Values.nodeExporter.hostPID }}
+  seLinux:
+    rule: RunAsAny
+  supplementalGroups:
+    rule: RunAsAny
+  runAsUser:
+    rule: RunAsAny
+  fsGroup:
+    rule: RunAsAny
+  volumes:
+  - '*'
+  allowedHostPaths:
+  - pathPrefix: "/"
+  - pathPrefix: "/sys"
+  - pathPrefix: "/proc"
+  readOnlyRootFilesystem: true
+{{- end }}
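
`allowedHostPaths` entries are matched per path component. The Kubernetes documentation gives this example: prefix `/foo` allows `/foo`, `/foo/` and `/foo/bar`, but not `/food` or `/etc/foo`. An empty list places no restriction at all. The Python sketch of that rule below uses a hypothetical function name. It also shows a consequence for this chart: the first entry is `/`, so the policy effectively permits any hostPath, and the `/sys` and `/proc` entries add nothing. Note that the `policy/v1beta1` PodSecurityPolicy API was removed in Kubernetes 1.25, so this template only applies to older clusters; `nodeExporter.podSecurityPolicy.enabled` defaults to `false`.

```python
def host_path_allowed(path: str, allowed_prefixes: list) -> bool:
    """Sketch of PSP allowedHostPaths matching, compared path component by component."""
    if not allowed_prefixes:
        return True  # an empty allowedHostPaths list means no restriction
    path_parts = [p for p in path.split("/") if p]
    for prefix in allowed_prefixes:
        prefix_parts = [p for p in prefix.split("/") if p]
        # "/" has no components, so it is a prefix of every path.
        if path_parts[: len(prefix_parts)] == prefix_parts:
            return True
    return False
```

If tighter control is wanted, each `allowedHostPaths` entry also accepts `readOnly: true`. With that set, a matching hostPath volume is only admitted when every mount of it is read-only. The DaemonSet above already mounts all three volumes read-only.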
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+{{- if .Values.nodeExporter.enabled -}}
+apiVersion: v1
+kind: Service
+metadata:
+  annotations:
+    description: Prometheus exporter for hardware and OS metrics.
+    url: https://github.com/prometheus/node_exporter
+    prometheus.io/port: "9100"
+    sourcegraph.prometheus/scrape: "true"
+    {{- if .Values.nodeExporter.serviceAnnotations }}
+    {{- toYaml .Values.nodeExporter.serviceAnnotations | nindent 4 }}
+    {{- end }}
+  labels:
+    app.kubernetes.io/component: node-exporter
+    app: node-exporter
+    deploy: sourcegraph
+    sourcegraph-resource-requires: no-cluster-admin
+    {{- if .Values.nodeExporter.serviceLabels }}
+    {{- toYaml .Values.nodeExporter.serviceLabels | nindent 4 }}
+    {{- end }}
+  name: node-exporter
+spec:
+  ports:
+  - name: metrics
+    port: 9100
+    targetPort: metrics
+  selector:
+    {{- include "sourcegraph.selectorLabels" . | nindent 4 }}
+    app: node-exporter
+  type: {{ .Values.nodeExporter.serviceType | default "ClusterIP" }}
+{{- end }}
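
This Service fronts the `metrics` port (9100) of the node-exporter pods. node-exporter serves its metrics at `/metrics` in the Prometheus text exposition format. The Python sketch below parses a single sample line, assuming no escaped quotes in label values and no trailing timestamp. `parse_sample` is a hypothetical helper, not a Prometheus client API, and the sample values in the test are made up for illustration.

```python
import re

# One sample line, e.g.  node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line: str):
    """Return (metric_name, labels, value), or None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):  # HELP/TYPE comments carry no sample
        return None
    m = _SAMPLE.match(line)
    if m is None:
        raise ValueError(f"unrecognised sample line: {line!r}")
    name, raw_labels, raw_value = m.groups()
    labels = dict(_LABEL.findall(raw_labels or ""))
    return name, labels, float(raw_value)  # float() also accepts "NaN" and "+Inf"
```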
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+{{- if and .Values.nodeExporter.enabled .Values.nodeExporter.serviceAccount.create -}}
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  labels:
+    app: node-exporter
+    category: rbac
+    deploy: sourcegraph
+    app.kubernetes.io/component: node-exporter
+  {{- include "sourcegraph.serviceAccountAnnotations" (list . "nodeExporter") | trim | nindent 2 }}
+  name: {{ include "sourcegraph.serviceAccountName" (list . "nodeExporter") }}
+{{- end }}
