Commit d282ea4

Merge pull request #73 from CircleCI-Public/ONPREM-2156/add-tempo-metrics
Expose Tempo metrics to monitor Tempo RED metrics along with CPU/Memo…
1 parent d0a440e commit d282ea4

8 files changed: +656 -9 lines changed


Chart.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 apiVersion: v2
 name: server-monitoring-stack
 description: A reference Helm chart for setting up a monitoring stack for CircleCI server
-version: 0.1.0-alpha.8
+version: 0.1.0-alpha.9
 dependencies:
   - name: prometheus-operator-crds
     version: 19.0.0

README.md

Lines changed: 8 additions & 4 deletions
@@ -16,7 +16,7 @@ This repository is currently under active development and is not yet a supported

 A reference Helm chart for setting up a monitoring stack for CircleCI server

-![Version: 0.1.0-alpha.8](https://img.shields.io/badge/Version-0.1.0--alpha.8-informational?style=flat-square)
+![Version: 0.1.0-alpha.9](https://img.shields.io/badge/Version-0.1.0--alpha.9-informational?style=flat-square)

 ## Installing the Monitoring Stack

@@ -58,7 +58,7 @@ Before installing the full chart, you must first install the dependency subchart
 Install the Prometheus Custom Resource Definitions (CRDs) and the Grafana operator chart. This assumes you are installing it in the same namespace as your CircleCI server installation:

 ```bash
-$ helm install server-monitoring-stack server-monitoring-stack/server-monitoring-stack --set global.enabled=false --set prometheusOperator.installCRDs=true --version 0.1.0-alpha.8 -n <your-server-namespace>
+$ helm install server-monitoring-stack server-monitoring-stack/server-monitoring-stack --set global.enabled=false --set prometheusOperator.installCRDs=true --version 0.1.0-alpha.9 -n <your-server-namespace>
 ```

 > **_NOTE:_** It's possible to install the monitoring stack in a different namespace than the CircleCI server installation. If you do so, set the `prometheus.serviceMonitor.selectorNamespaces` value with the target namespace.
@@ -95,7 +95,7 @@ $ kubectl wait --for=condition=available --timeout=120s deployment/tempo-operato
 Next, install the Helm chart using the following command:

 ```bash
-$ helm upgrade --install server-monitoring-stack server-monitoring-stack/server-monitoring-stack --reset-values --version 0.1.0-alpha.8 -n <your-server-namespace>
+$ helm upgrade --install server-monitoring-stack server-monitoring-stack/server-monitoring-stack --reset-values --version 0.1.0-alpha.9 -n <your-server-namespace>
 ```

 ### 5. Verify Prometheus Is Up and Targeting Telegraf
@@ -243,7 +243,8 @@ Dashboards are provisioned directly from CRDs, which means any manual edits will
 | prometheus.serviceMonitor.endpoints[0].port | string | `"prometheus-client"` | Port name for the Prometheus client service. |
 | prometheus.serviceMonitor.endpoints[0].relabelings[0].action | string | `"labeldrop"` | |
 | prometheus.serviceMonitor.endpoints[0].relabelings[0].regex | string | `"(container|endpoint|namespace|pod|service)"` | |
-| prometheus.serviceMonitor.selectorLabels | object | `{"app.kubernetes.io/instance":"circleci-server","app.kubernetes.io/name":"telegraf"}` | Labels to select ServiceMonitors for scraping metrics. By default, it's configured to scrape the existing Telegraf deployment in CircleCI server. |
+| prometheus.serviceMonitor.selectorExpressions | list | `[{"key":"app.kubernetes.io/name","operator":"In","values":["telegraf","tempo"]}]` | Match ServiceMonitors with specific names |
+| prometheus.serviceMonitor.selectorLabels | object | `{"app.kubernetes.io/instance":"circleci-server"}` | Labels to select ServiceMonitors for scraping metrics. By default, it's configured to scrape the existing Telegraf and Tempo deployments in CircleCI server. |
 | prometheus.serviceMonitor.selectorNamespaces | list | `[]` | Namespaces to look for ServiceMonitor objects. Set this if the CircleCI server monitoring stack is deploying in a different namespace than the actual CircleCI server installation. |
 | prometheusOperator.crds.annotations."helm.sh/resource-policy" | string | `"keep"` | |
 | prometheusOperator.enabled | string | `"-"` | |
@@ -265,6 +266,9 @@ Dashboards are provisioned directly from CRDs, which means any manual edits will
 | tempo.resources.limits.memory | string | `"2Gi"` | Maximum memory Tempo pods can use |
 | tempo.resources.requests.cpu | string | `"500m"` | Minimum CPU guaranteed to Tempo pods |
 | tempo.resources.requests.memory | string | `"1Gi"` | Minimum memory guaranteed to Tempo pods |
+| tempo.serviceMonitor | object | `{"enabled":true,"endpoints":[{"interval":"30s","path":"/metrics","port":"http"}]}` | Exposes Tempo RED metrics for Prometheus |
+| tempo.serviceMonitor.enabled | bool | `true` | Enable ServiceMonitor creation for Tempo metrics |
+| tempo.serviceMonitor.endpoints | list | `[{"interval":"30s","path":"/metrics","port":"http"}]` | Endpoints configuration for metrics scraping |
 | tempo.storage | object | `{"traces":{"backend":"memory","size":"20Gi","storageClassName":""}}` | Storage configuration for trace data |
 | tempo.storage.traces.backend | string | `"memory"` | Storage backend for traces Default: in-memory storage (traces lost on pod restart) Suitable for development/testing environments only |
 | tempo.storage.traces.size | string | `"20Gi"` | Storage volume size For memory/pv: actual volume size For cloud backends: size of WAL (Write-Ahead Log) volume Increase for higher trace volumes or longer retention |
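
Reviewer note: the new defaults in the README table can be read as the following values fragment. This is only a sketch reconstructed from the dotted key paths and default values shown in the table; the nesting is assumed from those paths, the file name `values-override.yaml` is hypothetical, and the chart's actual `values.yaml` is not shown in this excerpt.

```yaml
# values-override.yaml (hypothetical file name; nesting assumed from the README key paths)
prometheus:
  serviceMonitor:
    # Label now shared by all selected ServiceMonitors (the name label moved to selectorExpressions)
    selectorLabels:
      app.kubernetes.io/instance: circleci-server
    # Select ServiceMonitors whose name label is either telegraf or tempo
    selectorExpressions:
      - key: app.kubernetes.io/name
        operator: In
        values:
          - telegraf
          - tempo
tempo:
  serviceMonitor:
    # Set to false to skip creating the Tempo ServiceMonitor
    enabled: true
    endpoints:
      - port: http
        path: /metrics
        interval: 30s
```

A file like this could be passed to the `helm upgrade --install` command from the README with Helm's `-f` flag, for example to turn off `tempo.serviceMonitor.enabled` or to change the scrape interval.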

dashboards/.lint

Lines changed: 3 additions & 0 deletions
@@ -3,13 +3,16 @@ exclusions:
     reason: "The individual panels have a datasource."
     entries:
       - dashboard: Server SLIs
+      - dashboard: Tempo Monitoring
   panel-datasource-rule:
     reason: "'${DS_PROMETHEUS}' is a valid datasource."
     entries:
       - dashboard: Server SLIs
+      - dashboard: Tempo Monitoring
   panel-title-description-rule:
     reason: "Ideally each panel should have a description, but right now that's not the case."
   uneditable-dashboard:
     reason: "The dashboard needs to be editable in order to make copies of it."
     entries:
       - dashboard: Server SLIs
+      - dashboard: Tempo Monitoring
