Commit eae192a

feat(documentation): Add documentation about Prometheus (#1157)
Signed-off-by: Javier Rodriguez <[email protected]>
1 parent 20f4e5f commit eae192a

4 files changed: 160 additions and 0 deletions
@@ -0,0 +1,160 @@
---
title: How to monitor your CI/CD systems with Chainloop and Prometheus
---

Chainloop is an open-source Software Supply Chain control plane: a single source of truth for metadata and artifacts, plus a declarative attestation process.

![chainloop-overview](./overview.png)

With Chainloop, you can integrate your CI/CD pipelines or processes by defining a Chainloop [Workflow](../../getting-started/workflow-definition.mdx#workflows). A workflow can optionally include a Chainloop [Contract](../../getting-started/workflow-definition.mdx#workflow-contracts), depending on your requirements. As a result, Chainloop can serve as the central authority for your CI/CD operational health.

## Prometheus Integration
Chainloop integrates with Prometheus, giving end users automatic insight into their CI/CD pipelines in a standardized way.

![integration-diagram](./integration-diagram.png)

The following metrics are currently exposed, and more are expected to follow:

- `chainloop_workflow_up`: Indicates whether the last run was successful.
- `chainloop_workflow_run_duration_seconds`: Duration of a workflow run, in seconds.

Chainloop provides a dedicated endpoint from which Prometheus instances can fetch metrics. The exact URL depends on your instance settings. For example:
```text
https://CHAINLOOP_CONTROLPLANE_URL/prom/ORG_NAME/metrics
```

Where:
- `CHAINLOOP_CONTROLPLANE_URL`: The address (hostname) of your Chainloop control plane.
- `ORG_NAME`: The name of the organization from which to gather metrics.

The endpoint is authenticated. For a given organization, it can be accessed only when both of the following conditions are met:

- Prometheus Integration is activated for the organization.
- A valid API Token is included in the metrics request.

### How to activate Prometheus Integration?
To use the Prometheus integration, a few steps are required:

1. Create or use an existing Chainloop organization.
2. Generate an API Token for the organization.
3. Update the Chainloop Controlplane configuration to activate the Prometheus endpoint for that organization.

#### Create or use an existing Chainloop organization
If you already have a Chainloop Organization, you only need to know its name. If you don't, log in and run the following command:
```bash
chainloop organization create --name cyberdyne
```

#### Generate an API Token for the organization
Make sure your current organization is the one you want to create an API Token for, then run the following command, replacing `API_TOKEN_NAME` with your desired API Token name:
```bash
chainloop organization api-token create --name API_TOKEN_NAME
```
Save the output token for later.

#### Update Chainloop Controlplane configuration
When using the Chainloop Open Source Chart, you can activate the integration for an existing organization through the chart values. In your `values.yaml`, add:
```yaml
controlplane:
  # existing or previous values.yaml configuration
  prometheus_org_metrics:
    - org_name: cyberdyne
```

In the example above, we have added the `prometheus_org_metrics` entry under the top-level `controlplane` key. The value `org_name: cyberdyne` activates the metrics for the organization named `cyberdyne`.

To activate it for more organizations, add them to the list:

```yaml
controlplane:
  # existing or previous values.yaml configuration
  prometheus_org_metrics:
    - org_name: cyberdyne
    - org_name: acme-corp
```
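
Each organization activated this way is served on its own path, following the endpoint pattern described earlier. As a small sketch, the helper below prints the endpoint for every organization in the list. `org_metrics_urls` is a hypothetical function, not part of the Chainloop CLI, and `chainloop.example.com` is a placeholder for `CHAINLOOP_CONTROLPLANE_URL`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Chainloop: print the metrics endpoint
# for each organization listed under prometheus_org_metrics.
# Usage: org_metrics_urls HOST ORG [ORG...]
org_metrics_urls() {
  local host="$1"
  shift
  local org
  for org in "$@"; do
    printf 'https://%s/prom/%s/metrics\n' "$host" "$org"
  done
}

# chainloop.example.com is a placeholder for CHAINLOOP_CONTROLPLANE_URL.
org_metrics_urls chainloop.example.com cyberdyne acme-corp
```

The printed URLs can then be used one by one when testing the endpoint or when writing the scrape configuration.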

### Test the metrics endpoint

With the API Token generated and the configuration updated, we can check that everything works as expected by requesting the metrics for the `cyberdyne` organization from Chainloop’s Controlplane:
```bash
curl -H 'Authorization: Bearer API_TOKEN' \
  'https://CHAINLOOP_CONTROLPLANE_URL/prom/cyberdyne/metrics'
```

This returns metrics similar to the following, in [Prometheus compatible format](https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md):
```text
# HELP chainloop_workflow_up Indicate if the last run was successful.
# TYPE chainloop_workflow_up gauge
chainloop_workflow_up{org="cyberdyne",workflow="backend-release-production"} 1
chainloop_workflow_up{org="cyberdyne",workflow="chainloop-docs-release"} 1
chainloop_workflow_up{org="cyberdyne",workflow="chainloop-labs-tests"} 1
chainloop_workflow_up{org="cyberdyne",workflow="chainloop-platform-deploy"} 0
chainloop_workflow_up{org="cyberdyne",workflow="chainloop-platform-qa-approval"} 1
chainloop_workflow_up{org="cyberdyne",workflow="chainloop-platform-release-canary"} 1
chainloop_workflow_up{org="cyberdyne",workflow="chainloop-platform-release-production"} 1
chainloop_workflow_up{org="cyberdyne",workflow="chainloop-vault-build-and-package"} 1
```
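
For a quick check without a Prometheus instance, the response can be filtered directly in the shell. The sketch below assumes the output format shown above, with one `chainloop_workflow_up` sample per workflow and the value as the last field. `failing_workflows` is a hypothetical helper name, not something Chainloop ships.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text format on stdin and print the
# name of every workflow whose chainloop_workflow_up sample is 0.
failing_workflows() {
  awk '
    index($0, "chainloop_workflow_up{") == 1 && $NF == 0 {
      # Pull out the value of the workflow="..." label.
      if (match($0, /workflow="[^"]*"/)) {
        print substr($0, RSTART + 10, RLENGTH - 11)
      }
    }
  '
}

# Offline demo using sample lines from the output above.
failing_workflows <<'EOF'
# TYPE chainloop_workflow_up gauge
chainloop_workflow_up{org="cyberdyne",workflow="chainloop-labs-tests"} 1
chainloop_workflow_up{org="cyberdyne",workflow="chainloop-platform-deploy"} 0
EOF
# prints: chainloop-platform-deploy
```

Against a live control plane, the same function could be fed from the `curl` command shown earlier by piping its output into `failing_workflows`.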

### How to connect to your Prometheus instance?

Depending on how Prometheus is deployed in your infrastructure, there are several ways to do it. The most common one is to update the Prometheus configuration YAML:
```yaml
scrape_configs:
  - job_name: chainloop-metrics
    metrics_path: /prom/cyberdyne/metrics
    scheme: https
    bearer_token: CHAINLOOP_API_TOKEN
    static_configs:
      - targets:
          # host[:port] only; the protocol is set by `scheme` above
          - CHAINLOOP_CONTROLPLANE_URL
```
The configuration above adds a new scrape config called `chainloop-metrics`. Its most important options are:
- metrics_path: `/prom/cyberdyne/metrics`
- bearer_token: The Chainloop API Token previously generated
- target: `CHAINLOOP_CONTROLPLANE_URL`, the control plane address without the scheme, since `scheme: https` already sets the protocol

With this configuration, Prometheus retrieves the metrics for `cyberdyne` from Chainloop’s Controlplane, authenticating with the API Token.

If, on the other hand, you are using the [Prometheus Operator](https://prometheus-operator.dev), you can apply almost the same configuration through a [ScrapeConfig](https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1alpha1.ScrapeConfig):
```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: test
spec:
  scrapeInterval: 15s
  scheme: HTTPS
  authorization:
    type: Bearer
    credentials:
      name: cyberdyne-metrics-token
      key: token
  staticConfigs:
    - targets:
        # host[:port] only; the protocol is set by `scheme` above
        - CHAINLOOP_CONTROLPLANE_URL
  metricsPath: /prom/cyberdyne/metrics
```

This creates a scrape entry that is added to the global Prometheus configuration, with Chainloop’s Controlplane as the target and the API Token, stored in a Secret, used for authentication.
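
The `authorization.credentials` field in the ScrapeConfig points at a key in a Kubernetes Secret rather than at the token itself, so that Secret has to exist. One possible way to create it is sketched below. `secret_manifest` is a hypothetical helper, the `monitoring` namespace is only an example, and the sketch assumes the Secret lives in the same namespace as the ScrapeConfig.

```bash
#!/usr/bin/env bash
# Hypothetical helper: render a Secret manifest that stores the Chainloop
# API Token under the key "token", matching the name/key pair referenced
# by the ScrapeConfig.
# Usage: secret_manifest SECRET_NAME API_TOKEN
secret_manifest() {
  local name="$1" token="$2"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${name}
type: Opaque
stringData:
  token: "${token}"
EOF
}

# Example (needs kubectl and cluster access; the namespace is an assumption):
#   secret_manifest cyberdyne-metrics-token "$CHAINLOOP_API_TOKEN" \
#     | kubectl apply -n monitoring -f -
```

Rendering the manifest and piping it to `kubectl` keeps the token out of any file on disk; reading it from an environment variable also avoids typing it on the command line.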

## What's next?
These metrics open the door to further integrations and data visualization, for example with Grafana. You can also set up alerts based on them with Alertmanager. Below is an example of a Grafana visualization from a real Chainloop organization.

![grafana-dashboard](./grafana.png)

Here, we can see the latest status of workflows across our CI pipelines and processes.

We can also set up alerts based on the gathered metrics. For example, the following Prometheus alerting rule fires when a workflow has been failing, so that Alertmanager can notify the responsible team:
```yaml
# previous or existing configuration
groups:
  - name: WorkflowDownAlerts
    rules:
      - alert: WorkflowDown
        expr: sum by (org, workflow) (avg_over_time(chainloop_workflow_up[30m])) == 0
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Workflow {{ $labels.workflow }} has been down for at least 30 minutes"
          description: "The workflow {{ $labels.workflow }} in organization {{ $labels.org }} has been down for at least 30 minutes."
```