Commit 9668f91

fix: refactor prometheus provisioning
Prometheus used the clients directory, together with the metrics-exporter service, to provision its scrape targets. That directory is now deprecated, so Prometheus reads the list of targets directly from the API server through the authenticated /prometheus/targets endpoint. The list of targets is refreshed every 60 seconds. This change also fixes an existing bug: obsolete targets were never removed from the Prometheus target list.
1 parent 27c35c2 commit 9668f91
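
The API server code is not part of this diff, so the exact response of `/prometheus/targets` is not shown here. As a rough sketch only: Prometheus HTTP service discovery expects a JSON list of target groups, and it replaces the whole target set on every refresh. That is why a machine missing from the response drops out of the target list. The function name `build_http_sd_payload` and the `connected_ips` input are hypothetical; the port 19999 comes from the README example.

```python
import json

NETDATA_PORT = 19999  # netdata port from the README example (172.19.64.3:19999)


def build_http_sd_payload(connected_ips):
    """Return an HTTP SD response body for the machines connected right now.

    `connected_ips` is a hypothetical iterable of VPN IPs. One target group is
    emitted per machine, so per-machine labels could be attached later.
    """
    groups = [
        {"targets": [f"{ip}:{NETDATA_PORT}"], "labels": {}}
        for ip in sorted(set(connected_ips))
    ]
    return json.dumps(groups)


if __name__ == "__main__":
    # Two machines connected: both are listed.
    print(build_http_sd_payload(["172.19.64.3", "172.19.64.4"]))
    # One machine disconnects: the next refresh no longer contains it.
    print(build_http_sd_payload(["172.19.64.4"]))
```

Because the payload is rebuilt from the current state on each request, no separate cleanup step is needed for stale targets, unlike the old file-based approach where a file had to be deleted explicitly.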

12 files changed: 41 additions, 75 deletions
README.md

Lines changed: 1 addition & 5 deletions
@@ -77,9 +77,7 @@ The module is composed by the following systemd units:
 - prometheus.service: runs the prometheus container
 - loki.service: runs the loki container
 - grafana.service: runs the grafana container
-- metrics-exporter.path: watch for vpn connections from vpn.service and start metrics-exporter.service; each time a new client connects, the vpn
 container creates a file inside the `prometheus.d/` directory
-- metrics-exporter.service: executes the `metrics_exporter_handler` script to create a new prometheus target for the connected machine
 - webssh.service: runs the webssh container
 
 ### API Server
@@ -114,9 +112,7 @@ Promtail sets the following labels:
 [Prometheus](https://prometheus.io/) is a metrics collector, it scrapes metrics from the connected machines. The configuration is available at `/home/nethsecurity-controller1/.config/state/prometheus.yml` and it's generated every time by the `configure-module` action.
 It has the following targets:
 - static target with job_name `loki` that scrapes Loki metrics
-- dynamic targets with job_name `node` that scrapes metrics from the connected machines from the `prometheus.d/` directory under the state directory (eg. `/home/nethsecurity-controller1/.config/state/prometheus.d`)
-
-Each dynamic target is created by the `metrics-exporter` and has the following labels:
+- dynamic targets with job_name `node` that scrapes metrics from the connected machines from the API server `http://<prometheus_user>:<prometheus_pass>@127.0.0.1:<API_PORT>/prometheus/targets`
 
 - `instance` the VPN IP of the connected machine with the netdata port (eg. `172.19.64.3:19999`)
 - `job` fixed to `node`

imageroot/actions/clone-module/20initialize

Lines changed: 0 additions & 2 deletions
@@ -40,5 +40,3 @@ EOF
 # replace the ports in the db.env file
 sed -i "s/^POSTGRES_PORT=.*/POSTGRES_PORT=$db_port/" db.env
 sed -i "s|^\(REPORT_DB_URI=postgres://report:[^@]*@127.0.0.1:\)[0-9]\{1,\}|\1$db_port|" db.env
-
-mkdir -p clients

imageroot/actions/configure-module/20configure

Lines changed: 0 additions & 12 deletions
@@ -153,18 +153,6 @@ with open('prometheus.env', 'w') as pfp:
     pfp.write(f"PROMETHEUS_PATH={config['prometheus_path']}\n")
     pfp.write(f"PROMETHEUS_RETENTION={metrics_retention_days}d\n")
 
-with open('prometheus.yml', 'w', encoding='utf-8') as fp:
-    fp.write("global:\n")
-    fp.write("scrape_configs:\n")
-    fp.write('  - job_name: "node"\n')
-    fp.write('    file_sd_configs:\n')
-    fp.write('      - files:\n')
-    fp.write('          - "/prometheus/prometheus.d/*.yml"\n')
-    fp.write('  - job_name: "loki"\n')
-    fp.write('    static_configs:\n')
-    fp.write('      - targets:\n')
-    fp.write(f'          - 127.0.0.1:{ports[5]}\n')
-
 # Grafana configuration
 db = agent.read_envfile('db.env')
 with open('grafana.yml', 'w') as fp:

imageroot/actions/create-module/20initialize

Lines changed: 3 additions & 2 deletions
@@ -26,6 +26,7 @@ num=$(echo $MODULE_ID | sed 's/nethsecurity\-controller//')
 jwt_secret=$(uuidgen | sha256sum | awk '{print $1}')
 reg_secret=$(uuidgen | sha256sum | awk '{print $1}')
 db_secret=$(uuidgen | sha256sum | awk '{print $1}')
+prometheus_secret=$(uuidgen | sha256sum | awk '{print $1}')
 encryption_key=$(uuidgen | sha256sum | awk '{print substr($1,1,32)}')
 grafana_postgres_password=$(uuidgen | sha256sum | awk '{print $1}')
 
@@ -46,6 +47,8 @@ EOF
 cat << EOF > secret.env
 REGISTRATION_TOKEN=$reg_secret
 ENCRYPTION_KEY=$encryption_key
+PROMETHEUS_AUTH_USERNAME=prometheus_$RANDOM
+PROMETHEUS_AUTH_PASSWORD=$prometheus_secret
 EOF
 
 cat << EOF > db.env
@@ -64,5 +67,3 @@ EOF
 
 # This will be compiled inside the configure-module
 touch platform.env
-
-mkdir -p clients
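
For readers more comfortable with Python than with shell pipelines, the credential generation above can be approximated as follows. This is only an illustrative sketch, not code from the commit: the helper names `sha256_of_uuid` and `prometheus_credentials` are hypothetical, and `uuid.uuid4()` stands in for `uuidgen`, whose UUID variant may differ by platform.

```python
import hashlib
import random
import uuid


def sha256_of_uuid():
    """Approximate `uuidgen | sha256sum | awk '{print $1}'`.

    The shell pipeline hashes the UUID text plus its trailing newline and
    keeps only the hex digest, which is 64 characters long.
    """
    data = (str(uuid.uuid4()) + "\n").encode()
    return hashlib.sha256(data).hexdigest()


def prometheus_credentials():
    """Return (username, password) shaped like the values written to secret.env."""
    # bash $RANDOM yields an integer between 0 and 32767
    username = f"prometheus_{random.randrange(32768)}"
    return username, sha256_of_uuid()


if __name__ == "__main__":
    user, password = prometheus_credentials()
    print(f"PROMETHEUS_AUTH_USERNAME={user}")
    print(f"PROMETHEUS_AUTH_PASSWORD={password}")
```

Only the password carries real entropy here; the numeric suffix of the username just makes it less predictable than a fixed name.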
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+#!/usr/bin/env python3
+
+#
+# Copyright (C) 2025 Nethesis S.r.l.
+# SPDX-License-Identifier: GPL-3.0-or-later
+#
+
+import agent
+
+network = agent.read_envfile('network.env')
+api_port = network.get('API_PORT')
+
+secret = agent.read_envfile('secret.env')
+prometheus_user = secret.get('PROMETHEUS_AUTH_USERNAME', 'prometheus')
+prometheus_pass = secret.get('PROMETHEUS_AUTH_PASSWORD', 'prometheus')
+
+loki = agent.read_envfile('loki.env')
+loki_port = loki.get('LOKI_HTTP_PORT')
+
+with open('prometheus.yml', 'w') as f:
+    f.write('global:\n')
+    f.write('scrape_configs:\n')
+    f.write('  - job_name: "node"\n')
+    f.write('    http_sd_configs:\n')
+    f.write(f'      - url: "http://127.0.0.1:{api_port}/prometheus/targets"\n')
+    f.write('        refresh_interval: 60s\n')
+    f.write('        basic_auth:\n')
+    f.write(f'          username: "{prometheus_user}"\n')
+    f.write(f'          password: "{prometheus_pass}"\n')
+    f.write('  - job_name: "loki"\n')
+    f.write('    static_configs:\n')
+    f.write('      - targets:\n')
+    f.write(f'          - "127.0.0.1:{loki_port}"\n')
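
To make the generated configuration easier to inspect, the script above could be restated as a pure function that returns the file content instead of writing it. This is a sketch under assumptions, not part of the commit: `render_prometheus_config` is a hypothetical name, the `agent` module is left out so the example runs standalone, and the YAML indentation is a reconstruction that follows the usual Prometheus layout.

```python
def render_prometheus_config(api_port, username, password, loki_port):
    """Return prometheus.yml content as a string (hypothetical helper)."""
    lines = [
        "global:",
        "scrape_configs:",
        # Dynamic targets, fetched from the API server every 60 seconds
        '  - job_name: "node"',
        "    http_sd_configs:",
        f'      - url: "http://127.0.0.1:{api_port}/prometheus/targets"',
        "        refresh_interval: 60s",
        "        basic_auth:",
        f'          username: "{username}"',
        f'          password: "{password}"',
        # Static target for Loki's own metrics
        '  - job_name: "loki"',
        "    static_configs:",
        "      - targets:",
        f'          - "127.0.0.1:{loki_port}"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample values only; the real ones come from network.env, secret.env and loki.env
    print(render_prometheus_config(20001, "prometheus_1234", "example-password", 20005))
```

Note that `basic_auth` sits under the `http_sd_configs` entry, so the credentials apply to the discovery request to the API server, not to the scrapes of the discovered machines.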

imageroot/bin/metrics_exporter_handler

Lines changed: 0 additions & 32 deletions
This file was deleted.

imageroot/systemd/user/api.service

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 [Unit]
 Description=Podman api.service
 BindsTo=controller.service
-After=controller.service vpn.service
+After=controller.service vpn.service timescale.service
 
 [Service]
 Environment=PODMAN_SYSTEMD_UNIT=%n

imageroot/systemd/user/controller.service

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 [Unit]
 Description=Podman controller.service
-Requires=vpn.service api.service ui.service proxy.service promtail.service metrics-exporter.path loki.service prometheus.service grafana.service webssh.service timescale.service
-Before=vpn.service api.service ui.service proxy.service promtail.service metrics-exporter.path loki.service prometheus.service grafana.service webssh.service timescale.service
+Requires=vpn.service api.service ui.service proxy.service promtail.service loki.service prometheus.service grafana.service webssh.service timescale.service
+Before=vpn.service api.service ui.service proxy.service promtail.service loki.service prometheus.service grafana.service webssh.service timescale.service
 ConditionPathExists=%S/state/environment
 ConditionPathExists=%S/state/network.env
 

imageroot/systemd/user/metrics-exporter.path

Lines changed: 0 additions & 8 deletions
This file was deleted.

imageroot/systemd/user/metrics-exporter.service

Lines changed: 0 additions & 9 deletions
This file was deleted.
