
Commit 7fc7f9e

Merge pull request #1041 from stackhpc/feature/2023.1/friendly-interface-names
[prometheus] Rename network interfaces to network names
2 parents 98cb4f4 + 705ebf5 commit 7fc7f9e


4 files changed, +398 -0 lines changed


doc/source/configuration/monitoring.rst

Lines changed: 66 additions & 0 deletions
@@ -169,3 +169,69 @@ If you notice ``HaproxyServerDown`` or ``HaproxyBackendDown`` prometheus
alerts after deployment, it's likely the os_exporter secrets have not been
set correctly; double check that you have entered the correct authentication
information appropriate to your cloud and re-deploy.

Friendly Network Names
=======================

For operators who prefer to see descriptive or friendly interface names, the
following playbook can be run. It takes the network names defined in kayobe and
relabels the devices/interfaces in Prometheus to use these names.

**Check the considerations and known limitations below to see whether this is
suitable for a given environment before applying.**

This writes the friendly names into existing labels, which keeps it compatible
with existing dashboards and alerts.

To enable the change:

.. code-block:: console

   kayobe playbook run etc/kayobe/ansible/prometheus-network-names.yml
   kayobe overcloud service reconfigure --kt prometheus

The playbook first renders the ``prometheus.yml.j2`` template in
``etc/kayobe/ansible/`` to ``kolla/config/prometheus/prometheus.yml`` under the
kayobe environment configuration path. That file is then templated further for
use with kolla-ansible and rolled out by the service reconfigure.

This helps Prometheus provide insights that can be more easily understood by
those without an intimate understanding of a given site. Prometheus Node
Exporter and cAdvisor both report network statistics using the
interface/device names. This playbook causes Prometheus to relabel these
fields with human-readable names based on the networks defined in kayobe,
e.g. ``bond1.1838`` may become ``storage_network``.

The default labels are preserved with the prefix ``original_``.

* For node_exporter, ``device`` is then used for network names, while
  ``original_device`` is used for the interface itself.
* For cAdvisor, ``interface`` is used for network names, and
  ``original_interface`` is used to preserve the interface name.

:Known-Limitations/Considerations/Requirements:

Before enabling this feature, the implications must be discussed with the
customer. The following are key considerations for that conversation:

* Only network names defined within kayobe are within scope.
* Tenant network interfaces, including SR-IOV, are not considered or modified.
* Only the interface directly attributed to a network will be relabelled.
  This may be a bond, a VLAN-tagged sub-interface, or both.
  The parent bond, or bond members, are not relabelled unless they are
  captured within a distinct defined network.
* Modified entries will be within existing labels. This may break anything
  that expects the original structure, including custom dashboards,
  alerting, billing, etc.
* After applying, there will be inconsistency in the time-series database for
  the duration of the retention period, i.e. until previously ingested entries
  expire. Metrics gathered before applying these modifications will be
  unaltered, with all new metrics using the new structure.
* The interface names and their purpose must be consistent and unique within
  the environment, i.e. if eth0 is defined as admin_interface on one node, no
  other node can include a different network definition using eth0.
  This does not apply when both devices are bond members, e.g. bond0 on a
  controller has eth0 and eth1 as members, and bond1 on a compute node uses
  eth0 and eth1 as members. This is not problematic, as only the bond itself
  is relabelled.
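
The relabelling described in the documentation above relies on Prometheus' `replace` relabel action: the values of `source_labels` are joined with `;`, the joined string must match `regex` in full, and the target label is set to the expanded `replacement`. The Python sketch below approximates that behaviour for a single sample so the rules in the template can be checked by hand. It is not Prometheus' implementation (it uses Python's `re` rather than Go's RE2), and the metric names, the device `bond1.1838` and the network name `storage_network` are illustrative values taken from the documentation's example, not from a real deployment.

```python
import re


def to_python_template(replacement: str) -> str:
    """Convert Prometheus-style $1 / ${1} references to Python's \\g<1> form."""
    return re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)


def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Approximate Prometheus' `replace` action on one sample's labels.

    Missing source labels count as empty strings, the regex must match the
    whole joined string, and an empty result removes the target label.
    """
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)
    if match is None:
        return dict(labels)  # No match: labels are left untouched.
    result = match.expand(to_python_template(replacement))
    updated = dict(labels)
    if result:
        updated[target_label] = result
    else:
        updated.pop(target_label, None)
    return updated


# cAdvisor rules as rendered for one illustrative map entry
# {'device_name': 'bond1.1838', 'friendly_name': 'storage_network'}.
cadvisor = {"__name__": "container_network_receive_bytes_total",
            "image": "", "interface": "bond1.1838"}
cadvisor = relabel_replace(cadvisor, ["interface"], "original_interface")
cadvisor = relabel_replace(cadvisor, ["__name__", "image", "interface"],
                           "interface", regex=r"(.*);(.*bond1.1838$)",
                           replacement="storage_network")
print(cadvisor["interface"], cadvisor["original_interface"])
# storage_network bond1.1838

# node_exporter rules for the same map entry.
node = {"__name__": "node_network_receive_bytes_total", "device": "bond1.1838"}
node = relabel_replace(node, ["device"], "original_device")
node = relabel_replace(node, ["__name__", "device"], "device",
                       regex=r"(.*);(.*bond1.1838$)",
                       replacement="storage_network.$2")
print(node["device"], node["original_device"])
# storage_network.bond1.1838 bond1.1838
```

In the node_exporter rule the rendered replacement is `'<friendly_name>.$2'`, so the sketch yields the network name followed by the original device name, whereas the cAdvisor rule yields the network name alone.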
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
- name: Prometheus friendly network names
  hosts: overcloud
  gather_facts: no
  tasks:
    - name: Gather network maps from each host with unique identifiers
      set_fact:
        host_network_maps: >-
          {%- set if_list = [] -%}
          {%- for i in network_interfaces -%}
          {%- set device_name = hostvars[inventory_hostname][i ~ '_interface'] -%}
          {%- set friendly_name = i -%}
          {%- set unique_id = device_name ~ '_' ~ friendly_name -%}
          {%- set _ = if_list.append({
            'unique_id': unique_id,
            'device_name': device_name,
            'friendly_name': friendly_name
          }) -%}
          {%- endfor -%}
          {{ if_list }}

    - name: Aggregate network maps from all hosts
      set_fact:
        prometheus_network_maps_aggregated: "{{ groups['overcloud'] | map('extract', hostvars, 'host_network_maps') | flatten }}"
      run_once: true
      delegate_to: localhost

    - name: Deduplicate the aggregated list based on unique IDs
      set_fact:
        prometheus_network_maps_blob: "{{ prometheus_network_maps_aggregated | unique(attribute='unique_id') }}"
      run_once: true
      delegate_to: localhost

    - name: Apply template with deduplicated network maps
      ansible.builtin.template:
        src: prometheus.yml.j2
        dest: "{{ kayobe_env_config_path }}/kolla/config/prometheus/prometheus.yml"
      become: true
      run_once: true
      delegate_to: localhost
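
The playbook's gather, aggregate and deduplicate steps can be mirrored in plain Python to check what `prometheus_network_maps_blob` would contain before running it. This sketch assumes a hypothetical inventory whose per-host `network_interfaces` and `<network>_interface` values have already been resolved into dicts; the host names, network names and devices are invented for illustration. It also reports devices mapped to more than one network name, the situation the documentation's consistency requirement warns about: because the unique ID combines device and network name, such conflicting entries both survive deduplication.

```python
def host_network_maps(hostvars):
    """Mirror the first task: one entry per network defined on a host."""
    maps = []
    for network in hostvars["network_interfaces"]:
        device_name = hostvars[f"{network}_interface"]
        maps.append({
            "unique_id": f"{device_name}_{network}",
            "device_name": device_name,
            "friendly_name": network,
        })
    return maps


def aggregate_and_dedupe(all_hostvars):
    """Mirror the aggregate task plus unique(attribute='unique_id')."""
    seen, blob = set(), []
    for host in all_hostvars:
        for entry in host_network_maps(all_hostvars[host]):
            if entry["unique_id"] not in seen:
                seen.add(entry["unique_id"])
                blob.append(entry)
    return blob


def conflicting_devices(blob):
    """Return devices that are mapped to more than one network name."""
    names = {}
    for entry in blob:
        names.setdefault(entry["device_name"], set()).add(entry["friendly_name"])
    return {dev: sorted(nets) for dev, nets in names.items() if len(nets) > 1}


# Hypothetical, already-resolved hostvars for three overcloud hosts.
all_hostvars = {
    "ctrl0": {"network_interfaces": ["admin_net", "storage_net"],
              "admin_net_interface": "eth0",
              "storage_net_interface": "bond1.1838"},
    "comp0": {"network_interfaces": ["admin_net", "internal_net"],
              "admin_net_interface": "eth0",
              "internal_net_interface": "eth1"},
    "comp1": {"network_interfaces": ["internal_net"],
              "internal_net_interface": "eth0"},
}

blob = aggregate_and_dedupe(all_hostvars)
print(len(blob))
# 4
print(conflicting_devices(blob))
# {'eth0': ['admin_net', 'internal_net']}
```

The duplicate `eth0_admin_net` entry from `comp0` is dropped, while `eth0` on `comp1` produces a second, conflicting mapping that deduplication alone does not catch.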
Lines changed: 287 additions & 0 deletions
@@ -0,0 +1,287 @@
{{ '{%' }} raw {{ '%}' }}
{% raw %}
global:
  scrape_interval: {{ prometheus_scrape_interval }}
  scrape_timeout: 10s
  evaluation_interval: 15s
{% if prometheus_external_labels %}
  external_labels:
{% for label, value in prometheus_external_labels.items() %}
    {{ label }}: {{ value }}
{% endfor %}
{% endif %}

{% if prometheus_alert_rules.files is defined and prometheus_alert_rules.files | length > 0 %}
rule_files:
{% for rule in prometheus_alert_rules.files %}
  - "/etc/prometheus/{{ rule.path | basename }}"
{% endfor %}
{% endif %}

scrape_configs:
  - job_name: prometheus
    basic_auth:
      username: admin
      password: "{{ prometheus_password }}"
    static_configs:
{% for host in groups['prometheus'] %}
      - targets:
        - '{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ prometheus_port }}'
{% if hostvars[host].prometheus_instance_label | default(false, true) %}
        labels:
          instance: "{{ hostvars[host].prometheus_instance_label }}"
{% endif %}
{% endfor %}

{% if enable_prometheus_node_exporter | bool %}
  - job_name: node
    static_configs:
{% for host in groups['prometheus-node-exporter'] %}
      - targets:
        - '{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ hostvars[host]['prometheus_node_exporter_port'] }}'
{% if hostvars[host].prometheus_instance_label | default(false, true) %}
        labels:
          instance: "{{ hostvars[host].prometheus_instance_label }}"
{% endif %}
{% endfor %}
{% endraw %}
    metric_relabel_configs:
      - replacement: $1
        source_labels: [device]
        target_label: 'original_device'
{% for net_map in prometheus_network_maps_blob %}
      - source_labels: [__name__, device]
        regex: '(.*);(.*{{ net_map.device_name }}$)'
        target_label: 'device'
        replacement: '{{ net_map.friendly_name }}.$2'
{% endfor %}
{% raw %}
{% endif %}

{% if enable_prometheus_mysqld_exporter | bool %}
  - job_name: mysqld
    static_configs:
{% for host in groups['prometheus-mysqld-exporter'] %}
      - targets:
        - '{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ hostvars[host]['prometheus_mysqld_exporter_port'] }}'
{% if hostvars[host].prometheus_instance_label | default(false, true) %}
        labels:
          instance: "{{ hostvars[host].prometheus_instance_label }}"
{% endif %}
{% endfor %}
{% endif %}

{% if enable_prometheus_haproxy_exporter | bool %}
  - job_name: haproxy
    static_configs:
{% for host in groups['loadbalancer'] %}
      - targets:
        - '{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ prometheus_haproxy_exporter_port }}'
{% if hostvars[host].prometheus_instance_label | default(false, true) %}
        labels:
          instance: "{{ hostvars[host].prometheus_instance_label }}"
{% endif %}
{% endfor %}
{% endif %}

{% if enable_prometheus_rabbitmq_exporter | bool %}
  - job_name: rabbitmq
    static_configs:
{% for host in groups['rabbitmq'] %}
      - targets:
        - '{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ hostvars[host]['prometheus_rabbitmq_exporter_port'] }}'
{% if hostvars[host].prometheus_instance_label | default(false, true) %}
        labels:
          instance: "{{ hostvars[host].prometheus_instance_label }}"
{% endif %}
{% endfor %}
{% endif %}

{% if enable_prometheus_memcached_exporter | bool %}
  - job_name: memcached
    static_configs:
{% for host in groups['prometheus-memcached-exporter'] %}
      - targets:
        - '{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ hostvars[host]['prometheus_memcached_exporter_port'] }}'
{% if hostvars[host].prometheus_instance_label | default(false, true) %}
        labels:
          instance: "{{ hostvars[host].prometheus_instance_label }}"
{% endif %}
{% endfor %}
{% endif %}

{% if enable_prometheus_cadvisor | bool %}
  - job_name: cadvisor
    static_configs:
{% for host in groups["prometheus-cadvisor"] %}
      - targets:
        - '{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ hostvars[host]['prometheus_cadvisor_port'] }}'
{% if hostvars[host].prometheus_instance_label | default(false, true) %}
        labels:
          instance: "{{ hostvars[host].prometheus_instance_label }}"
{% endif %}
{% endfor %}
{% endraw %}
    metric_relabel_configs:
      - replacement: $1
        source_labels: [interface]
        target_label: 'original_interface'
{% for net_map in prometheus_network_maps_blob %}
      - source_labels: [__name__, image, interface]
        regex: '(.*);(.*{{ net_map.device_name }}$)'
        target_label: 'interface'
        replacement: '{{ net_map.friendly_name }}'
{% endfor %}
{% raw %}
{% endif %}

{% if enable_prometheus_fluentd_integration | bool %}
  - job_name: fluentd
    static_configs:
{% for host in groups['fluentd'] %}
      - targets:
        - '{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ hostvars[host]['prometheus_fluentd_integration_port'] }}'
{% if hostvars[host].prometheus_instance_label | default(false, true) %}
        labels:
          instance: "{{ hostvars[host].prometheus_instance_label }}"
{% endif %}
{% endfor %}
{% endif %}

{% if enable_prometheus_ceph_mgr_exporter | bool %}
  - job_name: ceph_mgr_exporter
    honor_labels: true
    scrape_interval: {{ prometheus_ceph_exporter_interval }}
    static_configs:
      - targets:
{% for exporter in prometheus_ceph_mgr_exporter_endpoints %}
        - '{{ exporter }}'
{% endfor %}
{% endif %}

{% if enable_prometheus_openstack_exporter | bool %}
  - job_name: openstack_exporter
    scrape_interval: {{ prometheus_openstack_exporter_interval }}
    scrape_timeout: {{ prometheus_openstack_exporter_timeout }}
{% if kolla_enable_tls_internal | bool %}
    scheme: https
{% endif %}
    honor_labels: true
    static_configs:
      - targets:
        - '{{ kolla_internal_fqdn | put_address_in_context('url') }}:{{ prometheus_openstack_exporter_port }}'
{% endif %}

{% if enable_prometheus_elasticsearch_exporter | bool %}
  - job_name: elasticsearch_exporter
    scrape_interval: {{ prometheus_elasticsearch_exporter_interval }}
    static_configs:
{% for host in groups["prometheus-elasticsearch-exporter"] %}
      - targets:
        - '{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ hostvars[host]['prometheus_elasticsearch_exporter_port'] }}'
{% if hostvars[host].prometheus_instance_label | default(false, true) %}
        labels:
          instance: "{{ hostvars[host].prometheus_instance_label }}"
{% endif %}
{% endfor %}
{% endif %}

{% if enable_prometheus_blackbox_exporter | bool and prometheus_blackbox_exporter_endpoints | length > 0 | bool %}
  - job_name: blackbox_exporter
    metrics_path: /probe
    honor_labels: true
    static_configs:
      - targets:
{% for target in prometheus_blackbox_exporter_endpoints %}
        - '{{ target }}'
{% endfor %}
    relabel_configs:
      - source_labels: [__address__]
        regex: (\w+):(\w+):(.+)
        target_label: service
        replacement: ${1}
      - source_labels: [__address__]
        regex: (\w+):(\w+):(.+)
        target_label: __param_module
        replacement: ${2}
      - source_labels: [__param_module]
        target_label: module
      - source_labels: [__address__]
        regex: (\w+):(\w+):(.+)
        target_label: __param_target
        replacement: ${3}
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: '{{ api_interface_address | put_address_in_context('url') }}:{{ prometheus_blackbox_exporter_port }}'
{% endif %}

{% if enable_prometheus_libvirt_exporter | bool %}
  - job_name: libvirt_exporter
    scrape_interval: {{ prometheus_libvirt_exporter_interval }}
    honor_labels: true
    static_configs:
{% for host in groups["prometheus-libvirt-exporter"] %}
      - targets:
        - '{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ hostvars[host]['prometheus_libvirt_exporter_port'] }}'
{% if hostvars[host].prometheus_instance_label | default(false, true) %}
        labels:
          instance: "{{ hostvars[host].prometheus_instance_label }}"
{% endif %}
{% endfor %}
{% endif %}

{% if enable_prometheus_etcd_integration | bool %}
  - job_name: etcd
{% if etcd_enable_tls | bool %}
    scheme: https
{% endif %}
    static_configs:
{% for host in groups["etcd"] %}
      - targets:
        - '{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ hostvars[host]['prometheus_etcd_integration_port'] }}'
{% if hostvars[host].prometheus_instance_label | default(false, true) %}
        labels:
          instance: "{{ hostvars[host].prometheus_instance_label }}"
{% endif %}
{% endfor %}
{% endif %}

{% if enable_ironic_prometheus_exporter | bool %}
  - job_name: ironic_prometheus_exporter
    static_configs:
{% for host in groups['ironic-conductor'] %}
      - targets: ["{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ hostvars[host]['ironic_prometheus_exporter_port'] }}"]
{% if hostvars[host].prometheus_instance_label | default(false, true) %}
        labels:
          instance: "{{ hostvars[host].prometheus_instance_label }}"
{% endif %}
{% endfor %}
{% endif %}

{% if enable_prometheus_alertmanager | bool %}
  - job_name: alertmanager
    static_configs:
{% for host in groups['prometheus-alertmanager'] %}
      - targets:
        - '{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ hostvars[host]['prometheus_alertmanager_port'] }}'
{% if hostvars[host].prometheus_instance_label | default(false, true) %}
        labels:
          instance: "{{ hostvars[host].prometheus_instance_label }}"
{% endif %}
{% endfor %}

alerting:
  alertmanagers:
  - static_configs:
    - targets:
{% for host in groups["prometheus-alertmanager"] %}
      - '{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ hostvars[host]['prometheus_alertmanager_port'] }}'
{% if hostvars[host].prometheus_instance_label | default(false, true) %}
      labels:
        instance: "{{ hostvars[host].prometheus_instance_label }}"
{% endif %}
{% endfor %}
{% endif %}
{% endraw %}
{{ '{%' }} endraw {{ '%}' }}
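
To review what the template's two `metric_relabel_configs` loops emit, they can be reproduced as plain data. The sketch below builds the node_exporter and cAdvisor rule lists from a list shaped like `prometheus_network_maps_blob` and prints them as JSON (which is also valid YAML). The single map entry is hypothetical, and the sketch covers only the relabel sections, not the surrounding kolla-ansible scrape configuration or its templating stages.

```python
import json


def node_rules(blob):
    """Rules the template emits for the `node` job (node_exporter `device`)."""
    rules = [{"replacement": "$1", "source_labels": ["device"],
              "target_label": "original_device"}]
    for net_map in blob:
        rules.append({
            "source_labels": ["__name__", "device"],
            "regex": f"(.*);(.*{net_map['device_name']}$)",
            "target_label": "device",
            "replacement": f"{net_map['friendly_name']}.$2",
        })
    return rules


def cadvisor_rules(blob):
    """Rules the template emits for the `cadvisor` job (`interface` label)."""
    rules = [{"replacement": "$1", "source_labels": ["interface"],
              "target_label": "original_interface"}]
    for net_map in blob:
        rules.append({
            "source_labels": ["__name__", "image", "interface"],
            "regex": f"(.*);(.*{net_map['device_name']}$)",
            "target_label": "interface",
            "replacement": net_map["friendly_name"],
        })
    return rules


# Hypothetical deduplicated map with a single entry.
blob = [{"unique_id": "bond1.1838_storage_net",
         "device_name": "bond1.1838",
         "friendly_name": "storage_net"}]

print(json.dumps({"node": node_rules(blob),
                  "cadvisor": cadvisor_rules(blob)}, indent=2))
```

Each map entry adds one rule per job after the `original_*` copy rule, so the number of rules grows with the number of distinct device/network pairs across the overcloud.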
