Add 'switch' label containing hostname #36
|
This is already possible with recording rules. Most of the metrics are counters, so you can use recording rules to define the rate you need and pull in the switch name and other info. |
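A rough sketch of what such a rule could look like, to make the suggestion concrete. The metric names `infiniband_switch_port_receive_data_bytes_total` and `infiniband_switch_info`, the `switch` label, and the recorded series name are assumptions for illustration only; substitute whatever the exporter actually exposes.

```yaml
groups:
  - name: infiniband          # hypothetical group name
    rules:
      # Per-port receive rate, enriched with the switch name.
      # Assumes the info metric has exactly one series per guid.
      - record: infiniband:switch_port_receive_data_bytes:rate5m
        expr: |
          rate(infiniband_switch_port_receive_data_bytes_total[5m])
          * on (guid) group_left (switch)
          infiniband_switch_info
```

Because the info metric has a constant value of 1, multiplying by it leaves the rate unchanged and only copies the `switch` label onto the result.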
|
It is not ideal for fabrics with more than one instance running the exporter (e.g. SM nodes):

Error executing query: found duplicate series for the match group {guid="0xb8001db1da", port="1"} on the right hand-side of the operation: [{__name__="infiniband_switch_uplink_info", datacenter="dc", guid="0xb8001db1da", instance="opensm1:9315", job="infiniband", port="1"}, {__name__="infiniband_switch_uplink_info", datacenter="dc", guid="0xb8001db1da", instance="opensm2:9315", job="infiniband", port="1"}];many-to-many matching not allowed: matching labels must be unique on one side

Also, running the join operation on a sufficiently large fabric puts unnecessary resource strain on Prometheus, especially with this number of metrics and a recording rule for each. |
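For reference, one possible way around the duplicate-series error is to collapse the right-hand side across exporter instances before joining. This is only a sketch: the counter name `infiniband_switch_port_receive_data_bytes_total` and the `uplink` label are assumptions, and it presumes both instances report identical labels apart from `instance`.

```yaml
groups:
  - name: infiniband          # hypothetical group name
    rules:
      - record: infiniband:switch_port_receive_data_bytes:rate5m
        expr: |
          rate(infiniband_switch_port_receive_data_bytes_total[5m])
          * on (guid, port) group_left (uplink)
          # Drop the instance label so each (guid, port) pair
          # appears once on the right-hand side.
          max without (instance) (infiniband_switch_uplink_info)
```

This avoids the many-to-many error but does not remove the cost of the join itself, and the left-hand side still carries one series per scraping instance.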
|
We have UFM HA, but only the primary server runs this exporter. The secondary doesn't get scraped, and Prometheus scrapes follow the VIP that UFM uses to identify the primary. That's obviously not possible, or at least not as easy, with plain OpenSM, but it might be worth trying, since scraping multiple targets that expose essentially the same data leaves you with duplicate metrics. If you are creating dashboards or alerts where you end up computing the same rates over and over again, that may put more strain on Prometheus than storing the results with recording rules and querying just the recorded series. That's generally why recording rules exist: to reduce the load on Prometheus at query time. For the record rules you'd have to add |
|
Also, I'm not sure about your scale, but we have 81 switches with around 48 ports per switch and have had few issues with the recording rules, even though we have multiple kinds of rates for each metric and even recording rules that consume other recording rules. Prometheus shows that our recording rules take a little under 2 seconds to evaluate on each interval, which is 60 seconds. |
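As an illustration of rules that consume other rules on a 60-second interval, a group might be laid out roughly like this. The metric and rule names are hypothetical, not the ones used in the setup described above.

```yaml
groups:
  - name: infiniband-rates    # hypothetical group name
    interval: 60s             # evaluate this group once per minute
    rules:
      # First rule: per-port rate computed from the raw counter.
      - record: infiniband:switch_port_receive_data_bytes:rate5m
        expr: rate(infiniband_switch_port_receive_data_bytes_total[5m])
      # Second rule: per-switch total built from the rule above.
      - record: infiniband:switch_receive_data_bytes:rate5m
        expr: sum by (guid) (infiniband:switch_port_receive_data_bytes:rate5m)
```

Rules within a group are evaluated sequentially, so the second rule can read the series the first one just recorded. The evaluation time per group should be visible through Prometheus's own `prometheus_rule_group_last_duration_seconds` metric.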
Adds a 'switch' label containing the device's hostname (or whatever ibnetdiscover reports) to every switch metric, improving readability. The label stays present even if --ibnetdiscover.node-name-map was not provided.