sed-i commented Oct 23, 2025

Note: The first commit in this PR was created using Copilot.

This PR is a PoC for #108.

Fixes #98.
Fixes #108.

In tandem with:

Testing

Deploy the bundle:

default-base: ubuntu@22.04/stable
applications:
  ho:
    charm: hardware-observer
    channel: latest/stable
    revision: 515
  oc:
    charm: ./[email protected]
  ub22:
    charm: ubuntu
    channel: latest/stable
    num_units: 1
    to:
    - "0"
  zk:
    charm: zookeeper
    channel: 3/stable
    num_units: 1
    to:
    - "0"
machines:
  "0": {}
relations:
- - oc:juju-info
  - ub22:juju-info
- - oc:cos-agent
  - zk:cos-agent
- - ho:general-info
  - ub22:juju-info
- - ho:cos-agent
  - oc:cos-agent

$ jssh ub22/0 ls /var/snap/node-exporter/common/textfile-collector.d/
oc_0.prom  oc_1.prom

$ jssh ub22/0 cat /var/snap/node-exporter/common/textfile-collector.d/oc_0.prom
# HELP subordinate_charm_info An info metric for correlating between principal charms and the corresponding node-exporter metrics.
# TYPE subordinate_charm_info gauge
subordinate_charm_info{subordinate_unit="oc/0",principal_unit="ho/0",juju_model="ne",juju_model_uuid="70e27e10-acff-4cf2-8709-73fb83a771f4"} 1
subordinate_charm_info{subordinate_unit="oc/0",principal_unit="ub22/0",juju_model="ne",juju_model_uuid="70e27e10-acff-4cf2-8709-73fb83a771f4"} 1

$ jssh ub22/0 cat /var/snap/node-exporter/common/textfile-collector.d/oc_1.prom
# HELP subordinate_charm_info An info metric for correlating between principal charms and the corresponding node-exporter metrics.
# TYPE subordinate_charm_info gauge
subordinate_charm_info{subordinate_unit="oc/1",principal_unit="zk/0",juju_model="ne",juju_model_uuid="70e27e10-acff-4cf2-8709-73fb83a771f4"} 1

$ jssh ub22/0 curl localhost:9100/metrics | grep subordinate_charm_info
# HELP subordinate_charm_info An info metric for correlating between principal charms and the corresponding node-exporter metrics.
# TYPE subordinate_charm_info gauge
subordinate_charm_info{juju_model="ne",juju_model_uuid="70e27e10-acff-4cf2-8709-73fb83a771f4",principal_unit="ho/0",subordinate_unit="oc/0"} 1
subordinate_charm_info{juju_model="ne",juju_model_uuid="70e27e10-acff-4cf2-8709-73fb83a771f4",principal_unit="ub22/0",subordinate_unit="oc/0"} 1
subordinate_charm_info{juju_model="ne",juju_model_uuid="70e27e10-acff-4cf2-8709-73fb83a771f4",principal_unit="zk/0",subordinate_unit="oc/1"} 1

❯ jst
Model  Controller  Cloud/Region         Version  SLA          Timestamp
ne     lxd         localhost/localhost  3.6.8    unsupported  10:12:44-04:00

App   Version  Status   Scale  Charm                    Channel        Rev  Exposed  Message
ho             active       1  hardware-observer        latest/stable  515  no       Unit is ready
oc    0.130.0  blocked      2  opentelemetry-collector                   0  no       ['cloud-config']|...
ub22  22.04    active       1  ubuntu                   latest/stable   26  no       
zk    3.9.2    active       1  zookeeper                3/stable       158  no       

Unit     Workload  Agent  Machine  Public address  Ports  Message
ub22/0*  active    idle   0        10.181.49.220          
  ho/0*  active    idle            10.181.49.220          Unit is ready
  oc/0*  blocked   idle            10.181.49.220          ['cloud-config']|...
zk/0*    active    idle   0        10.181.49.220          
  oc/1   blocked   idle            10.181.49.220          ['cloud-config']|...

Machine  State    Address        Inst id        Base          AZ  Message
0        started  10.181.49.220  juju-a771f4-0  ubuntu@22.04      Running

Integration provider  Requirer         Interface        Type         Message
ho:cos-agent          oc:cos-agent     cos_agent        subordinate  
oc:peers              oc:peers         otelcol_replica  peer         
ub22:juju-info        ho:general-info  juju-info        subordinate  
ub22:juju-info        oc:juju-info     juju-info        subordinate  
zk:cluster            zk:cluster       cluster          peer         
zk:cos-agent          oc:cos-agent     cos_agent        subordinate  
zk:restart            zk:restart       rolling_op       peer         
zk:upgrade            zk:upgrade       upgrade          peer
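
As a way to double-check the transcript above, here is a minimal sketch that reads `subordinate_charm_info` lines and groups principal units by subordinate unit. The helper name `principals_by_subordinate` and the regex are illustrative assumptions, not part of this PR; the parser is deliberately simplistic and assumes label values contain no escaped quotes.

```python
import re
from collections import defaultdict

# Sample lines copied from the oc_0.prom and oc_1.prom output in the transcript.
SAMPLE = """\
# HELP subordinate_charm_info An info metric for correlating between principal charms and the corresponding node-exporter metrics.
# TYPE subordinate_charm_info gauge
subordinate_charm_info{subordinate_unit="oc/0",principal_unit="ho/0",juju_model="ne",juju_model_uuid="70e27e10-acff-4cf2-8709-73fb83a771f4"} 1
subordinate_charm_info{subordinate_unit="oc/0",principal_unit="ub22/0",juju_model="ne",juju_model_uuid="70e27e10-acff-4cf2-8709-73fb83a771f4"} 1
subordinate_charm_info{subordinate_unit="oc/1",principal_unit="zk/0",juju_model="ne",juju_model_uuid="70e27e10-acff-4cf2-8709-73fb83a771f4"} 1
"""

# Simplification: matches key="value" pairs without handling escaped quotes.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def principals_by_subordinate(text: str) -> dict:
    """Map each subordinate unit to the sorted list of its principal units."""
    mapping = defaultdict(set)
    for line in text.splitlines():
        # Skip HELP/TYPE comments and any unrelated metrics.
        if not line.startswith("subordinate_charm_info{"):
            continue
        labels = dict(LABEL_RE.findall(line))
        mapping[labels["subordinate_unit"]].add(labels["principal_unit"])
    return {sub: sorted(units) for sub, units in mapping.items()}


if __name__ == "__main__":
    for sub, units in principals_by_subordinate(SAMPLE).items():
        print(sub, "->", ", ".join(units))
```

With the sample above, `oc/0` maps to both `ho/0` and `ub22/0`, while `oc/1` maps only to `zk/0`, which matches the subordinate layout shown in the status output.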

Context

Since node-exporter is not necessarily installed when constants.py is parsed, the value for $SNAP_COMMON is hard-coded rather than obtained dynamically.
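
A rough sketch of what such a hard-coded constant could look like. The path value is inferred from the directory listed in the Testing transcript; `NODE_EXPORTER_SNAP_COMMON` and `textfile_path` are hypothetical names used only for illustration, and the use of `pathlib.Path` here is an assumption (the code under review joins the directory with `os.path.join`).

```python
from pathlib import Path

# Assumption: this is where $SNAP_COMMON of the node-exporter snap resolves to.
# It is hard-coded because the snap may not be installed yet when this module
# is imported, so the value cannot be looked up dynamically at that point.
NODE_EXPORTER_SNAP_COMMON = Path("/var/snap/node-exporter/common")

# Directory listed in the Testing transcript.
NODE_EXPORTER_TEXTFILE_DIR = NODE_EXPORTER_SNAP_COMMON / "textfile-collector.d"


def textfile_path(unit_name: str) -> Path:
    """Return the per-unit .prom path, e.g. "oc/0" becomes ".../oc_0.prom"."""
    return NODE_EXPORTER_TEXTFILE_DIR / f"{unit_name.replace('/', '_')}.prom"


if __name__ == "__main__":
    print(textfile_path("oc/0"))
```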

Abuelodelanada left a comment

LGTM!

Some minor comments added!

Comment on lines 557 to 590
    def _write_node_exporter_textfile(self) -> None:
        """Write a per-unit metrics file for node-exporter's textfile collector.

        The file will contain one or more `subordinate_charm_info` metrics with
        labels identifying the subordinate (this unit) and the principal unit,
        plus juju topology labels.
        """
        topology = JujuTopology.from_charm(self)

        # Gather principal units from cos-agent and juju-info relations
        principals = set()
        for rel_name in ("cos-agent", "juju-info"):
            for rel in self.model.relations.get(rel_name, []):
                for unit in rel.units:
                    principals.add(unit.name)

        # Build metric lines
        lines = []
        for principal in sorted(principals):
            labels = {
                "subordinate_unit": self.unit.name,
                "principal_unit": principal,
                "juju_model": topology.model,
                "juju_model_uuid": topology.model_uuid,
            }
            # Format labels as key="value"
            label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
            lines.append(f"subordinate_charm_info{{{label_str}}} 1")

        LocalPath(NODE_EXPORTER_TEXTFILE_DIR).mkdir(parents=True, exist_ok=True)
        # Filename must match the glob `*.prom` used by node-exporter's textfile collector
        # Ref: https://github.com/prometheus/node_exporter/tree/master?tab=readme-ov-file#textfile-collector
        LocalPath(os.path.join(NODE_EXPORTER_TEXTFILE_DIR, f'{self.unit.name.replace("/", "_")}.prom')).write_text("\n".join(lines) + ("\n" if lines else ""))

Some separation-of-concerns suggestions:

  1. The topology object is already created on line 151, so I would remove that responsibility from this method and pass it in instead.

  2. I would extract the generation of the set of principal unit names into its own method.

Suggested change
    def _get_principal_units(self) -> list:
        """Return the sorted names of principal units related to this unit."""
        # Gather principal units from cos-agent and juju-info relations
        principals = set()
        for rel_name in ("cos-agent", "juju-info"):
            for rel in self.model.relations.get(rel_name, []):
                for unit in rel.units:
                    principals.add(unit.name)
        return sorted(principals)

    def _write_node_exporter_textfile(self, topology: JujuTopology) -> None:
        """Write a per-unit metrics file for node-exporter's textfile collector.

        The file will contain one or more `subordinate_charm_info` metrics with
        labels identifying the subordinate (this unit) and the principal unit,
        plus juju topology labels.
        """
        # Gather principal units from cos-agent and juju-info relations
        principals = self._get_principal_units()

        # Build metric lines
        lines = []
        for principal in principals:
            labels = {
                "subordinate_unit": self.unit.name,
                "principal_unit": principal,
                "juju_model": topology.model,
                "juju_model_uuid": topology.model_uuid,
            }
            # Format labels as key="value"
            label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
            lines.append(f"subordinate_charm_info{{{label_str}}} 1")

        LocalPath(NODE_EXPORTER_TEXTFILE_DIR).mkdir(parents=True, exist_ok=True)
        # Filename must match the glob `*.prom` used by node-exporter's textfile collector
        # Ref: https://github.com/prometheus/node_exporter/tree/master?tab=readme-ov-file#textfile-collector
        LocalPath(os.path.join(NODE_EXPORTER_TEXTFILE_DIR, f'{self.unit.name.replace("/", "_")}.prom')).write_text("\n".join(lines) + ("\n" if lines else ""))
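
Taking the separation of concerns one step further, the sketch below splits rendering from file I/O so the metric text can be unit-tested without a charm instance. It is only an illustration under assumptions, not the PR's code: `escape_label_value`, `render_subordinate_info`, and `write_atomically` are hypothetical helpers. The escaping follows the Prometheus text exposition format (backslash, double quote, and newline must be escaped in label values), and the temp-file-plus-rename write mirrors the atomic-write pattern shown in node_exporter's textfile collector README. The `0o644` mode is an assumption about what the collector needs in order to read the file.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline for the text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_subordinate_info(
    subordinate: str, principals: Iterable[str], model: str, model_uuid: str
) -> str:
    """Render one `subordinate_charm_info` line per principal (pure function)."""
    lines = []
    for principal in sorted(set(principals)):
        labels = {
            "subordinate_unit": subordinate,
            "principal_unit": principal,
            "juju_model": model,
            "juju_model_uuid": model_uuid,
        }
        label_str = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
        lines.append(f"subordinate_charm_info{{{label_str}}} 1")
    return "\n".join(lines) + ("\n" if lines else "")


def write_atomically(path: Path, content: str) -> None:
    """Write to a temp file in the same directory, then rename over the target.

    The temp name does not end in `.prom`, so a `*.prom` glob should not pick
    up a half-written file.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
        # Assumption: the collector needs world-readable files; mkstemp uses 0600.
        os.chmod(tmp_name, 0o644)
        os.replace(tmp_name, path)
    except Exception:
        os.unlink(tmp_name)
        raise
```

A charm method could then reduce to calling `render_subordinate_info(...)` with the unit name, the principal names, and the topology fields, and handing the result to `write_atomically(...)`.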

            return

        # Update node-exporter textfile collector with info about subordinate relations
        self._write_node_exporter_textfile()

Suggested change
self._write_node_exporter_textfile()
self._write_node_exporter_textfile(topology)

sed-i commented Nov 6, 2025

Need to demo this for @marcusboden, with alerts for the principal charm that include the charm name (not the app name).

Development

Successfully merging this pull request may close these issues.

Add info metric per principal charm
Duplicate node_exporter metrics for every principal