prometheus-node-exporter-ucode: new modules and extended info by systemcrash · Pull Request #28016 · openwrt/packages

systemcrash · 2025-12-05T00:09:30Z

-thermal management (temp and cooling devices)
-uname amendment to oneline and help string
-realtek-poe
-odhcp6c
-entropy help strings
-time help string
-hwmon
-time clocksources
-watchdog
-metrics help strings

📦 Package Details

Maintainer: @dhewg (?) @feckert

🧪 Run Testing Details

OpenWrt Version:24.10.4 / 25 snapshot

✅ Formalities

I have reviewed the CONTRIBUTING.md file for detailed contributing guidelines.

dhewg · 2025-12-05T08:20:09Z

Nice, generally this looks good to me! (But I haven't yet checked the metric themselves.) Thanks for the `uname` replacement! I knew my solution was lame (I even added a comment about it), but I wonder why I didn't use your way to begin with... The watchdog collector needs to be added to the Makefile. Some of the next comments may sound nitpicking, but let's optimize each collector for a minimal runtime. Using many on slow devices in a tight polling interval really adds up: - hwmon and realtek_poe add and just call scrape(), please fold the function content out to the main scope to get rid of it - don't use snprintf() like expression if not necessary (like realtek_poe's METRIC_NAMESPACE, I'd just fold it into each usage) - creating metrics of the same name should always happen once in the main scope, and not in a loop (time, watchdog) - watchdog looks like it could use some cleanup, can the `const metric = { ...}` logic applied here? To keep the code simple, it uses a metric map to loop oneline() over it If you didn't know: you can run selected collectors from the cmdline and check the `node_scrape_collector_duration_seconds` metric for the runtime: e.g.: `prometheus-node-exporter-ucode netclass` I did that alot to get each collectors runtime down back then ;)

systemcrash · 2025-12-05T14:48:24Z

Wow, micro-optimisations! Yeah, I ran reset ; prometheus-node-exporter-ucode thermal alot. Very helpful aspect :)

creating metrics of the same name should always happen once in the main scope, and not in a loop (time, watchdog)

So we get ordered results like this? (IIRC this accelerates ingestion?):

# HELP node_cooling_device_max_state Maximum throttle state of the cooling device
# TYPE node_cooling_device_max_state gauge
node_cooling_device_max_state{name="0",type="Processor"} 0
node_cooling_device_max_state{name="1",type="Processor"} 0
node_cooling_device_max_state{name="10",type="Processor"} 0
...
node_cpu_guest_seconds_total{cpu="0",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="0",mode="user"} 0
node_cpu_guest_seconds_total{cpu="1",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="1",mode="user"} 0
node_cpu_guest_seconds_total{cpu="10",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="10",mode="user"} 0
...

Using many on slow devices in a tight polling interval really adds up:

Maniacs crushing their routers with 5 seconds of polling :D

The watchdog collector needs to be added to the Makefile.

It's in base, so it's installed automatically. Should it be an extra?

@GeorgeSapkin this commit length problem is not good. See how many characters one has to work with on this module?? :/

GeorgeSapkin · 2025-12-05T16:49:30Z

@systemcrash like I said before, I'm not married to the exact number. That one was the only thing in the official guidelines, so that's what I went with. If more people feel differently we can change the limit to something else, which I do think is important to have. We don't need to look far for whole commit messages being one long line.

Edit: this also comes back to blocking build if formal fails. Which I'm still on the fence about.

systemcrash · 2025-12-05T17:45:36Z

@systemcrash like I said before, I'm not married to the exact number. That one was the only thing in the official guidelines, so that's what I went with. If more people feel differently we can change the limit to something else, which I do think is important to have. We don't need to look far for whole commit messages being one long line.

Edit: this also comes back to blocking build if formal fails. Which I'm still on the fence about.

I'm OK with a limit at the established 72 characters. But I don't see why that should be a hard-limit. A warning perhaps. A hard-limit at like 120-150 likely flags that there're no newlines used anywhere.

systemcrash · 2025-12-05T17:45:59Z

@dhewg if you could take another look? I optimised netdev and netclass for ordered output.

systemcrash · 2025-12-09T01:40:36Z

pinging @dhewg - think I've addressed everything here.

dhewg · 2025-12-09T08:17:18Z

creating metrics of the same name should always happen once in the main scope, and not in a loop (time, watchdog)

So we get ordered results like this? (IIRC this accelerates ingestion?):

Sorry, I just meant:

const m = metric(...);
for (...) {
    m();
}

instead of:

for (...) {
    const m = metric(...);
    m();
}

I appreciate the optimizations, but is the ordered output really worth it? Like in terms of time spend by the prometheus daemon to parse the output? It makes the collectors less readable, especially netdev.

The watchdog collector needs to be added to the Makefile.

It's in base, so it's installed automatically. Should it be an extra?

Let's make it an extra. This is what I get one one of my devices:
node_watchdog_info{name="watchdog0",options="null",identity="null",state="null",status="null",pretimeout_governor="null"} 1
So It'll be just baggage there.

On another device I get:
node_watchdog_info{name="watchdog0",options="0x8180",identity="GPIO Watchdog",state="active",status="0x8000",pretimeout_governor="null"} 1
It's been a while, but those null values don't look right, those should probably checked for?

For hwmon, Is this the expected chip?
node_hwmon_chip_names{chip="thermal_thermal_zone0",chip_name="cpu_thermal"} 1

That's from:

ls -l /sys/class/hwmon/hwmon0/
lrwxrwxrwx    1 root     root             0 Dec  9 09:02 device -> ../../thermal_zone0
-r--r--r--    1 root     root          4096 Dec  9 09:02 name
drwxr-xr-x    2 root     root             0 Dec  9 09:07 power
lrwxrwxrwx    1 root     root             0 Dec  9 09:07 subsystem -> ../../../../../class/hwmon
-r--r--r--    1 root     root          4096 Dec  9 09:02 temp1_input
-rw-r--r--    1 root     root          4096 Dec  9 09:07 uevent

systemcrash · 2025-12-09T12:20:13Z

For hwmon, Is this the expected chip? node_hwmon_chip_names{chip="thermal_thermal_zone0",chip_name="cpu_thermal"} 1

Yes, that looks correct. It pulls the chip_name from:

ls -l /sys/class/hwmon/hwmon0/
...
-r--r--r--    1 root     root          4096 Dec  9 09:02 name

Can you check that file content?

Here's what I get from node-exporter v1.9 on an i3:

# HELP node_hwmon_chip_names Annotation metric for human-readable chip names
# TYPE node_hwmon_chip_names gauge
node_hwmon_chip_names{chip="0000:00:1c_4_0000:03:00_0",chip_name="i350bb"} 1
node_hwmon_chip_names{chip="platform_coretemp_0",chip_name="coretemp"} 1
node_hwmon_chip_names{chip="thermal_thermal_zone0",chip_name="acpitz"} 1

vs:

ls -l /sys/class/hwmon/hwmon0/
total 0
lrwxrwxrwx 1 root root    0 Dec  9 13:17 device -> ../../thermal_zone0
-r--r--r-- 1 root root 4096 Dec  9 13:17 name
drwxr-xr-x 2 root root    0 Dec  9 13:17 power
lrwxrwxrwx 1 root root    0 Dec  9 13:17 subsystem -> ../../../../../class/hwmon
-r--r--r-- 1 root root 4096 Dec  9 13:17 temp1_input
-r--r--r-- 1 root root 4096 Dec  9 13:17 temp2_input
-rw-r--r-- 1 root root 4096 Dec  9 13:17 uevent
# cat /sys/class/hwmon/hwmon0/name
acpitz

is the ordered output really worth it? Like in terms of time spend by the prometheus daemon to parse the output? It makes the collectors less readable, especially netdev.

It makes output less jumbled and more readable (since HELP strings occur with the TYPE, they're grouped together and that type appears once, rather than only appearing the first time in multiple appearances of that row type later on), and it contributes to lower data-cardinality at ingestion, which IIRC helps coalesce writes.

I think it makes the code more readable - since the loops are distinct and not nested.

systemcrash · 2025-12-09T13:29:12Z

OK. It's an extra now. Much appreciated if you could take hwmon for another spin @dhewg

GeorgeSapkin · 2025-12-09T23:42:35Z

utils/prometheus-node-exporter-ucode/files/base/netclass.uc

-		let line = oneline(devroot + m);
-		metrics[m]({ device }, line);
+for (let m in metrics) {
+	for (let i = 0; i < dev_length; i++) {


Does ucode not support for..of?

No. But were we to do it that way, it creates a new object on every iteration, whereas the classic loop does not. Aiming for speed, here :)

utils/prometheus-node-exporter-ucode/files/base/netclass.uc

GeorgeSapkin · 2025-12-11T14:15:05Z

utils/prometheus-node-exporter-ucode/files/base/uname.uc

 	release:	oneline("/proc/sys/kernel/osrelease"),
 	version:	oneline("/proc/sys/kernel/version"),
-	machine:	poneline("uname -m"), // TODO lame
+	machine:	oneline("/proc/sys/kernel/arch"),


Still tabs here.

I prefer not to modify how it was already there. I know you want consistency...

(they all align whatever the width anyway)

GeorgeSapkin · 2025-12-11T14:15:57Z

utils/prometheus-node-exporter-ucode/files/extra/watchdog.uc

+		identity:	iden,
+		state:		stat,
+		status:		stus,
+		pretimeout_governor: prtg,


Tabs here as well.

(they all align whatever the width anyway)

systemcrash · 2025-12-15T14:11:19Z

ping @dhewg for final test run

dhewg · 2025-12-19T10:17:52Z

Sorry for the delay, pre-xmas stress...

Well, I agree to not modify the whitespaces!
But then please revert the whitespace changes in netclass/netdev, because that is now inconsistent.
I'm not married to whatever I used initially, but let's not bikeshed this please.

is the ordered output really worth it? Like in terms of time spend by the prometheus daemon to parse the output?

I think it makes the code more readable - since the loops are distinct and not nested.

Sorry, still disagreeing. While it's not rocket science, it's just more code to read. In netdev we now have 4 loops instead of one. You even added comments to make it clear what it does. It wasn't unclear before at all.
It's similar for most of the new files.

So again: Is this reduced readability really worth it?
Don't get me wrong, it may as well be! I just don't know and don't have the time to dig in atm.
But on the other side I can't see a prometheus daemon really sucking that much if not every collector on earth is optimized in this specific way?

As for hwmon/watchdog, the null values are gone, they're empty string now.
Remind me, is that what prometheus wants? Or should we just not output empty labels?

systemcrash · 2025-12-19T15:03:48Z

Don't get me wrong, it may as well be! I just don't know and don't have the time to dig in atm.

(my) Reasoning being that we output ordered, rather than revisiting the same metrics for different devices. It makes stare and compare in the scrape output clearer, among other downstream benefits.

As for hwmon/watchdog, the null values are gone, they're empty string now.
Remind me, is that what prometheus wants? Or should we just not output empty labels?

I'll take another look at that. Empty isn't bad per-se, but we might as well avoid the output altogether.

could you paste an example of the output? Edit: never mind. Found it :)

-netdev -netclass Fix metric output to be ordered per-metric (and not per-device) as is done by node_exporter. In testing, netdev and netclass scrape time increased a mean 2msec as a result of the extra loops. In netdev, we skip some checks since the format of /proc/net/dev is stable. Signed-off-by: Paul Donald <newtwen+github@gmail.com>

-thermal management (temp and cooling devices) -uname amendment to oneline and help string -realtek-poe -odhcp6c -entropy help strings -time help string -hwmon -time clocksources -watchdog -metrics help strings Signed-off-by: Paul Donald <newtwen+github@gmail.com>

dhewg · 2025-12-19T16:39:46Z

(my) Reasoning being that we output ordered, rather than revisiting the same metrics for different devices. It makes stare and compare in the scrape output clearer, among other downstream benefits.

"Other downstream benefits" may be the reason to go this way, but that's a bit too hand-wavy argument for my taste ;)

Here's watchdog on an OpenWrt One, we got some (harmless) warnings and an empty pretimeout_governor:

prometheus-node-exporter-ucode watchdog
prometheus-node-exporter-ucode now serving requests with 16 collectors
Status: 200 OK
Content-Type: text/plain; version=0.0.4; charset=utf-8

# HELP node_watchdog_bootstatus Value of /sys/class/watchdog/<watchdog>/bootstatus
# TYPE node_watchdog_bootstatus gauge
node_watchdog_bootstatus{name="watchdog0"} 0
node_watchdog_bootstatus{name="watchdog1"} 0
# HELP node_watchdog_fw_version Value of /sys/class/watchdog/<watchdog>/fw_version
# TYPE node_watchdog_fw_version gauge
node_watchdog_fw_version{name="watchdog0"} 0
node_watchdog_fw_version{name="watchdog1"} 0
# HELP node_watchdog_nowayout Value of /sys/class/watchdog/<watchdog>/nowayout
# TYPE node_watchdog_nowayout gauge
node_watchdog_nowayout{name="watchdog0"} 0
node_watchdog_nowayout{name="watchdog1"} 0
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_timeleft_seconds)
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_timeleft_seconds)
# HELP node_watchdog_timeout_seconds Value of /sys/class/watchdog/<watchdog>/timeout
# TYPE node_watchdog_timeout_seconds gauge
node_watchdog_timeout_seconds{name="watchdog0"} 30
node_watchdog_timeout_seconds{name="watchdog1"} 31
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_pretimeout_seconds)
# HELP node_watchdog_pretimeout_seconds Value of /sys/class/watchdog/<watchdog>/pretimeout
# TYPE node_watchdog_pretimeout_seconds gauge
node_watchdog_pretimeout_seconds{name="watchdog1"} 15
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_access_cs0)
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_access_cs0)
# HELP node_watchdog_info Info of /sys/class/watchdog/<watchdog>
# TYPE node_watchdog_info gauge
node_watchdog_info{name="watchdog0",options="0x8180",identity="GPIO Watchdog",state="active",status="0x8000",pretimeout_governor=""} 1
node_watchdog_info{name="watchdog1",options="0x8380",identity="mtk-wdt",state="inactive",status="0x0",pretimeout_governor="panic"} 1
# HELP node_watchdog_available Info of /sys/class/watchdog/<watchdog>/pretimeout_available_governors
# TYPE node_watchdog_available gauge
node_watchdog_available{available="panic",device="watchdog1"} 1
# TYPE node_scrape_collector_duration_seconds gauge
node_scrape_collector_duration_seconds{collector="watchdog"} 0.005309986
# TYPE node_scrape_collector_success gauge
node_scrape_collector_success{collector="watchdog"} 1

dhewg · 2025-12-19T16:45:45Z

avm,fritzbox-7530 looks even more funny:

$ prometheus-node-exporter-ucode watchdog|grep \"\"
prometheus-node-exporter-ucode now serving requests with 20 collectors
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_bootstatus)
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_fw_version)
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_nowayout)
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_timeleft_seconds)
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_timeout_seconds)
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_pretimeout_seconds)
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_access_cs0)
node_watchdog_info{name="watchdog0",options="",identity="",state="",status="",pretimeout_governor=""} 1
`

systemcrash · 2025-12-19T18:13:12Z

(my) Reasoning being that we output ordered, rather than revisiting the same metrics for different devices. It makes stare and compare in the scrape output clearer, among other downstream benefits.

I got that, but why should scrape output readability trump code readability? Debugging comes to mind, but we can go just use our standard tools like prometheus-node-exporter-ucode hwmon|grep foo|sort|cut|awk|whatever

"Other downstream benefits" may be the reason to go this way, but that's a bit too hand-wavy argument for my taste ;)

Fair. Well, when you're comparing with Prometheus node_exporter, it helps quite a bit.

Here's watchdog on an OpenWrt One, we got some (harmless) warnings and an empty pretimeout_governor:

Ah, thanks!

DEBUG: skipping metric: unsupported value 'null' (node_watchdog_timeleft_seconds)
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_timeleft_seconds)
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_pretimeout_seconds)
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_access_cs0)
DEBUG: skipping metric: unsupported value 'null' (node_watchdog_access_cs0)

These should all be resolved with the latest push.

# HELP node_watchdog_info Info of /sys/class/watchdog/<watchdog>
# TYPE node_watchdog_info gauge
node_watchdog_info{name="watchdog0",options="0x8180",identity="GPIO Watchdog",state="active",status="0x8000",pretimeout_governor=""} 1

As should this, without things like pretimeout_governor="". Should be fixed.

The same fixes should also work on the Fritz. We now skip it if the path is unavailable.

Copy variant from PR: openwrt/packages#28016 Signed-off-by: Vladimir Ermakov <vooon341@gmail.com>

systemcrash force-pushed the prom branch from c439341 to c283424 Compare December 5, 2025 14:48

GeorgeSapkin added enhancement Update package labels Dec 5, 2025

systemcrash force-pushed the prom branch from c283424 to a0d25b8 Compare December 5, 2025 17:42

systemcrash force-pushed the prom branch from a0d25b8 to bf3178d Compare December 5, 2025 17:59

systemcrash force-pushed the prom branch from bf3178d to 9e79f5d Compare December 9, 2025 13:28

GeorgeSapkin reviewed Dec 9, 2025

View reviewed changes

utils/prometheus-node-exporter-ucode/files/base/netclass.uc Outdated Show resolved Hide resolved

systemcrash force-pushed the prom branch from 9e79f5d to ef89a26 Compare December 11, 2025 14:10

GeorgeSapkin reviewed Dec 11, 2025

View reviewed changes

systemcrash added 2 commits December 19, 2025 16:10

systemcrash force-pushed the prom branch from ef89a26 to b5bab7e Compare December 19, 2025 15:32

vooon added a commit to vooon/my-openwrt-feed that referenced this pull request Jan 22, 2026

node-exporter: copy from upstream

cf6de2f

Copy variant from PR: openwrt/packages#28016 Signed-off-by: Vladimir Ermakov <vooon341@gmail.com>

Conversation

systemcrash commented Dec 5, 2025

📦 Package Details

🧪 Run Testing Details

✅ Formalities

Uh oh!

dhewg commented Dec 5, 2025 via email

Uh oh!

systemcrash commented Dec 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

GeorgeSapkin commented Dec 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

systemcrash commented Dec 5, 2025

Uh oh!

systemcrash commented Dec 5, 2025

Uh oh!

systemcrash commented Dec 9, 2025

Uh oh!

dhewg commented Dec 9, 2025

Uh oh!

systemcrash commented Dec 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

systemcrash commented Dec 9, 2025

Uh oh!

GeorgeSapkin Dec 9, 2025

Choose a reason for hiding this comment

Uh oh!

systemcrash Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

GeorgeSapkin Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

systemcrash Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

systemcrash Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

GeorgeSapkin Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

systemcrash Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

systemcrash commented Dec 15, 2025

Uh oh!

dhewg commented Dec 19, 2025

Uh oh!

systemcrash commented Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dhewg commented Dec 19, 2025

Uh oh!

dhewg commented Dec 19, 2025

Uh oh!

systemcrash commented Dec 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

systemcrash commented Dec 5, 2025 •

edited

Loading

GeorgeSapkin commented Dec 5, 2025 •

edited

Loading

systemcrash commented Dec 9, 2025 •

edited

Loading

systemcrash commented Dec 19, 2025 •

edited

Loading