After deploying or rolling out node-problem-detector, I encounter a "Kmsg channel closed" error on some nodes.
As a result, metrics based on the kernel monitor are not collected on those nodes.
The affected nodes are not under any significant load, and there are essentially no kernel log messages being generated.
Environment
- Ubuntu Jammy (kernel 5.15.0-x)
- Ubuntu Noble (kernel 6.8.0-x)
Log
I1030 06:41:19.605611 1 log_watchers.go:40] Use log watcher of plugin "kmsg"
<REDACTED>
I1030 06:41:19.605732 1 log_watchers.go:40] Use log watcher of plugin "filelog"
I1030 06:41:19.606512 1 k8s_exporter.go:54] Waiting for kube-apiserver to be ready (timeout 5m0s)...
I1030 06:41:19.614361 1 node_problem_detector.go:63] K8s exporter started.
I1030 06:41:19.614493 1 node_problem_detector.go:67] Prometheus exporter started.
I1030 06:41:19.614504 1 log_monitor.go:111] Start log monitor /custom-config/additional-filelog.json
I1030 06:41:19.614541 1 log_watcher.go:80] Start watching filelog
I1030 06:41:19.614549 1 log_monitor.go:111] Start log monitor /config/kernel-monitor.json
I1030 06:41:19.614613 1 log_monitor.go:236] Initialize condition generated: []
I1030 06:41:19.615573 1 log_monitor.go:111] Start log monitor /config/docker-monitor.json
I1030 06:41:19.615599 1 log_monitor.go:236] Initialize condition generated: [{Type:KernelDeadlock Status:False Transition:2025-10-30 06:41:19.615589877 +0000 UTC m=+0.055785295 Reason:KernelHasNoDeadlock Message:kernel has no deadlock} {Type:ReadonlyFilesystem Status:False Transition:2025-10-30 06:41:19.61558997 +0000 UTC m=+0.055785381 Reason:FilesystemIsNotReadOnly Message:Filesystem is not read-only}]
E1030 06:41:19.615656 1 log_watcher_linux.go:105] Kmsg channel closed
E1030 06:41:19.615696 1 log_monitor.go:137] Log channel closed: /config/kernel-monitor.json
I1030 06:41:19.619274 1 log_watcher.go:80] Start watching journald
I1030 06:41:19.619292 1 log_monitor.go:111] Start log monitor /config/systemd-monitor.json
I1030 06:41:19.619325 1 log_monitor.go:236] Initialize condition generated: [{Type:CorruptDockerOverlay2 Status:False Transition:2025-10-30 06:41:19.619318108 +0000 UTC m=+0.059513518 Reason:NoCorruptDockerOverlay2 Message:docker overlay2 is functioning properly}]
I1030 06:41:19.621986 1 log_watcher.go:80] Start watching journald
I1030 06:41:19.622011 1 log_monitor.go:111] Start log monitor /custom-config/additional.json
I1030 06:41:19.622120 1 log_monitor.go:236] Initialize condition generated: []
I1030 06:41:19.623024 1 problem_detector.go:76] Problem detector started
I1030 06:41:19.623053 1 log_monitor.go:236] Initialize condition generated: []
E1030 06:41:19.623115 1 log_watcher_linux.go:105] Kmsg channel closed
E1030 06:41:19.623138 1 log_monitor.go:137] Log channel closed: /custom-config/additional.json
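To see which log monitors stopped on a given node, you can scan the node-problem-detector output for the "Log channel closed:" lines shown above. On this node, both /config/kernel-monitor.json and /custom-config/additional.json are affected. The following is a minimal Go sketch. It assumes the message wording in this log, which other node-problem-detector versions may phrase differently. The function name closedMonitors is hypothetical.

```go
package main

import (
	"bufio"
	"strings"
)

// closedMonitors returns the monitor config paths from log lines such as
//   E1030 ... log_monitor.go:137] Log channel closed: /config/kernel-monitor.json
// The marker string is an assumption based on the log in this issue.
func closedMonitors(logText string) ([]string, error) {
	const marker = "Log channel closed: "
	var paths []string
	sc := bufio.NewScanner(strings.NewReader(logText))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.Index(line, marker); i >= 0 {
			// Everything after the marker is the config path.
			paths = append(paths, strings.TrimSpace(line[i+len(marker):]))
		}
	}
	// Report scanner errors (for example a line longer than the 64 KiB default).
	return paths, sc.Err()
}
```

If this runs against the logs from several pods, it shows which nodes lost their kmsg-based monitors.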
Reproduce
On the affected nodes, cat /dev/kmsg exits immediately after printing a single record:
root@hostname:~# cat /dev/kmsg
6,2047,25658836,-;microcode: CPU 171: patch_level=0x0a0011d5
cat: /dev/kmsg: Broken pipe
What I've tried
- Running dmesg -C to clear the kernel log buffer has no effect. journalctl -kf works normally.
- Running echo -n "kerustest" > /dev/kmsg sometimes resolves the broken pipe problem.