Skip to content

Naemon reload fails silently randomly #443

@bryce-souers

Description

@bryce-souers

I have an issue where sometimes a reload will seem to work, and then it will crash with no messages telling why.

I ran omd reload over 30 times, and then I hit this issue.

naemon.log

[1705509045] SERVICE ALERT: <REDACTED>;check_ap_ipv6;CRITICAL;SOFT;3;PING CRITICAL [fda9:ba49:dcb2:32a5:213:5009:1:9fa2] - Packet loss = 100%
[1705509050] Caught 'Hangup', restarting...
[1705509052] Retention data successfully saved.
[1705509052] Event broker module 'NERD' deinitialized successfully.
[1705509052] npcdmod: If you don't like me, I will go out! Bye.
[1705509052] Event broker module '/omd/sites/ssnomd/lib/npcdmod4.o' deinitialized successfully.
[1705509052] livestatus: deinitializing
[1705509052] livestatus: Logfile cache: flushing complete cache.
[1705509052] Event broker module '/omd/sites/ssnomd/lib/naemon/livestatus.o' deinitialized successfully.
[1705509053] Naemon 1.4.0 starting... (PID=2823702)
[1705509053] Local time is Wed Jan 17 16:30:53 UTC 2024
[1705509053] LOG VERSION: 2.0
[1705509053] qh: Socket '/omd/sites/ssnomd/var/naemon/naemon.qh' successfully initialized
[1705509053] nerd: Channel hostchecks registered successfully
[1705509053] nerd: Channel servicechecks registered successfully
[1705509053] nerd: Fully initialized and ready to rock!

debug.log

[1705509052.736467] [016.0] [pid=2823702] Service 'check_2days_uptime' on host 'REDACTED' handle_service_check_event()...
[1705509052.742485] [064.0] [pid=2823702] Attempting to unload module 'NERD': flags=1, reason=2
[1705509052.742527] [064.0] [pid=2823702] Module 'NERD' unloaded successfully.
[1705509052.742578] [064.0] [pid=2823702] Attempting to unload module '/omd/sites/ssnomd/lib/npcdmod4.o': flags=1, reason=2
[1705509052.742693] [064.0] [pid=2823702] Module '/omd/sites/ssnomd/lib/npcdmod4.o' unloaded successfully.
[1705509052.742744] [064.0] [pid=2823702] Attempting to unload module '/omd/sites/ssnomd/lib/naemon/livestatus.o': flags=1, reason=2
[1705509052.744596] [064.0] [pid=2823702] Module '/omd/sites/ssnomd/lib/naemon/livestatus.o' unloaded successfully.
[1705509053.382125] [064.1] [pid=2823702] Making callbacks (type 2)...
[1705509053.382187] [064.1] [pid=2823702] Making callbacks (type 2)...
[1705509053.382220] [064.1] [pid=2823702] Making callbacks (type 2)...
[1705509053.382251] [064.1] [pid=2823702] Making callbacks (type 2)...
[1705509053.382270] [4096.0] [pid=2823702] wproc: Successfully registered manager as @wproc with query handler
[1705509053.589750] [4096.0] [pid=2823702] wproc: Registry request: name=Core Worker 2897369;pid=2897369
[1705509053.590014] [4096.0] [pid=2823702] wproc: Registry request: name=Core Worker 2897370;pid=2897370
[1705509053.590135] [4096.0] [pid=2823702] wproc: Registry request: name=Core Worker 2897371;pid=2897371
[1705509053.629639] [4096.0] [pid=2823702] wproc: Registry request: name=Core Worker 2897372;pid=2897372
[1705509053.666278] [4096.0] [pid=2823702] wproc: Registry request: name=Core Worker 2897374;pid=2897374
[1705509053.672590] [4096.0] [pid=2823702] wproc: Registry request: name=Core Worker 2897373;pid=2897373

omd status

[ssnomd@lasssnpr01nag02 ~]$ omd status
rrdcached:      running
npcd:           running
naemon:         stopped
apache:         running
xinetd:         running
crontab:        running
-----------------------
Overall state:  partially running

ps aux

[ssnomd@lasssnpr01nag02 ~]$ ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0  11928  2848 ?        Ss   Jan16   0:00 bash /root/start.sh
root         451  0.0  0.0  22964  2936 ?        Ss   Jan16   0:00 /usr/sbin/crond
root        1383  0.0  0.0  11928  2720 ?        S    Jan16   0:00 /bin/sh /usr/sbin/apachectl -D FOREGROUND
root        1387  0.0  0.0 267260 14672 ?        S    Jan16   0:05 /usr/sbin/httpd -D FOREGROUND
apache      1400  0.0  0.0 268696  8424 ?        S    Jan16   0:00 /usr/sbin/httpd -D FOREGROUND
apache      1401  0.0  0.0 270072  8760 ?        S    Jan16   0:01 /usr/sbin/httpd -D FOREGROUND
apache      1402  0.0  0.0 2704096 12792 ?       Sl   Jan16   0:36 /usr/sbin/httpd -D FOREGROUND
apache      1403  0.0  0.0 2572960 10416 ?       Sl   Jan16   0:35 /usr/sbin/httpd -D FOREGROUND
apache      1418  0.0  0.0 2572960 12984 ?       Sl   Jan16   0:35 /usr/sbin/httpd -D FOREGROUND
apache    390476  0.0  0.0 2507424 10444 ?       Sl   Jan16   0:28 /usr/sbin/httpd -D FOREGROUND
root     2711500  0.0  0.0  19248  3616 pts/1    Ss   14:44   0:00 bash
root     2711611  0.0  0.0 115204  7284 pts/1    S    14:44   0:00 sudo su - ssnomd
root     2711612  0.0  0.0 109184  6164 pts/1    S    14:44   0:00 su - ssnomd
ssnomd   2711613  0.0  0.0  22508  4400 pts/1    S    14:44   0:00 -bash
root     2739377  0.0  0.0  19248  3688 pts/2    Ss   14:55   0:00 bash
root     2739563  0.0  0.0 115204  7272 pts/2    S    14:55   0:00 sudo su - ssnomd
root     2739564  0.0  0.0 109184  6332 pts/2    S    14:55   0:00 su - ssnomd
ssnomd   2739566  0.0  0.0  22496  4428 pts/2    S    14:55   0:00 -bash
root     2766725  0.0  0.0  19248  3600 pts/0    Ss   15:06   0:00 bash
root     2766786  0.0  0.0 109184  6172 pts/0    S    15:06   0:00 su - ssnomd
ssnomd   2766787  0.0  0.0  22388  4436 pts/0    S+   15:06   0:00 -bash
root     2808475  0.0  0.0  19248  3716 pts/3    Ss   15:22   0:00 bash
root     2817334  0.0  0.0  19248  3684 pts/4    Ss   15:27   0:00 bash
root     2817353  0.0  0.0 115204  7492 pts/4    S    15:27   0:00 sudo su - ssnomd
root     2817354  0.0  0.0 109184  6184 pts/4    S    15:27   0:00 su - ssnomd
ssnomd   2817355  0.0  0.0  22492  4396 pts/4    S+   15:27   0:00 -bash
root     2819589  0.0  0.0 115204  7400 pts/3    S    15:40   0:00 sudo su - ssnomd
root     2819590  0.0  0.0 109184  6112 pts/3    S    15:40   0:00 su - ssnomd
ssnomd   2819591  0.0  0.0  22388  4416 pts/3    S+   15:40   0:00 -bash
ssnomd   2823608  0.0  0.0 317892  8072 ?        Ssl  16:01   0:01 /omd/sites/ssnomd/bin/rrdcached -t 4 -w 3600 -z 1800 -f 7200 -s ssnomd -m 660 -l unix:/omd/sites/ssnomd/tmp/run/rrdcached.sock -p /omd/sites/ssnomd/tmp/rrdcached.pid -j /omd/sites/ssnomd/var/rrdcached -o /omd/sites/ssnomd/var/log/rrd
ssnomd   2823640  0.1  0.0 156184   380 ?        S    16:01   0:03 /omd/sites/ssnomd/bin/npcd -d -f /omd/sites/ssnomd/etc/pnp4nagios/npcd.cfg
ssnomd   2823760  0.0  0.0 118732  5936 ?        Ss   16:01   0:00 /usr/sbin/httpd -f /omd/sites/ssnomd/etc/apache/apache.conf
ssnomd   2823762  0.0  0.1 105612 30812 ?        S    16:01   0:00 perl -x /omd/sites/ssnomd/share/thruk/script/thruk_auth
ssnomd   2823763  0.0  0.0 118884  2596 ?        S    16:01   0:00 /usr/sbin/httpd -f /omd/sites/ssnomd/etc/apache/apache.conf
ssnomd   2823764  0.0  0.0 119400  5728 ?        S    16:01   0:00 /usr/sbin/httpd -f /omd/sites/ssnomd/etc/apache/apache.conf
ssnomd   2823832  0.0  0.4 162832 76892 ?        S    16:01   0:02 perl /omd/sites/ssnomd/share/thruk/script/thruk_fastcgi.pl
ssnomd   2823839  0.0  0.0 119640  5752 ?        S    16:01   0:00 /usr/sbin/httpd -f /omd/sites/ssnomd/etc/apache/apache.conf
ssnomd   2823920  0.0  0.0  55064  3400 ?        Ss   16:01   0:00 /omd/sites/ssnomd/var/tmp/xinetd -pidfile /omd/sites/ssnomd/tmp/run/xinetd.pid -filelog /omd/sites/ssnomd/var/log/xinetd.log -f /omd/sites/ssnomd/etc/xinetd.conf
ssnomd   2852767  0.0  0.0 118868  4860 ?        S    16:13   0:00 /usr/sbin/httpd -f /omd/sites/ssnomd/etc/apache/apache.conf
ssnomd   2895736  0.0  0.0  26000  1540 pts/1    S+   16:30   0:00 /usr/bin/coreutils --coreutils-prog-shebang=tail /usr/bin/tail -n 500 -f var/log/naemon.log
ssnomd   2898643  200  0.0  47644  3848 pts/2    R+   16:54   0:00 ps aux

Any ideas for what could be going wrong here?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions