Remote rsyslog forwarding failure stops local logging #1523

@gsanchietti

Description

A firewall node appears to be “dead” after a reboot: rsyslog is running but stops writing local logs once a configured remote syslog destination becomes unreachable. When this happens, the web UI shows errors and becomes unresponsive, some RPC commands block, DNS resolution fails (when DNS query logging is enabled), and the firewall looks unusable even though core networking is still up. The team reproduced the issue in a VM and found that it is triggered by sustained log flooding while logs are forwarded to an unreachable remote syslog (TCP) target.

Steps to reproduce:

  • Configure rsyslog to forward logs to a remote TCP syslog target that is unreachable (e.g. controller offline).
  • Ensure the rsyslog configuration forwards to remote before local file logging (current problematic ordering).
  • Generate a large continuous stream of log messages on the firewall, for example: a=0; while true; do logger -t xx c$a; ((a++)); done
  • Wait until rsyslog has processed a large number of messages (symptoms appeared after roughly 100k messages on a VM; a much smaller threshold is enough to reproduce it in a test setup).
  • Observe that local /var/log/messages stops being written, UI and RPC calls hang, and DNS resolution may fail if query logging is enabled.
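The steps above can be sketched as a bounded script, which is easier to stop than an endless loop. This is a minimal sketch in POSIX sh, not the exact procedure used by the team: the function names (`flood_logs`, `log_size`, `local_logging_alive`) and the `LOG_FILE`, `LOGGER`, and `SETTLE_SECONDS` variables are hypothetical, and it assumes the local log is a plain file at /var/log/messages.

```shell
#!/bin/sh
# Assumed defaults; override them in the environment if the system differs.
LOG_FILE="${LOG_FILE:-/var/log/messages}"
LOGGER="${LOGGER:-logger}"

# Send COUNT messages tagged TAG (default "xx") through the logger command.
flood_logs() {
    count="$1"
    tag="${2:-xx}"
    i=0
    while [ "$i" -lt "$count" ]; do
        "$LOGGER" -t "$tag" "c$i"
        i=$((i + 1))
    done
}

# Print the current size of the local log in bytes (0 if the file is missing).
log_size() {
    if [ -f "$LOG_FILE" ]; then
        wc -c < "$LOG_FILE" | tr -d ' '
    else
        echo 0
    fi
}

# Send one marker message and succeed only if the local log grew afterwards.
local_logging_alive() {
    before=$(log_size)
    "$LOGGER" -t probe "marker"
    sleep "${SETTLE_SECONDS:-1}"
    after=$(log_size)
    [ "$after" -gt "$before" ]
}
```

A possible run would be `flood_logs 100000 xx` followed by `local_logging_alive && echo "local log still growing" || echo "local log stalled"`. Note that the marker `logger` call may itself block if the syslog socket is no longer being drained, and a log rotation between the two size readings would produce a false "stalled" result.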

Expected behavior:
If a remote syslog destination is unreachable, rsyslog should continue writing local logs and should not block other system services. The web UI, RPC calls (e.g. ns.dashboard service-status), and DNS resolution should remain responsive. Local logging should be robust in the presence of transient or prolonged remote forwarding failures.
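One way to check this expectation is to run each call under a time limit and classify the outcome. The sketch below is only illustrative: `probe` is a hypothetical helper, and it assumes a `timeout` utility that accepts `timeout SECONDS COMMAND...` and exits with status 124 on expiry, as GNU coreutils does. Other implementations may behave differently.

```shell
#!/bin/sh
# Run a command with a time limit and print "ok", "failed", or "hung".
# Assumes `timeout` returns 124 when the limit expires (GNU coreutils behaviour).
probe() {
    limit="$1"
    shift
    rc=0
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0)   echo ok ;;
        124) echo hung ;;
        *)   echo failed ;;
    esac
}
```

For the RPC call quoted in this report, an invocation could look like `probe 5 sh -c 'echo "{\"service\":\"netifyd\"}" | /usr/libexec/rpcd/ns.dashboard call service-status'`. A result of `hung` would match the blocked behavior described in this issue, while `ok` would match the expected behavior.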

Actual behavior:

  • rsyslog stops writing to the local /var/log/messages once remote forwarding is blocked under a heavy flood of outbound logs.
  • The UI shows multiple errors and becomes unresponsive; some rpcd/ns.dashboard calls block (example: echo '{"service":"netifyd"}' | /usr/libexec/rpcd/ns.dashboard call service-status hangs).
  • If DNS query logging is enabled, dnsmasq may stop resolving.
  • The firewall appears “blocked” to users, though network interfaces remain up and some networking still functions.

Observed root cause: the remote forwarding ruleset appears to block all logging when the remote TCP syslog peer is unreachable or flooded with messages. Changing the rsyslog configuration order (local logging before remote forwarding) prevented the issue in tests.
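To make the ordering change concrete, the sketch below writes an illustrative rsyslog snippet in which the local file action comes before the remote forwarding action. It is not the configuration shipped on the firewall: the `write_sketch_conf` helper, the target address 192.0.2.10, the port, and the queue size are all placeholders. The dedicated action queue on the forwarder is an additional assumption that goes beyond the reordering tested in this issue; with `queue.timeoutEnqueue="0"`, messages for the remote target are dropped once that queue is full rather than making the sender wait.

```shell
#!/bin/sh
# Write an illustrative rsyslog snippet to the path given as $1.
# All addresses and sizes below are placeholders, not values from this report.
write_sketch_conf() {
    cat > "$1" <<'EOF'
# 1. Local file logging first, so it does not wait on the remote action.
action(type="omfile" file="/var/log/messages")

# 2. Remote TCP forwarding with its own in-memory queue (assumption).
#    A full queue discards new messages instead of blocking.
action(type="omfwd" target="192.0.2.10" port="514" protocol="tcp"
       queue.type="LinkedList"
       queue.size="10000"
       queue.timeoutEnqueue="0"
       action.resumeRetryCount="-1")
EOF
}
```

After `write_sketch_conf /tmp/sketch.conf`, the syntax can be checked with `rsyslogd -N1 -f /tmp/sketch.conf` before anything is deployed. Whether the queue settings are appropriate depends on how much loss of remote logs is acceptable, so they should be treated as a starting point for testing only.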

Status

ToDo 🕐