Commit 383dd00

feat: Add system watchdog configuration doc (#48465)
* feat: Add system watchdog configuration doc
* chore: fix link and spelling mistake
* chore: Fix formatting and provide clear descriptions.
* chore: Optimize some typographic styles.
1 parent c4199a6 commit 383dd00

2 files changed (+85 -0 lines changed)

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
---
title: SystemdWatchdog
content_type: feature_gate
_build:
  list: never
  render: false

stages:
  - stage: beta
    defaultValue: true
    fromVersion: "1.32"
---
Allow using the systemd watchdog to monitor the health status of the kubelet.
See [Kubelet Systemd Watchdog](/docs/reference/node/systemd-watchdog/)
for more details.
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
---
content_type: "reference"
title: Kubelet Systemd Watchdog
weight: 80
---

{{< feature-state feature_gate_name="SystemdWatchdog" >}}

On Linux nodes, Kubernetes {{< skew currentVersion >}} supports integrating with
[systemd](https://systemd.io/) so that the operating system's service manager can recover
a failed kubelet. This integration is not active by default: systemd only monitors the
kubelet when the kubelet's unit file sets `WatchdogSec`. You can use it as an alternative
to periodically requesting the kubelet's `/healthz` endpoint for health checks. If the
kubelet does not respond to the watchdog within the timeout period, systemd kills the kubelet.

The systemd watchdog works by requiring the service to periodically send
a _keep-alive_ message to systemd. If systemd does not receive this message
within the configured timeout period, it considers the service unresponsive
and terminates it. systemd can then restart the service, depending on the
unit's `Restart` setting.

## Configuration

To use the systemd watchdog, set the `WatchdogSec` option
in the `[Service]` section of the kubelet's systemd unit file:

```systemd
[Service]
WatchdogSec=30s
```

Setting `WatchdogSec=30s` configures a service watchdog timeout of 30 seconds.
The kubelet calls `sd_notify()` at intervals of `WatchdogSec` ÷ 2 (every 15 seconds
in this example) to send `WATCHDOG=1`, a keep-alive message. If systemd does not
receive a keep-alive within the timeout period, it kills the kubelet. Set `Restart`
to `always`, `on-failure`, `on-watchdog`, or `on-abnormal` so that systemd
automatically restarts the kubelet after a watchdog timeout.

Some details about the systemd configuration:

1. If you set `WatchdogSec` to 0, or omit it, the systemd watchdog is not
   enabled for this unit.
2. The kubelet supports a minimum watchdog period of 1.0 seconds; this prevents the kubelet
   from being killed unexpectedly. systemd lets you set `WatchdogSec` in a unit definition
   to a period shorter than 1 second, but Kubernetes does not support any shorter interval.
   The timeout does not have to be a whole number of seconds.
3. The Kubernetes project suggests setting `WatchdogSec` to approximately 15s.
   Periods longer than 10 minutes are supported but explicitly **not** recommended.

### Example Configuration

```systemd
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/home/
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/bin/kubelet
# Configures the watchdog timeout
WatchdogSec=30s
# Restart the kubelet when it fails, including after a watchdog timeout
Restart=on-failure
# Disable start rate limiting so that restarts are never throttled
StartLimitInterval=0
# Wait 10 seconds before each restart
RestartSec=10

[Install]
WantedBy=multi-user.target
```

## {{% heading "whatsnext" %}}

For more details about systemd configuration, refer to the
[systemd documentation](https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#WatchdogSec=).
