Commit 2098a6d

Merge pull request #49157 from windsonsea/dogwat
[zh] Add systemd-watchdog.md and its feature gate
2 parents 6c9f574 + ba2944f commit 2098a6d

2 files changed

+142
-0
lines changed

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
---
title: SystemdWatchdog
content_type: feature_gate
_build:
  list: never
  render: false

stages:
  - stage: beta
    defaultValue: true
    fromVersion: "1.32"
---

<!--
Allow using systemd watchdog to monitor the health status of kubelet.
See [Kubelet Systemd Watchdog](/docs/reference/node/systemd-watchdog/)
for more details.
-->
允许使用 systemd 看门狗监控 kubelet 的健康状态。更多细节参阅
[kubelet systemd 看门狗](/zh-cn/docs/reference/node/systemd-watchdog/)。
Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
---
content_type: "reference"
title: kubelet systemd 看门狗
weight: 80
---
<!--
content_type: "reference"
title: Kubelet Systemd Watchdog
weight: 80
-->

{{< feature-state feature_gate_name="SystemdWatchdog" >}}

<!--
On Linux nodes, Kubernetes {{< skew currentVersion >}} supports integrating with
[systemd](https://systemd.io/) to allow the operating system supervisor to recover
a failed kubelet. This integration is not enabled by default.
It can be used as an alternative to periodically requesting
the kubelet's `/healthz` endpoint for health checks. If the kubelet
does not respond to the watchdog within the timeout period, the watchdog
will kill the kubelet.
-->
在 Linux 节点上,Kubernetes {{< skew currentVersion >}} 支持与
[systemd](https://systemd.io/) 集成,以允许操作系统监视程序恢复失败的 kubelet。
这种集成默认并未被启用。它可以替代定期请求 kubelet 的 `/healthz` 端点进行健康检查的做法。
如果 kubelet 在设定的超时时限内未对看门狗做出响应,看门狗将杀死 kubelet。

<!--
The systemd watchdog works by requiring the service to periodically send
a _keep-alive_ signal to the systemd process. If the signal is not received
within a specified timeout period, the service is considered unresponsive
and is terminated. The service can then be restarted according to the configuration.
-->
systemd 看门狗的工作原理是要求服务定期向 systemd 进程发送一个**保持活跃**的信号。
如果 systemd 进程在指定的超时时限内未接收到某服务发出的信号,则对应的服务被视为无响应并被终止。
之后 systemd 进程可以基于配置重启该服务。

<!--
## Configuration

Using the systemd watchdog requires configuring the `WatchdogSec` parameter
in the `[Service]` section of the kubelet service unit file:
-->
## 配置 {#configuration}

使用 systemd 看门狗需要在 kubelet 服务单元文件的 `[Service]` 部分配置 `WatchdogSec` 参数:

```
[Service]
WatchdogSec=30s
```

<!--
Setting `WatchdogSec=30s` indicates a service watchdog timeout of 30 seconds.
Within the kubelet, the `sd_notify()` function is invoked, at intervals of `WatchdogSec` ÷ 2, to send
`WATCHDOG=1` (a keep-alive message). If the watchdog is not fed
within the timeout period, the kubelet will be killed. Setting `Restart`
to "always", "on-failure", "on-watchdog", or "on-abnormal" will ensure that the service
is automatically restarted.
-->
设置 `WatchdogSec=30s` 表示服务看门狗超时时限为 30 秒。
在 kubelet 内,`sd_notify()` 函数被调用,以 `WatchdogSec` ÷ 2 的时间间隔,
发送 `WATCHDOG=1`(保持活跃的消息)。如果在超时时限内看门狗未被“投喂”此信号,kubelet 将被杀死。
`Restart` 设置为 "always"、"on-failure"、"on-watchdog" 或 "on-abnormal"
将确保服务被自动重启。
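
The keep-alive mechanism described above can be sketched in a few lines. This is only an illustration of the `sd_notify` datagram protocol, written in Python; it is not the kubelet's actual implementation (the kubelet is written in Go). The sketch assumes that systemd has exported the `NOTIFY_SOCKET` and `WATCHDOG_USEC` environment variables to the service, and the function names `keepalive_interval_seconds` and `send_keepalive` are hypothetical.

```python
import os
import socket


def keepalive_interval_seconds(watchdog_usec: str) -> float:
    """Return half of the watchdog timeout (given in microseconds), in seconds."""
    timeout_usec = int(watchdog_usec)
    if timeout_usec <= 0:
        raise ValueError("watchdog is not enabled for this unit")
    return timeout_usec / 2 / 1_000_000


def send_keepalive(notify_socket: str) -> None:
    """Send a single WATCHDOG=1 datagram to the systemd notification socket."""
    if notify_socket.startswith("@"):
        # A leading "@" denotes a Linux abstract socket: replace it with a NUL byte.
        address = b"\0" + notify_socket[1:].encode()
    else:
        address = notify_socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(b"WATCHDOG=1", address)
```

With `WatchdogSec=30s`, `WATCHDOG_USEC` would be `30000000`, so a service following this sketch would call `send_keepalive(os.environ["NOTIFY_SOCKET"])` every 15 seconds.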

<!--
Some details about the systemd configuration:

1. If you set the systemd value for `WatchdogSec` to 0, or omit setting it, the systemd watchdog is not
   enabled for this unit.
2. The kubelet supports a minimum watchdog period of 1.0 seconds; this is to prevent the kubelet
   from being killed unexpectedly. You can set the value of `WatchdogSec` in a systemd unit definition
   to a period shorter than 1 second, but Kubernetes does not support any shorter interval.
   The timeout does not have to be a whole integer number of seconds.
3. The Kubernetes project suggests setting `WatchdogSec` to approximately a 15s period.
   Periods longer than 10 minutes are supported but explicitly **not** recommended.
-->
systemd 配置相关的一些细节:

1. 如果你将 systemd 的 `WatchdogSec` 值设置为 0,或省略不设置,则对应的单元上不启用 systemd 看门狗。
2. kubelet 支持设置的最小看门狗超时时限为 1.0 秒;这是为了防止 kubelet 被意外杀死。
   你可以在 systemd 单元定义中将 `WatchdogSec` 的值设置为短于 1 秒的超时时限,
   但 Kubernetes 不支持任何更短的时间间隔。超时时限不必是整数的秒数。
3. Kubernetes 项目建议将 `WatchdogSec` 时限设置为大约 15 秒。
   系统支持超过 10 分钟的时限设置,但明确**不**推荐这样做。
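
The three notes above can be summarized as a small check. This is a hypothetical helper for illustration only: the function name `classify_watchdog_period` and its return labels are assumptions, and the kubelet does not necessarily perform this check in this form.

```python
def classify_watchdog_period(seconds: float) -> str:
    """Classify a WatchdogSec value (in seconds) according to the notes above."""
    if seconds <= 0:
        return "disabled"         # note 1: 0 or unset means no watchdog for the unit
    if seconds < 1.0:
        return "unsupported"      # note 2: shorter than the kubelet's 1.0 s minimum
    if seconds > 600:
        return "not recommended"  # note 3: longer than 10 minutes
    return "supported"
```

For example, the suggested period of about 15 seconds falls in the "supported" range, as does a fractional value such as 1.5 seconds.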

<!--
### Example Configuration
-->
### 示例配置 {#example-configuration}

<!--
# Configures the watchdog timeout
-->
```systemd
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/home/
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/bin/kubelet
# 配置看门狗的超时时限
WatchdogSec=30s
Restart=on-failure
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
```

## {{% heading "whatsnext" %}}

<!--
For more details about systemd configuration, refer to the
[systemd documentation](https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#WatchdogSec=)
-->
有关 systemd 配置的细节,请参阅
[systemd 文档](https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#WatchdogSec=)。
