| 1 | +--- |
| 2 | +content_type: "reference" |
| 3 | +title: Kubelet Systemd Watchdog |
| 4 | +weight: 80 |
| 5 | +--- |
| 6 | +<!-- |
| 7 | +content_type: "reference" |
| 8 | +title: Kubelet Systemd Watchdog |
| 9 | +weight: 80 |
| 10 | +--> |
| 11 | + |
| 12 | +{{< feature-state feature_gate_name="SystemdWatchdog" >}} |
| 13 | + |
| 14 | +<!-- |
| 15 | +On Linux nodes, Kubernetes {{< skew currentVersion >}} supports integrating with |
| 16 | +[systemd](https://systemd.io/) to allow the operating system supervisor to recover |
| 17 | +a failed kubelet. This integration is not enabled by default. |
| 18 | +It can be used as an alternative to periodically requesting |
| 19 | +the kubelet's `/healthz` endpoint for health checks. If the kubelet |
| 20 | +does not respond to the watchdog within the timeout period, the watchdog |
| 21 | +will kill the kubelet. |
| 22 | +--> |
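| 23 | +On Linux nodes, Kubernetes {{< skew currentVersion >}} supports integrating with |
| 24 | +[systemd](https://systemd.io/) so that the operating system supervisor can recover a failed kubelet. |
| 25 | +This integration is not enabled by default. It can be used as an alternative to periodically requesting the kubelet's `/healthz` endpoint for health checks. |
| 26 | +If the kubelet does not respond to the watchdog within the timeout period, the watchdog kills the kubelet. |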
| 27 | + |
| 28 | +<!-- |
| 29 | +The systemd watchdog works by requiring the service to periodically send |
| 30 | +a _keep-alive_ signal to the systemd process. If the signal is not received |
| 31 | +within a specified timeout period, the service is considered unresponsive |
| 32 | +and is terminated. The service can then be restarted according to the configuration. |
| 33 | +--> |
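| 34 | +The systemd watchdog works by requiring the service to periodically send a _keep-alive_ signal to the systemd process. |
| 35 | +If systemd does not receive the signal within the specified timeout period, it considers the service unresponsive and terminates it. |
| 36 | +systemd can then restart the service according to its configuration. |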
| 37 | + |
| 38 | +<!-- |
| 39 | +## Configuration |
| 40 | + |
| 41 | +Using the systemd watchdog requires configuring the `WatchdogSec` parameter |
| 42 | +in the `[Service]` section of the kubelet service unit file: |
| 43 | +--> |
| 44 | +## Configuration {#configuration} |
| 45 | + |
| 46 | +Using the systemd watchdog requires configuring the `WatchdogSec` parameter in the `[Service]` section of the kubelet service unit file: |
| 47 | + |
| 48 | +``` |
| 49 | +[Service] |
| 50 | +WatchdogSec=30s |
| 51 | +``` |
| 52 | + |
| 53 | +<!-- |
| 54 | +Setting `WatchdogSec=30s` indicates a service watchdog timeout of 30 seconds. |
| 55 | +Within the kubelet, the `sd_notify()` function is invoked, at intervals of `WatchdogSec` ÷ 2, to send |
| 56 | +`WATCHDOG=1` (a keep-alive message). If the watchdog is not fed |
| 57 | +within the timeout period, the kubelet will be killed. Setting `Restart` |
| 58 | +to "always", "on-failure", "on-watchdog", or "on-abnormal" will ensure that the service |
| 59 | +is automatically restarted. |
| 60 | +--> |
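| 61 | +Setting `WatchdogSec=30s` gives the service a watchdog timeout of 30 seconds. |
| 62 | +Within the kubelet, the `sd_notify()` function is invoked at intervals of `WatchdogSec` ÷ 2 |
| 63 | +to send `WATCHDOG=1` (a keep-alive message). If the watchdog is not fed within the timeout period, the kubelet is killed. |
| 64 | +Setting `Restart` to "always", "on-failure", "on-watchdog", or "on-abnormal" |
| 65 | +ensures that the service is restarted automatically. |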
| 66 | + |
| 67 | +<!-- |
| 68 | +Some details about the systemd configuration: |
| 69 | + |
| 70 | +1. If you set the systemd value for `WatchdogSec` to 0, or omit setting it, the systemd watchdog is not |
| 71 | + enabled for this unit. |
| 72 | +2. The kubelet supports a minimum watchdog period of 1.0 seconds; this is to prevent the kubelet |
| 73 | + from being killed unexpectedly. You can set the value of `WatchdogSec` in a systemd unit definition |
| 74 | + to a period shorter than 1 second, but Kubernetes does not support any shorter interval. |
| 75 | + The timeout does not have to be a whole integer number of seconds. |
| 76 | +3. The Kubernetes project suggests setting `WatchdogSec` to approximately a 15s period. |
| 77 | + Periods longer than 10 minutes are supported but explicitly **not** recommended. |
| 78 | +--> |
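| 79 | +Some details about the systemd configuration: |
| 80 | + |
| 81 | +1. If you set the systemd value for `WatchdogSec` to 0, or omit it, the systemd watchdog is not enabled for this unit. |
| 82 | +2. The kubelet supports a minimum watchdog period of 1.0 seconds; this prevents the kubelet from being killed unexpectedly. |
| 83 | +   You can set `WatchdogSec` in a systemd unit definition to a period shorter than 1 second, |
| 84 | +   but Kubernetes does not support any shorter interval. The timeout does not have to be a whole number of seconds. |
| 85 | +3. The Kubernetes project suggests setting `WatchdogSec` to a period of approximately 15 seconds. |
| 86 | +   Periods longer than 10 minutes are supported but explicitly **not** recommended. |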
| 87 | + |
| 88 | +<!-- |
| 89 | +### Example Configuration |
| 90 | +--> |
| 91 | +### Example Configuration {#example-configuration} |
| 92 | + |
| 93 | +<!-- |
| 94 | +# Configures the watchdog timeout |
| 95 | +--> |
| 96 | +```systemd |
| 97 | +[Unit] |
| 98 | +Description=kubelet: The Kubernetes Node Agent |
| 99 | +Documentation=https://kubernetes.io/docs/home/ |
| 100 | +Wants=network-online.target |
| 101 | +After=network-online.target |
| 102 | + |
| 103 | +[Service] |
| 104 | +ExecStart=/usr/bin/kubelet |
| 105 | +# Configures the watchdog timeout |
| 106 | +WatchdogSec=30s |
| 107 | +Restart=on-failure |
| 108 | +StartLimitInterval=0 |
| 109 | +RestartSec=10 |
| 110 | + |
| 111 | +[Install] |
| 112 | +WantedBy=multi-user.target |
| 113 | +``` |
| 114 | + |
| 115 | +## {{% heading "whatsnext" %}} |
| 116 | + |
| 117 | +<!-- |
| 118 | +For more details about systemd configuration, refer to the |
| 119 | +[systemd documentation](https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#WatchdogSec=) |
| 120 | +--> |
| 121 | +For more details about systemd configuration, refer to the |
| 122 | +[systemd documentation](https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#WatchdogSec=). |