@@ -432,14 +432,16 @@ ConditionUnknown when a node becomes unreachable (i.e. the node controller stops
receiving heartbeats for some reason, e.g. due to the node being down), and then later evicting
all the pods from the node (using graceful termination) if the node continues
to be unreachable. (The default timeouts are 40s to start reporting
- ConditionUnknown and 5m after that to start evicting pods.) The node controller
- checks the state of each node every `-node-monitor-period` seconds.
+ ConditionUnknown and 5m after that to start evicting pods.)
+
+ The node controller checks the state of each node every `-node-monitor-period` seconds.
-->
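The third is monitoring the health of nodes. When a node becomes unreachable
(i.e. the node controller stops receiving heartbeats for some reason, for example because the node is down),
the node controller is responsible for updating the `NodeReady` condition in the node's status to `Unknown`.
If the node then remains unreachable, the node controller evicts all the Pods from the node (using graceful termination).
By default, it starts reporting `Unknown` after 40 seconds, and starts evicting Pods 5 minutes after that.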
+
The node controller checks the state of each node every `--node-monitor-period` seconds.
<!--
@@ -506,8 +508,9 @@ the same time. If the fraction of unhealthy nodes is at least
if the cluster is small (i.e. has less than or equal to
`-large-cluster-size-threshold` nodes - default 50) then evictions are
stopped, otherwise the eviction rate is reduced to
- `-secondary-node-eviction-rate` (default 0.01) per second. The reason these
- policies are implemented per availability zone is because one availability zone
+ `-secondary-node-eviction-rate` (default 0.01) per second.
+
+ The reason these policies are implemented per availability zone is because one availability zone
might become partitioned from the master while the others remain connected. If
your cluster does not span multiple cloud provider availability zones, then
there is only one availability zone (the whole cluster).
@@ -518,6 +521,7 @@ there is only one availability zone (the whole cluster).
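the eviction rate is reduced: if the cluster is small (i.e. has less than or equal to `--large-cluster-size-threshold`
nodes, default 50), evictions are stopped; otherwise the eviction rate is reduced to
`--secondary-node-eviction-rate` (default 0.01) per second.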
+
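The reason these policies are implemented per availability zone is that one availability zone
might become partitioned from the control plane while the others remain connected.
If your cluster does not span multiple cloud provider availability zones, then there is only one availability zone (the whole cluster).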