Commit 6edb5a3

[zh] Sync troubleshooting-kubeadm.md
1 parent ce359cb commit 6edb5a3

1 file changed: +105 -0 lines changed
content/zh-cn/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm.md

Lines changed: 105 additions & 0 deletions
@@ -767,3 +767,108 @@ Also see [How to run the metrics-server securely](https://github.com/kubernetes-
to learn more about how to configure the kubelets in a kubeadm cluster to have properly signed serving certificates.

Also see [How to run the metrics-server securely](https://github.com/kubernetes-sigs/metrics-server/blob/master/FAQ.md#how-to-run-metrics-server-securely).

## Upgrade fails due to etcd hash not changing {#upgrade-fails-due-to-etcd-hash-not-changing}

This issue only applies when upgrading a control plane node with a kubeadm binary v1.28.3 or later,
where the node is currently managed by kubeadm v1.28.0, v1.28.1 or v1.28.2.

Here is the error message you may encounter:

```
[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
[upgrade/etcd] Waiting for previous etcd to become available
I0907 10:10:09.109104 3704 etcd.go:588] [etcd] attempting to see if all cluster endpoints ([https://172.17.0.6:2379/ https://172.17.0.4:2379/ https://172.17.0.3:2379/]) are available 1/10
[upgrade/etcd] Etcd was rolled back and is now available
static Pod hash for component etcd on Node kinder-upgrade-control-plane-1 did not change after 5m0s: timed out waiting for the condition
couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.rollbackOldManifests
cmd/kubeadm/app/phases/upgrade/staticpods.go:525
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.upgradeComponent
cmd/kubeadm/app/phases/upgrade/staticpods.go:254
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.performEtcdStaticPodUpgrade
cmd/kubeadm/app/phases/upgrade/staticpods.go:338
...
```

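Whether a node falls in the affected range described at the top of this section can be checked with a small helper. The following is a minimal sketch, not part of kubeadm: the function name `is_affected_kubeadm_version` is hypothetical, and it only classifies a version string that you supply.

```shell
# Hypothetical helper (not part of kubeadm): succeeds when the given
# version string is one of the releases that generate the problematic
# etcd manifest (v1.28.0, v1.28.1 or v1.28.2).
is_affected_kubeadm_version() {
  # Accept the version with or without a leading "v".
  case "${1#v}" in
    1.28.0|1.28.1|1.28.2) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage, assuming the old kubeadm binary is still installed:
#   if is_affected_kubeadm_version "$(kubeadm version -o short)"; then
#     echo "this node may hit the etcd hash issue on upgrade"
#   fi
```

The check is only meaningful if it runs before the kubeadm binary is replaced with v1.28.3 or later, because afterwards `kubeadm version` reports the new binary rather than the version that generated the manifest.
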
This failure occurs because the affected versions generate an etcd manifest file with unwanted defaults in the PodSpec.
The manifest comparison therefore reports a diff, so kubeadm expects the Pod hash to change, but the kubelet never updates the hash.

There are two ways to work around this issue if you see it in your cluster:

- Skip the etcd upgrade between the affected versions and v1.28.3 (or later) by using:

  ```shell
  kubeadm upgrade {apply|node} [version] --etcd-upgrade=false
  ```

  This is not recommended if a later v1.28 patch version introduced a new etcd version.

- Before the upgrade, patch the manifest for the etcd static Pod to remove the problematic defaulted attributes:

  ```patch
  diff --git a/etc/kubernetes/manifests/etcd_defaults.yaml b/etc/kubernetes/manifests/etcd_origin.yaml
  index d807ccbe0aa..46b35f00e15 100644
  --- a/etc/kubernetes/manifests/etcd_defaults.yaml
  +++ b/etc/kubernetes/manifests/etcd_origin.yaml
  @@ -43,7 +43,6 @@ spec:
           scheme: HTTP
         initialDelaySeconds: 10
         periodSeconds: 10
  -      successThreshold: 1
         timeoutSeconds: 15
       name: etcd
       resources:
  @@ -59,26 +58,18 @@ spec:
           scheme: HTTP
         initialDelaySeconds: 10
         periodSeconds: 10
  -      successThreshold: 1
         timeoutSeconds: 15
  -    terminationMessagePath: /dev/termination-log
  -    terminationMessagePolicy: File
       volumeMounts:
       - mountPath: /var/lib/etcd
         name: etcd-data
       - mountPath: /etc/kubernetes/pki/etcd
         name: etcd-certs
  -  dnsPolicy: ClusterFirst
  -  enableServiceLinks: true
     hostNetwork: true
     priority: 2000001000
     priorityClassName: system-node-critical
  -  restartPolicy: Always
  -  schedulerName: default-scheduler
     securityContext:
       seccompProfile:
         type: RuntimeDefault
  -  terminationGracePeriodSeconds: 30
     volumes:
     - hostPath:
         path: /etc/kubernetes/pki/etcd
  ```

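The removals in the patch above could also be scripted. The following is a minimal sketch rather than an official tool: the function name `strip_etcd_defaults` is hypothetical, and it assumes each defaulted field sits on its own line with exactly the value shown in the patch, and that `successThreshold: 1` appears only in the etcd probes.

```shell
# Hypothetical helper (not part of kubeadm): print the given manifest
# with the defaulted fields from the patch above removed.
# Assumption: each field is on its own line with exactly these values.
strip_etcd_defaults() {
  local manifest="$1"
  sed -E \
    -e '/^[[:space:]]*successThreshold: 1[[:space:]]*$/d' \
    -e '/^[[:space:]]*terminationMessagePath: \/dev\/termination-log[[:space:]]*$/d' \
    -e '/^[[:space:]]*terminationMessagePolicy: File[[:space:]]*$/d' \
    -e '/^[[:space:]]*dnsPolicy: ClusterFirst[[:space:]]*$/d' \
    -e '/^[[:space:]]*enableServiceLinks: true[[:space:]]*$/d' \
    -e '/^[[:space:]]*restartPolicy: Always[[:space:]]*$/d' \
    -e '/^[[:space:]]*schedulerName: default-scheduler[[:space:]]*$/d' \
    -e '/^[[:space:]]*terminationGracePeriodSeconds: 30[[:space:]]*$/d' \
    "$manifest"
}

# Example usage (the output path is illustrative):
#   strip_etcd_defaults /etc/kubernetes/manifests/etcd.yaml > /tmp/etcd.yaml
#   diff -u /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml
```

The function writes to standard output instead of editing in place, so the result can be reviewed with `diff` before anything is changed. Keeping the intermediate file outside `/etc/kubernetes/manifests` avoids the kubelet picking it up as another static Pod manifest.
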
More information can be found in the [tracking issue](https://github.com/kubernetes/kubeadm/issues/2927) for this bug.