
Commit 1d50bbf

Merge pull request #33863 from Sea-n/zh-setup
[zh] Sync troubleshooting-kubeadm.md
2 parents 44ec3f5 + 5e710d0 commit 1d50bbf

1 file changed: +99 −37 lines changed

content/zh/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm.md

Lines changed: 99 additions & 37 deletions
@@ -21,7 +21,8 @@ If your problem is not listed below, please follow the following steps:
 - Go to [github.com/kubernetes/kubeadm](https://github.com/kubernetes/kubeadm/issues) and search for existing issues.
 - If no issue exists, please [open one](https://github.com/kubernetes/kubeadm/issues/new) and follow the issue template.
 
-- If you are unsure about how kubeadm works, you can ask on [Slack](http://slack.k8s.io/) in #kubeadm, or open a question on [StackOverflow](https://stackoverflow.com/questions/tagged/kubernetes). Please include
+- If you are unsure about how kubeadm works, you can ask on [Slack](https://slack.k8s.io/) in `#kubeadm`,
+  or open a question on [StackOverflow](https://stackoverflow.com/questions/tagged/kubernetes). Please include
   relevant tags like `#kubernetes` and `#kubeadm` so folks can help you.
 -->
 As with any program, you might run into an error installing or running kubeadm.
@@ -33,12 +34,73 @@ If your problem is not listed below, please follow the following steps:
 - Go to [github.com/kubernetes/kubeadm](https://github.com/kubernetes/kubeadm/issues) and search for existing issues.
 - If no issue exists, please [open one](https://github.com/kubernetes/kubeadm/issues/new) and follow the issue template.
 
-- If you have questions about how kubeadm works, you can ask in the #kubeadm channel on [Slack](https://slack.k8s.io/),
+- If you have questions about how kubeadm works, you can ask in the `#kubeadm` channel on [Slack](https://slack.k8s.io/),
   or ask on [StackOverflow](https://stackoverflow.com/questions/tagged/kubernetes).
   Please include relevant tags like `#kubernetes` and `#kubeadm` so others can help you.
 
 <!-- body -->
 
+<!--
+## Not possible to join a v1.18 Node to a v1.17 cluster due to missing RBAC
+-->
+## Not possible to join a v1.18 Node to a v1.17 cluster due to missing RBAC
+
+<!--
+In v1.18 kubeadm added prevention for joining a Node in the cluster if a Node with the same name already exists.
+This required adding RBAC for the bootstrap-token user to be able to GET a Node object.
+
+However this causes an issue where `kubeadm join` from v1.18 cannot join a cluster created by kubeadm v1.17.
+-->
+Since v1.18, kubeadm prevents a Node from joining the cluster if a Node with the same name already exists.
+This required adding RBAC for the bootstrap-token user to be able to GET a Node object.
+
+However, this causes an issue where `kubeadm join` from v1.18 cannot join a cluster created by kubeadm v1.17.
+
+<!--
+To workaround the issue you have two options:
+
+Execute `kubeadm init phase bootstrap-token` on a control-plane node using kubeadm v1.18.
+Note that this enables the rest of the bootstrap-token permissions as well.
+
+or
+
+Apply the following RBAC manually using `kubectl apply -f ...`:
+-->
+To resolve this issue, you have two options:
+
+Execute `kubeadm init phase bootstrap-token` on a control-plane node using kubeadm v1.18.
+Note that this also enables the rest of the bootstrap-token permissions.
+
+Or, apply the following RBAC manually using `kubectl apply -f ...`:
+
+
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: kubeadm:get-nodes
+rules:
+  - apiGroups:
+      - ""
+    resources:
+      - nodes
+    verbs:
+      - get
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: kubeadm:get-nodes
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: kubeadm:get-nodes
+subjects:
+  - apiGroup: rbac.authorization.k8s.io
+    kind: Group
+    name: system:bootstrappers:kubeadm:default-node-token
+```
+
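Illustrating the two options above as commands (a sketch; the file name `get-nodes-rbac.yaml` is only a stand-in for wherever you saved the YAML above):

```shell
# Option 1: recreate the bootstrap-token setup with a v1.18 kubeadm binary
# (this enables the rest of the bootstrap-token permissions as well).
sudo kubeadm init phase bootstrap-token

# Option 2: apply the ClusterRole and ClusterRoleBinding above manually.
kubectl apply -f get-nodes-rbac.yaml
```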
 <!--
 ## `ebtables` or some similar executable not found during installation
 
@@ -100,7 +162,7 @@ and investigating each container by running `docker logs`. For other container r
 
 - network connection problems. Check that your machine has full network connectivity before continuing.
 - the cgroup driver of the container runtime differs from that of the kubelet. To understand how to configure it properly,
-  see [Configuring a cgroup driver](/docs/tasks/administer-cluster/kubeadm/configure-cgroup-driver/)
+  see [Configuring a cgroup driver](/zh/docs/tasks/administer-cluster/kubeadm/configure-cgroup-driver/); a sketch follows below.
 - control plane Docker containers keep crashing or hanging. You can check this by running `docker ps` and investigating each container by running `docker logs`.
   For other container runtimes, see [Debugging Kubernetes nodes with crictl](/zh/docs/tasks/debug/debug-cluster/crictl/).
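A minimal sketch of the cgroup-driver fix referenced above, assuming the container runtime uses the `systemd` driver; the file name `kubeadm-config.yaml` is illustrative:

```shell
# Append a KubeletConfiguration document so the kubelet's cgroup driver
# matches the container runtime's (systemd in this sketch).
cat <<EOF >> kubeadm-config.yaml
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOF
```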

@@ -124,7 +186,7 @@ sudo kubeadm reset
 
 A possible solution is to restart the container runtime and then re-run `kubeadm reset`.
 You can also use `crictl` to debug the state of the container runtime. See
-[Debugging Kubernetes nodes with crictl](/zh/docs/tasks/debug-application-cluster/crictl/).
+[Debugging Kubernetes nodes with crictl](/docs/tasks/debug/debug-cluster/crictl/).
 -->
 ## kubeadm blocks when removing managed containers
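A hedged sketch of that recovery flow, assuming a containerd runtime (adjust the service name and endpoint for other runtimes):

```shell
# Restart the container runtime, then retry the reset.
sudo systemctl restart containerd
sudo kubeadm reset

# If containers are still stuck, inspect the runtime state with crictl.
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a
```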

@@ -169,15 +231,15 @@ Right after `kubeadm init` there should not be any pods in these states.
 until you have deployed the network add-on.
 
 - If after deploying the network add-on, there are Pods in one of the `RunContainerError`, `CrashLoopBackOff`,
-  `Error` states, and`coredns` (or `kube-dns`) is still in the `Pending` state,
+  or `Error` states, and `coredns` (or `kube-dns`) is still in the `Pending` state,
   it is very likely that the network add-on you installed is somehow broken. You might need to grant it more
   RBAC privileges or use a newer version. Please file an issue in the Pod Network provider's issue tracker
   and get the issue triaged there.
 
 - If you install a version of Docker older than 1.12.1, remove the `MountFlags=slave` option
   when booting `dockerd` with `systemd` and restart `docker`.
   You can see MountFlags in `/usr/lib/systemd/system/docker.service`.
-  MountFlags can interfere with volumes mounted in Kubernetes , and put the Pods in `CrashLoopBackOff` state.
+  MountFlags can interfere with volumes mounted in Kubernetes, and put the Pods in `CrashLoopBackOff` state.
   The error happens when Kubernetes cannot find `var/run/secrets/kubernetes.io/serviceaccount`.
 
 <!--
@@ -259,7 +321,7 @@ Unable to connect to the server: x509: certificate signed by unknown authority (
 
 - Verify that the `$HOME/.kube/config` file contains a valid certificate, and
   regenerate a certificate if necessary. The certificates in a kubeconfig file
-  are base64 encoded. The `base64 -d` command can be used to decode the certificate
+  are base64 encoded. The `base64 --decode` command can be used to decode the certificate
   and `openssl x509 -text -noout` can be used for viewing the certificate information.
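For example, a sketch assuming the certificate is embedded as `client-certificate-data` in `$HOME/.kube/config`:

```shell
# Extract the base64-encoded client certificate from the kubeconfig,
# decode it, and print the certificate details.
grep 'client-certificate-data' $HOME/.kube/config | \
  awk '{print $2}' | base64 --decode | openssl x509 -text -noout
```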
 - Unset the `KUBECONFIG` environment variable using:
 
@@ -293,7 +355,7 @@ Unable to connect to the server: x509: certificate signed by unknown authority (
 
 - Verify that the `$HOME/.kube/config` file contains a valid certificate, and
   regenerate a certificate if necessary. The certificates in a kubeconfig file are base64 encoded.
-  The `base64 -d` command can be used to decode the certificate, and `openssl x509 -text -noout`
+  The `base64 --decode` command can be used to decode the certificate, and `openssl x509 -text -noout`
   can be used for viewing the certificate information.
 - Unset the `KUBECONFIG` environment variable using:
 
@@ -307,7 +369,7 @@ Unable to connect to the server: x509: certificate signed by unknown authority (
 export KUBECONFIG=/etc/kubernetes/admin.conf
 ```
 
-- Another workaround is to overwrite the existing "admin" user of the `kubeconfig`
+- Another workaround is to overwrite the existing "admin" user of the `kubeconfig`:
 
 ```shell
 mv $HOME/.kube $HOME/.kube.bak
@@ -316,22 +378,6 @@ Unable to connect to the server: x509: certificate signed by unknown authority (
 sudo chown $(id -u):$(id -g) $HOME/.kube/config
 ```
 
-<!--
-## Default NIC When using flannel as the pod network in Vagrant
-
-The following error might indicate that something was wrong in the pod network:
-
-```sh
-Error from server (NotFound): the server could not find the requested resource
-```
-
-- If you're using flannel as the pod network inside Vagrant, then you will have to specify the default interface name for flannel.
-
-Vagrant typically assigns two interfaces to all VMs. The first, for which all hosts are assigned the IP address `10.0.2.15`, is for external traffic that gets NATed.
-
-This may lead to problems with flannel, which defaults to the first interface on a host. This leads to all hosts thinking they have the same public IP address. To prevent this, pass the `-iface eth1` flag to flannel so that the second interface is chosen.
--->
-
 <!--
 ## Kubelet client certificate rotation fails {#kubelet-client-cert}
 
@@ -385,6 +431,22 @@ the `ca.key` you must sign the embedded certificates in the `kubelet.conf` exter
 6. Restart the kubelet.
 7. Make sure the node status changes to `Ready`.
 
+<!--
+## Default NIC When using flannel as the pod network in Vagrant
+
+The following error might indicate that something was wrong in the pod network:
+
+```sh
+Error from server (NotFound): the server could not find the requested resource
+```
+
+- If you're using flannel as the pod network inside Vagrant, then you will have to specify the default interface name for flannel.
+
+Vagrant typically assigns two interfaces to all VMs. The first, for which all hosts are assigned the IP address `10.0.2.15`, is for external traffic that gets NATed.
+
+This may lead to problems with flannel, which defaults to the first interface on a host. This leads to all hosts thinking they have the same public IP address. To prevent this, pass the `--iface eth1` flag to flannel so that the second interface is chosen.
+-->
+
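One hedged way to apply that fix, assuming the upstream flannel manifest layout (DaemonSet `kube-flannel-ds` in `kube-system`, with flannel as the first container in the pod spec):

```shell
# Append --iface=eth1 so flannel binds to Vagrant's second interface
# instead of the NATed eth0 that every VM shares as 10.0.2.15.
kubectl -n kube-system patch daemonset kube-flannel-ds --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--iface=eth1"}]'
```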
 ## Default NIC When using flannel as the pod network in Vagrant
 
 The following error might indicate that something was wrong in the pod network:
@@ -410,9 +472,9 @@ Error from server: Get https://10.19.0.41:10250/containerLogs/default/mysql-ddc6
 ```
 
 - This may be due to Kubernetes using an IP that can not communicate with other IPs on the seemingly same subnet, possibly by policy of the machine provider.
-- Digital Ocean assigns a public IP to `eth0` as well as a private one to be used internally as anchor for their floating IP feature, yet `kubelet` will pick the latter as the node's `InternalIP` instead of the public one.
+- DigitalOcean assigns a public IP to `eth0` as well as a private one to be used internally as anchor for their floating IP feature, yet `kubelet` will pick the latter as the node's `InternalIP` instead of the public one.
 
-  Use `ip addr show` to check for this scenario instead of `ifconfig` because `ifconfig` will not display the offending alias IP address. Alternatively an API endpoint specific to Digital Ocean allows to query for the anchor IP from the droplet:
+  Use `ip addr show` to check for this scenario instead of `ifconfig` because `ifconfig` will not display the offending alias IP address. Alternatively an API endpoint specific to DigitalOcean allows to query for the anchor IP from the droplet:
 
 ```sh
 curl http://169.254.169.254/metadata/v1/interfaces/public/0/anchor_ipv4/address
@@ -442,18 +504,18 @@ Error from server: Get https://10.19.0.41:10250/containerLogs/default/mysql-ddc6
 
 - This may be due to Kubernetes using an IP that cannot communicate with other IPs on the seemingly same subnet,
   possibly by policy of the machine provider.
-- Digital Ocean assigns a public IP to `eth0` as well as a private IP to be used internally as the anchor for its floating IP feature,
+- DigitalOcean assigns a public IP to `eth0` as well as a private IP to be used internally as the anchor for its floating IP feature,
   yet `kubelet` will pick the latter as the node's `InternalIP` instead of the public one.
 
   Use `ip addr show` to check for this scenario instead of `ifconfig`, because `ifconfig`
-  will not display the offending alias IP address. Alternatively, an API endpoint specific to Digital Ocean allows
+  will not display the offending alias IP address. Alternatively, an API endpoint specific to DigitalOcean allows
   querying the anchor IP from the droplet:
 
 ```sh
 curl http://169.254.169.254/metadata/v1/interfaces/public/0/anchor_ipv4/address
 ```
 
-The workaround is to tell the `kubelet` which `--node-ip` to use. When using Digital Ocean, it can be the public IP (assigned to `eth0`),
+The workaround is to tell the `kubelet` which `--node-ip` to use. When using DigitalOcean, it can be the public IP (assigned to `eth0`),
 or the private IP (assigned to `eth1`). The private IP is optional.
 The `KubeletExtraArgs` section of the [kubeadm `NodeRegistrationOptions` structure](/zh/docs/reference/config-api/kubeadm-config.v1beta3/#kubeadm-k8s-io-v1beta3-NodeRegistrationOptions)
 can be used for this.
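A minimal sketch of that approach, assuming kubeadm config API v1beta3 and using `203.0.113.10` as a placeholder for the public IP assigned to `eth0`:

```shell
# Tell the kubelet which --node-ip to use via kubeadm's InitConfiguration.
cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    node-ip: 203.0.113.10
EOF
sudo kubeadm init --config kubeadm-config.yaml
```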
@@ -535,7 +597,7 @@ yum downgrade docker-1.13.1-75.git8633870.el7.centos.x86_64 docker-client-1.13.1
 
 - Install one of the more recent recommended versions, such as 18.06:
 ```bash
-sudo yum-config-manager -add-repo https://download.docker.com/linux/centos/docker-ce.repo
+sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
 yum install docker-ce-18.06.1.ce-3.el7.x86_64
 ```
 -->
@@ -573,13 +635,13 @@ component like the kube-apiserver. However, this mechanism is limited due to the
 the values (`mapStringString`).
 
 If you decide to pass an argument that supports multiple, comma-separated values such as
-`-apiserver-extra-args "enable-admission-plugins=LimitRanger,NamespaceExists"` this flag will fail with
+`--apiserver-extra-args "enable-admission-plugins=LimitRanger,NamespaceExists"` this flag will fail with
 `flag: malformed pair, expect string=string`. This happens because the list of arguments for
-`-apiserver-extra-args` expects `key=value` pairs and in this case `NamespacesExists` is considered
+`--apiserver-extra-args` expects `key=value` pairs and in this case `NamespacesExists` is considered
 as a key that is missing a value.
 
 Alternatively, you can try separating the `key=value` pairs like so:
-`-apiserver-extra-args "enable-admission-plugins=LimitRanger,enable-admission-plugins=NamespaceExists"`
+`--apiserver-extra-args "enable-admission-plugins=LimitRanger,enable-admission-plugins=NamespaceExists"`
 but this will result in the key `enable-admission-plugins` only having the value of `NamespaceExists`.
 
 A known workaround is to use the kubeadm [configuration file](/docs/reference/config-api/kubeadm-config.v1beta3/).
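A sketch of that configuration-file workaround, assuming config API v1beta3; `extraArgs` is a map, so the comma-separated plugin list reaches the kube-apiserver intact:

```shell
cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    enable-admission-plugins: LimitRanger,NamespaceExists
EOF
sudo kubeadm init --config kubeadm-config.yaml
```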
@@ -673,9 +735,9 @@ To workaround this issue you can configure the flex-volume directory using the k
 On the primary control-plane Node (created using `kubeadm init`) pass the following
 file using `--config`:
 -->
-To work around this issue, you can configure the FlexVolume directory using the kubeadm [configuration file](/docs/reference/config-api/kubeadm-config.v1beta3/).
+To work around this issue, you can configure the FlexVolume directory using the kubeadm [configuration file](/zh/docs/reference/config-api/kubeadm-config.v1beta3/).
 
-On the primary control-plane node (created using `kubeadm init`), pass the following file using the `-config`
+On the primary control-plane node (created using `kubeadm init`), pass the following file using the `--config`
 parameter:
 
 ```yaml
@@ -781,4 +843,4 @@ Also see [How to run the metrics-server securely](https://github.com/kubernetes-
 See [Enabling signed kubelet serving certificates](/zh/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#kubelet-serving-certs)
 to understand how to configure the kubelets in a kubeadm cluster to have properly signed serving certificates.
 
-Also see[How to run the metrics-server securely](https://github.com/kubernetes-sigs/metrics-server/blob/master/FAQ.md#how-to-run-metrics-server-securely).
+Also see [How to run the metrics-server securely](https://github.com/kubernetes-sigs/metrics-server/blob/master/FAQ.md#how-to-run-metrics-server-securely).
