Commit fd062f9

[zh-cn] Resync Windows debug page
1 parent 5036a1e commit fd062f9

1 file changed

+101
-94
lines changed
  • content/zh-cn/docs/tasks/debug/debug-cluster

@@ -1,14 +1,22 @@
11
---
2-
title: Windows debugging quick tips
2+
title: Windows debugging tips
33
content_type: concept
44
---
5+
56
<!--
7+
reviewers:
8+
- aravindhp
9+
- jayunit100
10+
- jsturtevant
11+
- marosset
612
title: Windows debugging tips
713
content_type: concept
814
-->
15+
916
<!-- overview -->
1017

1118
<!-- body -->
19+
1220
<!--
1321
## Node-level troubleshooting {#troubleshooting-node}
1422
@@ -17,37 +25,43 @@ content_type: concept
1725
Ensure that your pause image is compatible with your Windows OS version.
1826
See [Pause container](/docs/setup/production-environment/windows/intro-windows-in-kubernetes#pause-container)
1927
to see the latest / recommended pause image and/or get more information.
20-
21-
{{< note >}}
22-
If using containerd as your container runtime the pause image is specified in the
23-
`plugins.plugins.cri.sandbox_image` field of the config.toml configuration file.
24-
{{< /note >}}
2528
-->
2629
## Node-level troubleshooting {#troubleshooting-node}
2730

2831
1. My Pods are stuck at "Container Creating" or restarting over and over
2932

3033
Ensure that your pause image is compatible with your Windows OS version.
31-
See [Pause container](zh/docs/setup/production-environment/windows/intro-windows-in-kubernetes#pause-container)
34+
See [Pause container](/zh-cn/docs/setup/production-environment/windows/intro-windows-in-kubernetes#pause-container)
3235
for the latest or recommended pause image, or for more information.
3336

3437
{{< note >}}
35-
If you used containerd as your container runtime, the pause image is specified in the config.toml configuration file, in the
38+
<!--
39+
If using containerd as your container runtime the pause image is specified in the
40+
`plugins.plugins.cri.sandbox_image` field of the config.toml configuration file.
41+
-->
42+
If you are using containerd as your container runtime, the pause image is specified in the config.toml configuration file, in the
3643
`plugins.plugins.cri.sandbox_image` field.
3744
{{< /note >}}
45+
3846
<!--
3947
2. My pods show status as `ErrImgPull` or `ImagePullBackOff`
4048
41-
Ensure that your Pod is getting scheduled to a [compatable](https://docs.microsoft.com/virtualization/windowscontainers/deploy-containers/version-compatibility) Windows Node.
49+
Ensure that your Pod is getting scheduled to a
50+
[compatible](https://docs.microsoft.com/virtualization/windowscontainers/deploy-containers/version-compatibility)
51+
Windows Node.
4252
43-
More information on how to specify a compatable node for your Pod can be found in [this guide](/docs/setup/production-environment/windows/user-guide-windows-containers/#ensuring-os-specific-workloads-land-on-the-appropriate-container-host).
53+
More information on how to specify a compatible node for your Pod can be found in
54+
[this guide](/docs/setup/production-environment/windows/user-guide-windows-containers/#ensuring-os-specific-workloads-land-on-the-appropriate-container-host).
4455
-->
45-
2. My pod status shows 'ErrImgPull' or ImagePullBackOff
56+
2. My Pod status shows 'ErrImgPull' or 'ImagePullBackOff'
4657

47-
Ensure that your Pod is scheduled to a [compatible](https://docs.microsoft.com/virtualization/windowscontainers/deploy-containers/version-compatibility) Windows node.
58+
Ensure that your Pod is scheduled to a [compatible](https://docs.microsoft.com/virtualization/windowscontainers/deploy-containers/version-compatibility)
59+
Windows node.
4860

4961
To learn how to specify a compatible node for your Pod,
50-
see [this guide](/zh-cn/docs/setup/production-environment/windows/user-guide-windows-containers/#ensuring-os-specific-workloads-land-on-the-appropriate-container-host) for more details.
62+
see [this guide](/zh-cn/docs/setup/production-environment/windows/user-guide-windows-containers/#ensuring-os-specific-workloads-land-on-the-appropriate-container-host)
63+
for more details.
64+
5165
<!--
5266
## Network troubleshooting {#troubleshooting-network}
5367
@@ -61,19 +75,30 @@ content_type: concept
6175
1. My Windows Pods do not have network connectivity
6276

6377
If you are using virtual machines, ensure that MAC spoofing is enabled on all VM network adapters.
78+
6479
<!--
65-
2. My Windows Pods cannot ping external resources
80+
1. My Windows Pods cannot ping external resources
6681
6782
Windows Pods do not have outbound rules programmed for the ICMP protocol. However,
6883
TCP/UDP is supported. When trying to demonstrate connectivity to resources
6984
outside of the cluster, substitute `ping <IP>` with corresponding
7085
`curl <IP>` commands.
86+
-->
87+
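2. My Windows Pods cannot ping external resources
88+
89+
Windows Pods do not have outbound rules programmed for the ICMP protocol, but TCP/UDP is supported. When demonstrating connectivity to resources outside the cluster, replace `ping <IP>` with the corresponding `curl <IP>` command.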
7190

91+
<!--
7292
If you are still facing problems, most likely your network configuration in
7393
[cni.conf](https://github.com/Microsoft/SDN/blob/master/Kubernetes/flannel/l2bridge/cni/config/cni.conf)
7494
deserves some extra attention. You can always edit this static file. The
7595
configuration update will apply to any new Kubernetes resources.
96+
-->
97+
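If you are still facing problems, your network configuration in
98+
[cni.conf](https://github.com/Microsoft/SDN/blob/master/Kubernetes/flannel/l2bridge/cni/config/cni.conf)
99+
most likely deserves some extra attention. You can always edit this static file. The configuration update applies to any new Kubernetes resources.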
76100

101+
<!--
77102
One of the Kubernetes networking requirements
78103
(see [Kubernetes model](/docs/concepts/cluster-administration/networking/)) is
79104
for cluster communication to occur without
@@ -84,90 +109,76 @@ content_type: concept
84109
from the `ExceptionList`. Only then will the traffic originating from your Windows
85110
pods be SNAT'ed correctly to receive a response from the outside world. In this
86111
regard, your `ExceptionList` in `cni.conf` should look as follows:
87-
88-
```conf
89-
"ExceptionList": [
90-
"10.244.0.0/16", # Cluster subnet
91-
"10.96.0.0/12", # Service subnet
92-
"10.127.130.0/24" # Management (host) subnet
93-
]
94-
```
95-
-->
96-
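2. My Windows Pods cannot ping external resources
97-
98-
Windows Pods do not have outbound rules programmed for the ICMP protocol, but TCP/UDP is supported. When demonstrating connectivity to resources outside the cluster, replace `ping <IP>` with the corresponding `curl <IP>` command.
99-
100-
If you are still facing problems, your network configuration in
101-
[cni.conf](https://github.com/Microsoft/SDN/blob/master/Kubernetes/flannel/l2bridge/cni/config/cni.conf)
102-
most likely deserves some extra attention. You can always edit this static file. The configuration update applies to any new Kubernetes resources.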
103-
112+
-->
104113
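One of the Kubernetes networking requirements (see [Kubernetes model](/zh-cn/docs/concepts/cluster-administration/networking/))
105114
is for cluster communication to occur without NAT internally.
106-
To honor this requirement,  for all communication where you do not want outbound NAT to occur, there is an
115+
To honor this requirement, for all communication where you do not want outbound NAT to occur, there is an
107116
[ExceptionList](https://github.com/Microsoft/SDN/blob/master/Kubernetes/flannel/l2bridge/cni/config/cni.conf#L20).
108117
However, this also means that you need to exclude the external IP you are trying to query from the `ExceptionList`.
109118
Only then will the traffic originating from your Windows Pods be SNAT'ed correctly to receive a response from the outside world.
110119
In this regard, the `ExceptionList` in your `cni.conf` should look as follows: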
111120

121+
<!--
112122
```conf
113123
"ExceptionList": [
114124
"10.244.0.0/16", # Cluster subnet
115125
"10.96.0.0/12", # Service subnet
116126
"10.127.130.0/24" # Management (host) subnet
117127
]
118128
```
129+
-->
130+
131+
```conf
132+
"ExceptionList": [
133+
"10.244.0.0/16", # Cluster subnet
134+
"10.96.0.0/12", # Service subnet
135+
"10.127.130.0/24" # Management (host) subnet
136+
]
137+
```
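
To reason about which destinations the `ExceptionList` exempts from outbound NAT, the CIDR membership test can be sketched in a few lines of Python. This is only an illustrative helper, not part of any Kubernetes or CNI tooling: the function name `is_nat_exempt` is hypothetical, and the subnets are simply the three example values shown above.

```python
import ipaddress

# The three example subnets from the ExceptionList shown above.
EXCEPTION_LIST = ["10.244.0.0/16", "10.96.0.0/12", "10.127.130.0/24"]


def is_nat_exempt(ip, exception_list=EXCEPTION_LIST):
    """Return True if `ip` falls inside any CIDR block of the exception list.

    Hypothetical helper: traffic to an exempt address is not SNAT'ed, so an
    external IP you want to reach must NOT match any entry.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in exception_list)


if __name__ == "__main__":
    for ip in ("10.244.3.7", "10.100.0.10", "8.8.8.8"):
        verdict = "exempt from outbound NAT" if is_nat_exempt(ip) else "will be SNAT'ed"
        print(ip, verdict)
```

If an external address you are trying to reach is reported as exempt, the list likely contains a range that is too broad for your environment.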
119138
<!--
120-
3. My Windows node cannot access `NodePort` type Services
139+
1. My Windows node cannot access `NodePort` type Services
121140
122141
Local NodePort access from the node itself fails. This is a known
123142
limitation. NodePort access works from other nodes or external clients.
143+
-->
144+
3. My Windows node cannot access `NodePort` type Services
124145

125-
4. vNICs and HNS endpoints of containers are being deleted
146+
Local NodePort access from the node itself fails. This is a known limitation.
147+
NodePort access works from other nodes or external clients.
148+
149+
<!--
150+
1. vNICs and HNS endpoints of containers are being deleted
126151
127152
This issue can be caused when the `hostname-override` parameter is not passed to
128153
[kube-proxy](/docs/reference/command-line-tools-reference/kube-proxy/). To resolve
129154
it, users need to pass the hostname to kube-proxy as follows:
130-
131-
```powershell
132-
C:\k\kube-proxy.exe --hostname-override=$(hostname)
133-
```
134155
-->
135-
3. My Windows node cannot access `NodePort` type services
136-
137-
Local NodePort access from the node itself fails. This is a known limitation. NodePort access works from other nodes or external clients.
156+
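4. vNICs and HNS endpoints of containers are being deleted
138157

139-
4. vnics and HNS endpoints of containers are being deleted
140-
141-
This issue can be caused when the `hostname-override` parameter is not passed to [kube-proxy](/zh-cn/docs/reference/command-line-tools-reference/kube-proxy/).
158+
This issue can be caused when the `hostname-override` parameter is not passed to
159+
[kube-proxy](/zh-cn/docs/reference/command-line-tools-reference/kube-proxy/).
142160
To resolve it, users need to pass the hostname to kube-proxy as follows: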
143161

144162
```powershell
145163
C:\k\kube-proxy.exe --hostname-override=$(hostname)
146164
```
165+
147166
<!--
148-
5. My Windows node cannot access my services using the service IP
167+
1. My Windows node cannot access my services using the service IP
149168
150169
This is a known limitation of the networking stack on Windows. However, Windows Pods can access the Service IP.
170+
-->
171+
5. My Windows node cannot access my services using the service IP
151172

152-
6. No network adapter is found when starting the kubelet
173+
This is a known limitation of the networking stack on Windows. However, Windows Pods can access the Service IP.
174+
175+
<!--
176+
1. No network adapter is found when starting the kubelet
153177
154178
The Windows networking stack needs a virtual adapter for Kubernetes networking to work.
155179
If the following commands return no results (in an admin shell),
156180
virtual network creation — a necessary prerequisite for the kubelet to work — has failed:
157-
158-
```powershell
159-
Get-HnsNetwork | ? Name -ieq "cbr0"
160-
Get-NetAdapter | ? Name -Like "vEthernet (Ethernet*"
161-
```
162-
163-
Often it is worthwhile to modify the [InterfaceName](https://github.com/microsoft/SDN/blob/master/Kubernetes/flannel/start.ps1#L7) parameter of the start.ps1 script,
164-
in cases where the host's network adapter isn't "Ethernet".
165-
Otherwise, consult the output of the `start-kubelet.ps1` script to see if there are errors during virtual network creation.
166181
-->
167-
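5. My Windows node cannot access my services using the service IP
168-
169-
This is a known limitation of the networking stack on Windows. However, Windows Pods can access the Service IP.
170-
171182
6. No network adapter is found when starting the kubelet
172183

173184
The Windows networking stack needs a virtual adapter for Kubernetes networking to work.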
@@ -179,39 +190,42 @@ content_type: concept
179190
Get-NetAdapter | ? Name -Like "vEthernet (Ethernet*"
180191
```
181192

193+
<!--
194+
Often it is worthwhile to modify the [InterfaceName](https://github.com/microsoft/SDN/blob/master/Kubernetes/flannel/start.ps1#L7) parameter of the start.ps1 script,
195+
in cases where the host's network adapter isn't "Ethernet".
196+
Otherwise, consult the output of the `start-kubelet.ps1` script to see if there are errors during virtual network creation.
197+
-->
182198
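If the host's network adapter isn't "Ethernet", it is often worthwhile to modify the
183-
[InterfaceName](https://github.com/microsoft/SDN/blob/master/Kubernetes/flannel/start.ps1#L7) parameter of the `start.ps1` script.
184-
Otherwise, consult the output of the `start-kubelet.ps1` script to see if there are errors during virtual network creation.
199+
[InterfaceName](https://github.com/microsoft/SDN/blob/master/Kubernetes/flannel/start.ps1#L7)
200+
parameter of the `start.ps1` script. Otherwise, consult the output of the `start-kubelet.ps1` script to see if there are errors during virtual network creation.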
201+
185202
<!--
186-
7. DNS resolution is not properly working
203+
1. DNS resolution is not properly working
187204
188205
Check the DNS limitations for Windows in this [section](#dns-limitations).
206+
-->
207+
7. DNS resolution is not properly working
207+
208+
Check the DNS limitations for Windows in this [section](#dns-limitations).
189210

190-
8. `kubectl port-forward` fails with "unable to do port forwarding: wincat not found"
211+
<!--
212+
1. `kubectl port-forward` fails with "unable to do port forwarding: wincat not found"
191213
192214
This was implemented in Kubernetes 1.15 by including `wincat.exe` in the pause infrastructure container `mcr.microsoft.com/oss/kubernetes/pause:3.6`.
193215
Be sure to use a supported version of Kubernetes.
194216
If you would like to build your own pause infrastructure container be sure to include [wincat](https://github.com/kubernetes/kubernetes/tree/master/build/pause/windows/wincat).
195217
-->
196-
7. DNS resolution is not properly working
197-
198-
Learn about the DNS limitations for Windows in [this section](#dns-limitations).
199-
200218
8. `kubectl port-forward` fails with "unable to do port forwarding: wincat not found"
201219

202220
Port forwarding was implemented in Kubernetes 1.15 by including `wincat.exe` in the pause infrastructure container
203221
`mcr.microsoft.com/oss/kubernetes/pause:3.6`.
204222
Be sure to use a supported version of Kubernetes. If you would like to build your own pause infrastructure container,
205223
be sure to include [wincat](https://github.com/kubernetes/kubernetes/tree/master/build/pause/windows/wincat).
224+
206225
<!--
207-
9. My Kubernetes installation is failing because my Windows Server node is behind a proxy
226+
1. My Kubernetes installation is failing because my Windows Server node is behind a proxy
208227
209228
If you are behind a proxy, the following PowerShell environment variables must be defined:
210-
211-
```PowerShell
212-
[Environment]::SetEnvironmentVariable("HTTP_PROXY", "http://proxy.example.com:80/", [EnvironmentVariableTarget]::Machine)
213-
[Environment]::SetEnvironmentVariable("HTTPS_PROXY", "http://proxy.example.com:443/", [EnvironmentVariableTarget]::Machine)
214-
```
215229
-->
216230
9. My Kubernetes installation is failing because my Windows Server node is behind a proxy
217231

@@ -221,6 +235,7 @@ content_type: concept
221235
[Environment]::SetEnvironmentVariable("HTTP_PROXY", "http://proxy.example.com:80/", [EnvironmentVariableTarget]::Machine)
222236
[Environment]::SetEnvironmentVariable("HTTPS_PROXY", "http://proxy.example.com:443/", [EnvironmentVariableTarget]::Machine)
223237
```
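
Before retrying the installation, it may help to confirm that both proxy variables are set and look like well-formed URLs. The following Python sketch is an illustration under assumptions, not an official check: the function name `proxy_problems` is hypothetical, and it only validates the URL shape; it does not contact the proxy.

```python
import os
from urllib.parse import urlsplit


def proxy_problems(env):
    """Return a list of human-readable problems with the proxy variables in `env`.

    Hypothetical helper: `env` is any mapping, for example os.environ.
    """
    problems = []
    for name in ("HTTP_PROXY", "HTTPS_PROXY"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        parts = urlsplit(value)
        # A usable proxy URL needs an http(s) scheme and a host name.
        if parts.scheme not in ("http", "https") or not parts.hostname:
            problems.append(f"{name} is not a valid proxy URL: {value!r}")
    return problems


if __name__ == "__main__":
    for line in proxy_problems(os.environ) or ["proxy variables look well-formed"]:
        print(line)
```

Variables set with the `Machine` target are only visible to processes started afterwards, so run the check from a new shell.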
238+
224239
<!--
225240
### Flannel troubleshooting
226241
@@ -229,11 +244,6 @@ content_type: concept
229244
Whenever a previously deleted node is being re-joined to the cluster, flannelD
230245
tries to assign a new pod subnet to the node. Users should remove the old pod
231246
subnet configuration files in the following paths:
232-
233-
```powershell
234-
Remove-Item C:\k\SourceVip.json
235-
Remove-Item C:\k\SourceVipRequest.json
236-
```
237247
-->
238248
## Flannel troubleshooting {#troubleshooting-network}
239249

@@ -246,43 +256,39 @@ content_type: concept
246256
Remove-Item C:\k\SourceVip.json
247257
Remove-Item C:\k\SourceVipRequest.json
248258
```
259+
249260
<!--
250-
2. Flanneld is stuck in "Waiting for the Network to be created"
261+
1. Flanneld is stuck in "Waiting for the Network to be created"
251262
252263
There are numerous reports of this [issue](https://github.com/coreos/flannel/issues/1066);
253264
most likely it is a timing issue for when the management IP of the flannel network is set.
254265
A workaround is to relaunch `start.ps1` or relaunch it manually as follows:
255-
256-
```powershell
257-
[Environment]::SetEnvironmentVariable("NODE_NAME", "<Windows_Worker_Hostname>")
258-
C:\flannel\flanneld.exe --kubeconfig-file=c:\k\config --iface=<Windows_Worker_Node_IP> --ip-masq=1 --kube-subnet-mgr=1
259-
```
260266
-->
261267
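2. Flanneld is stuck in "Waiting for the Network to be created"
262268

263-
There are numerous reports of this [issue](https://github.com/coreos/flannel/issues/1066)
264-
most likely it is a timing issue for when the management IP of the flannel network is set.
269+
There are numerous reports of this [issue](https://github.com/coreos/flannel/issues/1066);
270+
most likely it is a timing issue for when the management IP of the Flannel network is set.
265271
A workaround is to relaunch `start.ps1` or relaunch it manually as follows: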
266272

273+
<!--
274+
```powershell
275+
[Environment]::SetEnvironmentVariable("NODE_NAME", "<Windows_Worker_Hostname>")
276+
C:\flannel\flanneld.exe --kubeconfig-file=c:\k\config --iface=<Windows_Worker_Node_IP> --ip-masq=1 --kube-subnet-mgr=1
277+
```
278+
-->
267279
```powershell
268280
[Environment]::SetEnvironmentVariable("NODE_NAME", "<Windows_Worker_Hostname>")
269281
C:\flannel\flanneld.exe --kubeconfig-file=c:\k\config --iface=<Windows_Worker_Node_IP> --ip-masq=1 --kube-subnet-mgr=1
270282
```
283+
271284
<!--
272-
3. My Windows Pods cannot launch because of missing `/run/flannel/subnet.env`
285+
1. My Windows Pods cannot launch because of missing `/run/flannel/subnet.env`
273286
274287
This indicates that Flannel didn't launch correctly. You can either try
275288
to restart `flanneld.exe` or you can copy the files over manually from
276289
`/run/flannel/subnet.env` on the Kubernetes master to `C:\run\flannel\subnet.env`
277290
on the Windows worker node and modify the `FLANNEL_SUBNET` row to a different
278291
number. For example, if node subnet 10.244.4.1/24 is desired:
279-
280-
```env
281-
FLANNEL_NETWORK=10.244.0.0/16
282-
FLANNEL_SUBNET=10.244.4.1/24
283-
FLANNEL_MTU=1500
284-
FLANNEL_IPMASQ=true
285-
```
286292
-->
287293
3. My Windows Pods cannot launch because of missing `/run/flannel/subnet.env`
288294

@@ -312,3 +318,4 @@ If these steps don't resolve your problem, you can get help running Windows cont
312318
* StackOverflow [Windows Server Container](https://stackoverflow.com/questions/tagged/windows-server-container) topic
313319
* Kubernetes Official Forum [discuss.kubernetes.io](https://discuss.kubernetes.io/)
314320
* Kubernetes Slack [#SIG-Windows Channel](https://kubernetes.slack.com/messages/sig-windows)
321+
