---
- title: Windows 调试小技巧
+ title: Windows 调试技巧
content_type: concept
---
+
<!--
+ reviewers:
+ - aravindhp
+ - jayunit100
+ - jsturtevant
+ - marosset
title: Windows debugging tips
content_type: concept
-->
+
<!-- overview -->

<!-- body -->
+
<!--
## Node-level troubleshooting {#troubleshooting-node}

@@ -17,37 +25,43 @@ content_type: concept
Ensure that your pause image is compatible with your Windows OS version.
See [Pause container](/docs/setup/production-environment/windows/intro-windows-in-kubernetes#pause-container)
to see the latest / recommended pause image and/or get more information.
-
- {{< note >}}
- If using containerd as your container runtime the pause image is specified in the
- `plugins.plugins.cri.sandbox_image` field of the of config.toml configration file.
- {{< /note >}}
-->
## 工作节点级别排障 {#troubleshooting-node}

1. 我的 Pod 都卡在 “Container Creating” 或者不断重启

确保你的 pause 镜像跟你的 Windows 版本兼容。
- 查看 [Pause 容器](zh/docs/setup/production-environment/windows/intro-windows-in-kubernetes#pause-container)
+ 查看 [Pause 容器](/zh-cn/docs/setup/production-environment/windows/intro-windows-in-kubernetes#pause-container)
以了解最新的或建议的 pause 镜像,或者了解更多信息。

{{< note >}}
- 如果你使用了 containerd 作为你的容器运行时,pause 镜像在 config.toml 配置文件的
+ <!--
+ If using containerd as your container runtime, the pause image is specified in the
+ `plugins.plugins.cri.sandbox_image` field of the config.toml configuration file.
+ -->
+ 如果你在使用 containerd 作为你的容器运行时,pause 镜像在 config.toml 配置文件的
`plugins.plugins.cri.sandbox_image` 中指定。
{{< /note >}}
+
<!--
2. My pods show status as `ErrImgPull` or `ImagePullBackOff`

- Ensure that your Pod is getting scheduled to a [compatable](https://docs.microsoft.com/virtualization/windowscontainers/deploy-containers/version-compatibility) Windows Node.
+ Ensure that your Pod is getting scheduled to a
+ [compatible](https://docs.microsoft.com/virtualization/windowscontainers/deploy-containers/version-compatibility)
+ Windows Node.

- More information on how to specify a compatable node for your Pod can be found in [this guide](/docs/setup/production-environment/windows/user-guide-windows-containers/#ensuring-os-specific-workloads-land-on-the-appropriate-container-host).
+ More information on how to specify a compatible node for your Pod can be found in
+ [this guide](/docs/setup/production-environment/windows/user-guide-windows-containers/#ensuring-os-specific-workloads-land-on-the-appropriate-container-host).
-->
- 2. 我的 pod 状态显示 'ErrImgPull' 或者 ‘ImagePullBackOff’
+ 2. 我的 Pod 状态显示 'ErrImgPull' 或者 'ImagePullBackOff'

- 保证你的 Pod 被调度到[兼容的](https://docs.microsoft.com/virtualization/windowscontainers/deploy-containers/version-compatibility) Windows 节点上。
+ 保证你的 Pod 被调度到[兼容的](https://docs.microsoft.com/virtualization/windowscontainers/deploy-containers/version-compatibility)
+ Windows 节点上。

关于如何为你的 Pod 指定一个兼容节点,
- 的更多信息可以查看这个指可以查看[这个指南](/zh-cn/docs/setup/production-environment/windows/user-guide-windows-containers/#ensuring-os-specific-workloads-land-on-the-appropriate-container-host)以了解更多的信息。
+ 可以查看[这个指南](/zh-cn/docs/setup/production-environment/windows/user-guide-windows-containers/#ensuring-os-specific-workloads-land-on-the-appropriate-container-host)
+ 以了解更多的信息。
+
<!--
## Network troubleshooting {#troubleshooting-network}

@@ -61,19 +75,30 @@ content_type: concept
1. 我的 Windows Pod 没有网络连接

如果你使用的是虚拟机,请确保所有 VM 网卡上都已启用 MAC spoofing。
+
<!--
- 2. My Windows Pods cannot ping external resources
+ 1. My Windows Pods cannot ping external resources

Windows Pods do not have outbound rules programmed for the ICMP protocol. However,
TCP/UDP is supported. When trying to demonstrate connectivity to resources
outside of the cluster, substitute `ping <IP>` with corresponding
`curl <IP>` commands.
+ -->
+ 2. 我的 Windows Pod 不能 ping 通外界资源
+
+ Windows Pod 没有为 ICMP 协议编写出站规则,但 TCP/UDP 是支持的。当试图演示与集群外部资源的连接时,可以把 `ping <IP>` 替换为 `curl <IP>` 命令。
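
The TCP-based check described above can also be scripted. The following is a minimal sketch in Python, assuming an interpreter is available in the Pod and that you know an external host and port that accept TCP connections. The helper name `can_connect` is hypothetical and not part of any Kubernetes tooling.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `can_connect("<IP>", 80)` plays the same role as `curl <IP>` for a plain reachability test: it exercises TCP, which is supported, instead of ICMP, which is not.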

+ <!--
If you are still facing problems, most likely your network configuration in
[cni.conf](https://github.com/Microsoft/SDN/blob/master/Kubernetes/flannel/l2bridge/cni/config/cni.conf)
deserves some extra attention. You can always edit this static file. The
configuration update will apply to any new Kubernetes resources.
+ -->
+ 如果你仍然遇到问题,很可能你需要额外关注
+ [cni.conf](https://github.com/Microsoft/SDN/blob/master/Kubernetes/flannel/l2bridge/cni/config/cni.conf)
+ 的配置。你可以随时编辑这个静态文件。更新配置将应用于新的 Kubernetes 资源。

+ <!--
One of the Kubernetes networking requirements
(see [Kubernetes model](/docs/concepts/cluster-administration/networking/)) is
for cluster communication to occur without
@@ -84,90 +109,76 @@ content_type: concept
from the `ExceptionList`. Only then will the traffic originating from your Windows
pods be SNAT'ed correctly to receive a response from the outside world. In this
regard, your `ExceptionList` in `cni.conf` should look as follows:
-
- ```conf
- "ExceptionList": [
-     "10.244.0.0/16",  # Cluster subnet
-     "10.96.0.0/12",  # Service subnet
-     "10.127.130.0/24"  # Management (host) subnet
- ]
- ```
- -->
- 2. 我的 Windows Pod 不能 ping 通外界资源
-
- Windows Pod 没有为 ICMP 协议编写出站规则,但 TCP/UDP 是支持的。当试图演示与集群外部资源的连接时,可以把 `ping <IP>` 替换为 `curl <IP>` 命令。
-
- 如果你仍然遇到问题,很可能你需要额外关注
- [cni.conf](https://github.com/Microsoft/SDN/blob/master/Kubernetes/flannel/l2bridge/cni/config/cni.conf)
- 的配置。你可以随时编辑这个静态文件。更新配置将应用于新的 Kubernetes 资源。
-
+ -->
Kubernetes 的网络需求之一(查看 [Kubernetes 模型](/zh-cn/docs/concepts/cluster-administration/networking/))
是集群通信不需要内部的 NAT。
- 为了遵守这一要求, 对于你不希望发生的出站 NAT 通信,这里有一个
+ 为了遵守这一要求,对于你不希望发生的出站 NAT 通信,这里有一个
[ExceptionList](https://github.com/Microsoft/SDN/blob/master/Kubernetes/flannel/l2bridge/cni/config/cni.conf#L20)。
然而,这也意味着你需要从 `ExceptionList` 中去掉你试图查询的外部 IP。
只有这样,来自你的 Windows Pod 的流量才会被正确地 SNAT 转换,以接收来自外部环境的响应。
就此而言,你的 `cni.conf` 中的 `ExceptionList` 应该如下所示:

+ <!--
```conf
"ExceptionList": [
    "10.244.0.0/16",  # Cluster subnet
    "10.96.0.0/12",  # Service subnet
    "10.127.130.0/24"  # Management (host) subnet
]
```
+ -->
+
+ ```conf
+ "ExceptionList": [
+     "10.244.0.0/16",  # 集群子网
+     "10.96.0.0/12",  # 服务子网
+     "10.127.130.0/24"  # 管理(主机)子网
+ ]
+ ```
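
To see whether a given destination would be excluded from outbound NAT by the list above, the subnets can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the three CIDR values are copied from the example `ExceptionList`, and the helper name `bypasses_snat` is hypothetical.

```python
import ipaddress

# Subnets copied from the example ExceptionList; replace with your own values.
EXCEPTION_LIST = ["10.244.0.0/16", "10.96.0.0/12", "10.127.130.0/24"]


def bypasses_snat(ip: str, exception_list=EXCEPTION_LIST) -> bool:
    """Return True if ip falls inside any ExceptionList subnet.

    Traffic to such an address is not SNAT'ed, so an external IP you want
    to reach from a Windows Pod should make this function return False.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in exception_list)
```

An address inside the cluster subnet such as `10.244.1.5` matches the list, while a public address such as `8.8.8.8` does not and is therefore SNAT'ed as intended.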
<!--
- 3. My Windows node cannot access `NodePort` type Services
+ 1. My Windows node cannot access `NodePort` type Services

Local NodePort access from the node itself fails. This is a known
limitation. NodePort access works from other nodes or external clients.
+ -->
+ 3. 我的 Windows 节点无法访问 `NodePort` 类型 Service

- 4. vNICs and HNS endpoints of containers are being deleted
+ 从节点本身访问本地 NodePort 失败,是一个已知的限制。
+ 你可以从其他节点或外部客户端正常访问 NodePort。
+
+ <!--
+ 1. vNICs and HNS endpoints of containers are being deleted

This issue can be caused when the `hostname-override` parameter is not passed to
[kube-proxy](/docs/reference/command-line-tools-reference/kube-proxy/). To resolve
it, users need to pass the hostname to kube-proxy as follows:
-
- ```powershell
- C:\k\kube-proxy.exe --hostname-override=$(hostname)
- ```
-->
- 3. 我的 Windows 节点无法访问 `NodePort` 类型服务
-
- 从节点本身访问本地 NodePort 失败,是一个已知的限制。你可以从其他节点或外部客户端正常访问 NodePort。
+ 4. 容器的 vNIC 和 HNS 端点正在被删除

- 4. 容器的 vnic 和 HNS endpoints 正在被删除
-
- 当 `hostname-override` 参数没有传递给 [kube-proxy](/zh-cn/docs/reference/command-line-tools-reference/kube-proxy/)
+ 当 `hostname-override` 参数没有传递给
+ [kube-proxy](/zh-cn/docs/reference/command-line-tools-reference/kube-proxy/)
时可能引发这一问题。想要解决这个问题,用户需要将主机名传递给 kube-proxy,如下所示:

```powershell
C:\k\kube-proxy.exe --hostname-override=$(hostname)
```
+
<!--
- 5. My Windows node cannot access my services using the service IP
+ 1. My Windows node cannot access my services using the service IP

This is a known limitation of the networking stack on Windows. However, Windows Pods can access the Service IP.
+ -->
+ 5. 我的 Windows 节点无法通过服务 IP 访问我的服务

- 6. No network adapter is found when starting the kubelet
+ 这是 Windows 上网络栈的一个已知限制。但是 Windows Pod 可以访问 Service IP。
+
+ <!--
+ 1. No network adapter is found when starting the kubelet

The Windows networking stack needs a virtual adapter for Kubernetes networking to work.
If the following commands return no results (in an admin shell),
virtual network creation — a necessary prerequisite for the kubelet to work — has failed:
-
- ```powershell
- Get-HnsNetwork | ? Name -ieq "cbr0"
- Get-NetAdapter | ? Name -Like "vEthernet (Ethernet*"
- ```
-
- Often it is worthwhile to modify the [InterfaceName](https://github.com/microsoft/SDN/blob/master/Kubernetes/flannel/start.ps1#L7) parameter of the start.ps1 script,
- in cases where the host's network adapter isn't "Ethernet".
- Otherwise, consult the output of the `start-kubelet.ps1` script to see if there are errors during virtual network creation.
-->
- 5. 我的 Windows 节点无法通过服务 IP 访问我的服务
-
- 这是 Windows 上网络栈的一个已知限制。但是 Windows Pod 可以访问 Service IP。
-
6. 启动 kubelet 时找不到网络适配器

Windows 网络栈需要一个虚拟适配器才能使 Kubernetes 网络工作。
@@ -179,39 +190,42 @@ content_type: concept
Get-NetAdapter | ? Name -Like "vEthernet (Ethernet*"
```

+ <!--
+ Often it is worthwhile to modify the [InterfaceName](https://github.com/microsoft/SDN/blob/master/Kubernetes/flannel/start.ps1#L7) parameter of the start.ps1 script,
+ in cases where the host's network adapter isn't "Ethernet".
+ Otherwise, consult the output of the `start-kubelet.ps1` script to see if there are errors during virtual network creation.
+ -->
如果主机的网络适配器不是 "Ethernet",通常有必要修改 `start.ps1` 脚本的
- [InterfaceName](https://github.com/microsoft/SDN/blob/master/Kubernetes/flannel/start.ps1#L7) 参数。
- 否则,如果虚拟网络创建过程出错,请检查 `start-kubelet.ps1` 脚本的输出。
+ [InterfaceName](https://github.com/microsoft/SDN/blob/master/Kubernetes/flannel/start.ps1#L7)
+ 参数。否则,如果虚拟网络创建过程出错,请检查 `start-kubelet.ps1` 脚本的输出。
+
<!--
- 7. DNS resolution is not properly working
+ 1. DNS resolution is not properly working

Check the DNS limitations for Windows in this [section](#dns-limitations).
+ -->
+ 7. DNS 解析工作异常
+
+ 查阅[这一节](#dns-limitations)中讲述的 Windows 系统上的 DNS 限制。

- 8. `kubectl port-forward` fails with "unable to do port forwarding: wincat not found"
+ <!--
+ 1. `kubectl port-forward` fails with "unable to do port forwarding: wincat not found"

This was implemented in Kubernetes 1.15 by including `wincat.exe` in the pause infrastructure container `mcr.microsoft.com/oss/kubernetes/pause:3.6`.
Be sure to use a supported version of Kubernetes.
If you would like to build your own pause infrastructure container be sure to include [wincat](https://github.com/kubernetes/kubernetes/tree/master/build/pause/windows/wincat).
-->
- 7. DNS 解析工作异常
-
- 在[本节](#dns-limitations)中了解 Windows 系统上的 DNS 限制。
-
8. `kubectl port-forward` 失败,错误为 "unable to do port forwarding: wincat not found"

在 Kubernetes 1.15 中,pause 基础架构容器 `mcr.microsoft.com/oss/kubernetes/pause:3.6`
中包含 `wincat.exe` 来实现端口转发。
请确保使用 Kubernetes 的受支持版本。如果你想构建自己的 pause 基础架构容器,
请确保其中包含 [wincat](https://github.com/kubernetes/kubernetes/tree/master/build/pause/windows/wincat)。
+
<!--
- 9. My Kubernetes installation is failing because my Windows Server node is behind a proxy
+ 1. My Kubernetes installation is failing because my Windows Server node is behind a proxy

If you are behind a proxy, the following PowerShell environment variables must be defined:
-
- ```PowerShell
- [Environment]::SetEnvironmentVariable("HTTP_PROXY", "http://proxy.example.com:80/", [EnvironmentVariableTarget]::Machine)
- [Environment]::SetEnvironmentVariable("HTTPS_PROXY", "http://proxy.example.com:443/", [EnvironmentVariableTarget]::Machine)
- ```
-->
9. 我的 Kubernetes 安装失败,因为我的 Windows 服务器节点使用了代理服务器

@@ -221,6 +235,7 @@ content_type: concept
[Environment]::SetEnvironmentVariable("HTTP_PROXY", "http://proxy.example.com:80/", [EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("HTTPS_PROXY", "http://proxy.example.com:443/", [EnvironmentVariableTarget]::Machine)
```
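
After setting the variables, it can help to confirm that both are visible to new processes and look like proxy URLs. The sketch below is one way to do that in Python, assuming an interpreter is installed on the node. The helper name `proxy_problems` is hypothetical, and the check only inspects the variables; it does not contact the proxy.

```python
import os
from urllib.parse import urlsplit

# The two variables named in the PowerShell snippet above.
REQUIRED = ("HTTP_PROXY", "HTTPS_PROXY")


def proxy_problems(env=None):
    """Return a list of human-readable problems with the proxy variables."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        parts = urlsplit(value)
        # A usable proxy URL needs an http(s) scheme and a host name.
        if parts.scheme not in ("http", "https") or not parts.hostname:
            problems.append(f"{name} is not a valid proxy URL: {value!r}")
    return problems
```

An empty list means both variables are defined and well-formed. Because the variables are set at machine scope, a new shell or process is typically needed before they become visible.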
+
<!--
### Flannel troubleshooting

@@ -229,11 +244,6 @@ content_type: concept
Whenever a previously deleted node is being re-joined to the cluster, flannelD
tries to assign a new pod subnet to the node. Users should remove the old pod
subnet configuration files in the following paths:
-
- ```powershell
- Remove-Item C:\k\SourceVip.json
- Remove-Item C:\k\SourceVipRequest.json
- ```
-->
## Flannel 故障排查 {#troubleshooting-network}

@@ -246,43 +256,39 @@ content_type: concept
Remove-Item C:\k\SourceVip.json
Remove-Item C:\k\SourceVipRequest.json
```
+
<!--
- 2. Flanneld is stuck in "Waiting for the Network to be created"
+ 1. Flanneld is stuck in "Waiting for the Network to be created"

There are numerous reports of this [issue](https://github.com/coreos/flannel/issues/1066);
most likely it is a timing issue for when the management IP of the flannel network is set.
A workaround is to relaunch `start.ps1` or relaunch it manually as follows:
-
- ```powershell
- [Environment]::SetEnvironmentVariable("NODE_NAME", "<Windows_Worker_Hostname>")
- C:\flannel\flanneld.exe --kubeconfig-file=c:\k\config --iface=<Windows_Worker_Node_IP> --ip-masq=1 --kube-subnet-mgr=1
- ```
-->
2. Flanneld 卡在 "Waiting for the Network to be created"

- 关于这个[问题](https://github.com/coreos/flannel/issues/1066)有很多报告 ;
- 很可能是 flannel 网络管理 IP 的设置时机问题。
+ 关于这个[问题](https://github.com/coreos/flannel/issues/1066)有很多报告;
+ 很可能是 Flannel 网络管理 IP 的设置时机问题。
一个变通方法是重新启动 `start.ps1` 或按如下方式手动重启:

+ <!--
+ ```powershell
+ [Environment]::SetEnvironmentVariable("NODE_NAME", "<Windows_Worker_Hostname>")
+ C:\flannel\flanneld.exe --kubeconfig-file=c:\k\config --iface=<Windows_Worker_Node_IP> --ip-masq=1 --kube-subnet-mgr=1
+ ```
+ -->
```powershell
[Environment]::SetEnvironmentVariable("NODE_NAME", "<Windows 工作节点主机名>")
C:\flannel\flanneld.exe --kubeconfig-file=c:\k\config --iface=<Windows 工作节点 IP> --ip-masq=1 --kube-subnet-mgr=1
```
+
<!--
- 3. My Windows Pods cannot launch because of missing `/run/flannel/subnet.env`
+ 1. My Windows Pods cannot launch because of missing `/run/flannel/subnet.env`

This indicates that Flannel didn't launch correctly. You can either try
to restart `flanneld.exe` or you can copy the files over manually from
`/run/flannel/subnet.env` on the Kubernetes master to `C:\run\flannel\subnet.env`
on the Windows worker node and modify the `FLANNEL_SUBNET` row to a different
number. For example, if node subnet 10.244.4.1/24 is desired:
-
- ```env
- FLANNEL_NETWORK=10.244.0.0/16
- FLANNEL_SUBNET=10.244.4.1/24
- FLANNEL_MTU=1500
- FLANNEL_IPMASQ=true
- ```
-->
3. 我的 Windows Pod 无法启动,因为缺少 `/run/flannel/subnet.env`

@@ -312,3 +318,4 @@ If these steps don't resolve your problem, you can get help running Windows cont
* StackOverflow [Windows Server Container](https://stackoverflow.com/questions/tagged/windows-server-container) topic
* Kubernetes 官方论坛 [discuss.kubernetes.io](https://discuss.kubernetes.io/)
* Kubernetes Slack [#SIG-Windows Channel](https://kubernetes.slack.com/messages/sig-windows)
+