|
| 1 | +--- |
| 2 | +title: Windows 调试小技巧 |
| 3 | +content_type: concept |
| 4 | +--- |
| 5 | +<!-- |
| 6 | +title: Windows debugging tips |
| 7 | +content_type: concept |
| 8 | +--> |
| 9 | +<!-- overview --> |
| 10 | + |
| 11 | +<!-- body --> |
| 12 | +<!-- |
| 13 | +## Node-level troubleshooting {#troubleshooting-node} |
| 14 | +
|
| 15 | +1. My Pods are stuck at "Container Creating" or restarting over and over |
| 16 | +
|
| 17 | + Ensure that your pause image is compatible with your Windows OS version. |
| 18 | + See [Pause container](/docs/setup/production-environment/windows/intro-windows-in-kubernetes#pause-container) |
| 19 | + to see the latest / recommended pause image and/or get more information. |
| 20 | +
|
| 21 | + {{< note >}} |
| 22 | + If using containerd as your container runtime the pause image is specified in the |
| 23 | + `plugins.plugins.cri.sandbox_image` field of the config.toml configuration file. |
| 24 | + {{< /note >}} |
| 25 | +--> |
| 26 | +## 工作节点级别排障 {#troubleshooting-node} |
| 27 | + |
| 28 | +1. 我的 Pod 都卡在 “Container Creating” 或者不断重启 |
| 29 | + |
| 30 | + 确保你的 pause 镜像跟你的 Windows 版本兼容。 |
| 31 | + 查看 [Pause 容器](/zh/docs/setup/production-environment/windows/intro-windows-in-kubernetes#pause-container) |
| 32 | + 以了解最新的或建议的 pause 镜像,或者了解更多信息。 |
| 33 | + |
| 34 | + {{< note >}} |
| 35 | + 如果你使用了 containerd 作为你的容器运行时,pause 镜像在 config.toml 配置文件的 |
| 36 | + `plugins.plugins.cri.sandbox_image` 中指定。 |
| 37 | + {{< /note >}} |
| 38 | +<!-- |
| 39 | +2. My pods show status as `ErrImagePull` or `ImagePullBackOff` |
| 40 | +
|
| 41 | + Ensure that your Pod is getting scheduled to a [compatible](https://docs.microsoft.com/virtualization/windowscontainers/deploy-containers/version-compatibility) Windows Node. |
| 42 | +
|
| 43 | + More information on how to specify a compatible node for your Pod can be found in [this guide](/docs/setup/production-environment/windows/user-guide-windows-containers/#ensuring-os-specific-workloads-land-on-the-appropriate-container-host). |
| 44 | +--> |
| 45 | +2. 我的 Pod 状态显示 `ErrImagePull` 或者 `ImagePullBackOff` |
| 46 | + |
| 47 | + 保证你的 Pod 被调度到[兼容的](https://docs.microsoft.com/virtualization/windowscontainers/deploy-containers/version-compatibility) Windows 节点上。 |
| 48 | + |
| 49 | + 关于如何为你的 Pod 指定一个兼容节点, |
| 50 | + 可以查看[这个指南](/zh/docs/setup/production-environment/windows/user-guide-windows-containers/#ensuring-os-specific-workloads-land-on-the-appropriate-container-host)以了解更多的信息。 |
| 51 | +<!-- |
| 52 | +## Network troubleshooting {#troubleshooting-network} |
| 53 | +
|
| 54 | +1. My Windows Pods do not have network connectivity |
| 55 | +
|
| 56 | + If you are using virtual machines, ensure that MAC spoofing is **enabled** on all |
| 57 | + the VM network adapter(s). |
| 58 | +--> |
| 59 | +## 网络排障 {#troubleshooting-network} |
| 60 | + |
| 61 | +1. 我的 Windows Pod 没有网络连接 |
| 62 | + |
| 63 | + 如果你使用的是虚拟机,请确保所有 VM 网卡上都已启用 MAC spoofing。 |
| 64 | +<!-- |
| 65 | +2. My Windows Pods cannot ping external resources |
| 66 | +
|
| 67 | + Windows Pods do not have outbound rules programmed for the ICMP protocol. However, |
| 68 | + TCP/UDP is supported. When trying to demonstrate connectivity to resources |
| 69 | + outside of the cluster, substitute `ping <IP>` with corresponding |
| 70 | + `curl <IP>` commands. |
| 71 | +
|
| 72 | + If you are still facing problems, most likely your network configuration in |
| 73 | + [cni.conf](https://github.com/Microsoft/SDN/blob/master/Kubernetes/flannel/l2bridge/cni/config/cni.conf) |
| 74 | + deserves some extra attention. You can always edit this static file. The |
| 75 | + configuration update will apply to any new Kubernetes resources. |
| 76 | +
|
| 77 | + One of the Kubernetes networking requirements |
| 78 | + (see [Kubernetes model](/docs/concepts/cluster-administration/networking/)) is |
| 79 | + for cluster communication to occur without |
| 80 | + NAT internally. To honor this requirement, there is an |
| 81 | + [ExceptionList](https://github.com/Microsoft/SDN/blob/master/Kubernetes/flannel/l2bridge/cni/config/cni.conf#L20) |
| 82 | + for all the communication where you do not want outbound NAT to occur. However, |
| 83 | + this also means that you need to exclude the external IP you are trying to query |
| 84 | + from the `ExceptionList`. Only then will the traffic originating from your Windows |
| 85 | + pods be SNAT'ed correctly to receive a response from the outside world. In this |
| 86 | + regard, your `ExceptionList` in `cni.conf` should look as follows: |
| 87 | +
|
| 88 | + ```conf |
| 89 | + "ExceptionList": [ |
| 90 | + "10.244.0.0/16", # Cluster subnet |
| 91 | + "10.96.0.0/12", # Service subnet |
| 92 | + "10.127.130.0/24" # Management (host) subnet |
| 93 | + ] |
| 94 | + ``` |
| 95 | +--> |
| 96 | +2. 我的 Windows Pod 不能 ping 通外界资源 |
| 97 | + |
| 98 | + Windows Pod 没有为 ICMP 协议编写出站规则,但 TCP/UDP 是支持的。当试图演示与集群外部资源的连接时,可以把 `ping <IP>` 替换为 `curl <IP>` 命令。 |
| 99 | + |
| 100 | + 如果你仍然遇到问题,很可能你需要额外关注 |
| 101 | + [cni.conf](https://github.com/Microsoft/SDN/blob/master/Kubernetes/flannel/l2bridge/cni/config/cni.conf) |
| 102 | + 的配置。你可以随时编辑这个静态文件。更新配置将应用于新的 Kubernetes 资源。 |
| 103 | + |
| 104 | + Kubernetes 的网络需求之一 (查看 [Kubernetes 模型](/zh/docs/concepts/cluster-administration/networking/)) |
| 105 | + 是集群通信不需要内部的 NAT。 |
| 106 | + 为了遵守这一要求,对于所有不希望发生出站 NAT 的通信,这里有一个 |
| 107 | + [ExceptionList](https://github.com/Microsoft/SDN/blob/master/Kubernetes/flannel/l2bridge/cni/config/cni.conf#L20)。 |
| 108 | + 然而,这也意味着你需要从 `ExceptionList` 中去掉你试图查询的外部 IP。 |
| 109 | + 只有这样,来自你的 Windows Pod 的流量才会被正确地 SNAT 转换,以接收来自外部环境的响应。 |
| 110 | + 就此而言,你的 `cni.conf` 中的 `ExceptionList` 应该如下所示: |
| 111 | + |
| 112 | + ```conf |
| 113 | + "ExceptionList": [ |
| 114 | + "10.244.0.0/16", # Cluster subnet |
| 115 | + "10.96.0.0/12", # Service subnet |
| 116 | + "10.127.130.0/24" # Management (host) subnet |
| 117 | + ] |
| 118 | + ``` |
| 119 | +<!-- |
| 120 | +3. My Windows node cannot access `NodePort` type Services |
| 121 | +
|
| 122 | + Local NodePort access from the node itself fails. This is a known |
| 123 | + limitation. NodePort access works from other nodes or external clients. |
| 124 | +
|
| 125 | +4. vNICs and HNS endpoints of containers are being deleted |
| 126 | +
|
| 127 | + This issue can be caused when the `hostname-override` parameter is not passed to |
| 128 | + [kube-proxy](/docs/reference/command-line-tools-reference/kube-proxy/). To resolve |
| 129 | + it, users need to pass the hostname to kube-proxy as follows: |
| 130 | +
|
| 131 | + ```powershell |
| 132 | + C:\k\kube-proxy.exe --hostname-override=$(hostname) |
| 133 | + ``` |
| 134 | +--> |
| 135 | +3. 我的 Windows 节点无法访问 `NodePort` 类型服务 |
| 136 | + |
| 137 | + 从节点本身访问本地 NodePort 失败,是一个已知的限制。你可以从其他节点或外部客户端正常访问 NodePort。 |
| 138 | + |
| 139 | +4. 容器的 vNIC 和 HNS 端点正在被删除 |
| 140 | + |
| 141 | + 当 `hostname-override` 参数没有传递给 [kube-proxy](/zh/docs/reference/command-line-tools-reference/kube-proxy/) |
| 142 | + 时可能引发这一问题。想要解决这个问题,用户需要将主机名传递给 kube-proxy,如下所示: |
| 143 | + |
| 144 | + ```powershell |
| 145 | + C:\k\kube-proxy.exe --hostname-override=$(hostname) |
| 146 | + ``` |
| 147 | +<!-- |
| 148 | +5. My Windows node cannot access my services using the service IP |
| 149 | +
|
| 150 | + This is a known limitation of the networking stack on Windows. However, Windows Pods can access the Service IP. |
| 151 | +
|
| 152 | +6. No network adapter is found when starting the kubelet |
| 153 | +
|
| 154 | + The Windows networking stack needs a virtual adapter for Kubernetes networking to work. |
| 155 | + If the following commands return no results (in an admin shell), |
| 156 | + virtual network creation — a necessary prerequisite for the kubelet to work — has failed: |
| 157 | +
|
| 158 | + ```powershell |
| 159 | + Get-HnsNetwork | ? Name -ieq "cbr0" |
| 160 | + Get-NetAdapter | ? Name -Like "vEthernet (Ethernet*" |
| 161 | + ``` |
| 162 | +
|
| 163 | + Often it is worthwhile to modify the [InterfaceName](https://github.com/microsoft/SDN/blob/master/Kubernetes/flannel/start.ps1#L7) parameter of the start.ps1 script, |
| 164 | + in cases where the host's network adapter isn't "Ethernet". |
| 165 | + Otherwise, consult the output of the `start-kubelet.ps1` script to see if there are errors during virtual network creation. |
| 166 | +--> |
| 167 | +5. 我的 Windows 节点无法通过服务 IP 访问我的服务 |
| 168 | + |
| 169 | + 这是 Windows 上网络栈的一个已知限制。但是 Windows Pod 可以访问 Service IP。 |
| 170 | + |
| 171 | +6. 启动 kubelet 时找不到网络适配器 |
| 172 | + |
| 173 | + Windows 网络栈需要一个虚拟适配器才能使 Kubernetes 网络工作。 |
| 174 | + 如果以下命令没有返回结果(在管理员模式的 shell 中), |
| 175 | + 则意味着创建虚拟网络失败,而虚拟网络的存在是 kubelet 正常工作的前提: |
| 176 | + |
| 177 | + ```powershell |
| 178 | + Get-HnsNetwork | ? Name -ieq "cbr0" |
| 179 | + Get-NetAdapter | ? Name -Like "vEthernet (Ethernet*" |
| 180 | + ``` |
| 181 | + |
| 182 | + 如果主机的网络适配器不是 "Ethernet",通常有必要修改 `start.ps1` 脚本的 |
| 183 | + [InterfaceName](https://github.com/microsoft/SDN/blob/master/Kubernetes/flannel/start.ps1#L7) 参数。 |
| 184 | + 否则,请检查 `start-kubelet.ps1` 脚本的输出,查看虚拟网络创建过程中是否出现错误。 |
| 185 | +<!-- |
| 186 | +7. DNS resolution is not properly working |
| 187 | +
|
| 188 | + Check the DNS limitations for Windows in this [section](#dns-limitations). |
| 189 | +
|
| 190 | +8. `kubectl port-forward` fails with "unable to do port forwarding: wincat not found" |
| 191 | +
|
| 192 | + This was implemented in Kubernetes 1.15 by including `wincat.exe` in the pause infrastructure container `mcr.microsoft.com/oss/kubernetes/pause:3.6`. |
| 193 | + Be sure to use a supported version of Kubernetes. |
| 194 | + If you would like to build your own pause infrastructure container be sure to include [wincat](https://github.com/kubernetes/kubernetes/tree/master/build/pause/windows/wincat). |
| 195 | +--> |
| 196 | +7. DNS 解析工作异常 |
| 197 | + |
| 198 | + 在[本节](#dns-limitations)中了解 Windows 系统上的 DNS 限制。 |
| 199 | + |
| 200 | +8. `kubectl port-forward` 失败,错误为 "unable to do port forwarding: wincat not found" |
| 201 | + |
| 202 | + 在 Kubernetes 1.15 中,pause 基础架构容器 `mcr.microsoft.com/oss/kubernetes/pause:3.6` |
| 203 | + 中包含 `wincat.exe` 来实现端口转发。 |
| 204 | + 请确保使用 Kubernetes 的受支持版本。如果你想构建自己的 pause 基础架构容器, |
| 205 | + 请确保其中包含 [wincat](https://github.com/kubernetes/kubernetes/tree/master/build/pause/windows/wincat)。 |
| 206 | +<!-- |
| 207 | +9. My Kubernetes installation is failing because my Windows Server node is behind a proxy |
| 208 | +
|
| 209 | + If you are behind a proxy, the following PowerShell environment variables must be defined: |
| 210 | +
|
| 211 | + ```PowerShell |
| 212 | + [Environment]::SetEnvironmentVariable("HTTP_PROXY", "http://proxy.example.com:80/", [EnvironmentVariableTarget]::Machine) |
| 213 | + [Environment]::SetEnvironmentVariable("HTTPS_PROXY", "http://proxy.example.com:443/", [EnvironmentVariableTarget]::Machine) |
| 214 | + ``` |
| 215 | +--> |
| 216 | +9. 我的 Kubernetes 安装失败,因为我的 Windows 服务器节点使用了代理服务器 |
| 217 | + |
| 218 | + 如果使用了代理服务器,必须定义下面的 PowerShell 环境变量: |
| 219 | + |
| 220 | + ```PowerShell |
| 221 | + [Environment]::SetEnvironmentVariable("HTTP_PROXY", "http://proxy.example.com:80/", [EnvironmentVariableTarget]::Machine) |
| 222 | + [Environment]::SetEnvironmentVariable("HTTPS_PROXY", "http://proxy.example.com:443/", [EnvironmentVariableTarget]::Machine) |
| 223 | + ``` |
| 224 | +<!-- |
| 225 | +### Flannel troubleshooting |
| 226 | +
|
| 227 | +1. With Flannel, my nodes are having issues after rejoining a cluster |
| 228 | +
|
| 229 | + Whenever a previously deleted node is being re-joined to the cluster, flannelD |
| 230 | + tries to assign a new pod subnet to the node. Users should remove the old pod |
| 231 | + subnet configuration files in the following paths: |
| 232 | +
|
| 233 | + ```powershell |
| 234 | + Remove-Item C:\k\SourceVip.json |
| 235 | + Remove-Item C:\k\SourceVipRequest.json |
| 236 | + ``` |
| 237 | +--> |
| 238 | +### Flannel 故障排查 {#flannel-troubleshooting} |
| 239 | + |
| 240 | +1. 使用 Flannel 时,我的节点在重新加入集群后出现问题 |
| 241 | + |
| 242 | + 当先前删除的节点重新加入集群时, flannelD 尝试为节点分配一个新的 Pod 子网。 |
| 243 | + 用户应该在以下路径中删除旧的 Pod 子网配置文件: |
| 244 | + |
| 245 | + ```powershell |
| 246 | + Remove-Item C:\k\SourceVip.json |
| 247 | + Remove-Item C:\k\SourceVipRequest.json |
| 248 | + ``` |
| 249 | +<!-- |
| 250 | +2. Flanneld is stuck in "Waiting for the Network to be created" |
| 251 | +
|
| 252 | + There are numerous reports of this [issue](https://github.com/coreos/flannel/issues/1066); |
| 253 | + most likely it is a timing issue for when the management IP of the flannel network is set. |
| 254 | + A workaround is to relaunch `start.ps1` or relaunch it manually as follows: |
| 255 | +
|
| 256 | + ```powershell |
| 257 | + [Environment]::SetEnvironmentVariable("NODE_NAME", "<Windows_Worker_Hostname>") |
| 258 | + C:\flannel\flanneld.exe --kubeconfig-file=c:\k\config --iface=<Windows_Worker_Node_IP> --ip-masq=1 --kube-subnet-mgr=1 |
| 259 | + ``` |
| 260 | +--> |
| 261 | +2. Flanneld 卡在 "Waiting for the Network to be created" |
| 262 | + |
| 263 | + 关于这个[问题](https://github.com/coreos/flannel/issues/1066)有很多报告; |
| 264 | + 很可能是 flannel 网络管理 IP 的设置时机问题。 |
| 265 | + 一个变通方法是重新启动 `start.ps1` 或按如下方式手动重启: |
| 266 | + |
| 267 | + ```powershell |
| 268 | + [Environment]::SetEnvironmentVariable("NODE_NAME", "<Windows 工作节点主机名>") |
| 269 | + C:\flannel\flanneld.exe --kubeconfig-file=c:\k\config --iface=<Windows 工作节点 IP> --ip-masq=1 --kube-subnet-mgr=1 |
| 270 | + ``` |
| 271 | +<!-- |
| 272 | +3. My Windows Pods cannot launch because of missing `/run/flannel/subnet.env` |
| 273 | +
|
| 274 | + This indicates that Flannel didn't launch correctly. You can either try |
| 275 | + to restart `flanneld.exe` or you can copy the files over manually from |
| 276 | + `/run/flannel/subnet.env` on the Kubernetes master to `C:\run\flannel\subnet.env` |
| 277 | + on the Windows worker node and modify the `FLANNEL_SUBNET` row to a different |
| 278 | + number. For example, if node subnet 10.244.4.1/24 is desired: |
| 279 | +
|
| 280 | + ```env |
| 281 | + FLANNEL_NETWORK=10.244.0.0/16 |
| 282 | + FLANNEL_SUBNET=10.244.4.1/24 |
| 283 | + FLANNEL_MTU=1500 |
| 284 | + FLANNEL_IPMASQ=true |
| 285 | + ``` |
| 286 | +--> |
| 287 | +3. 我的 Windows Pod 无法启动,因为缺少 `/run/flannel/subnet.env` |
| 288 | + |
| 289 | + 这表明 Flannel 没有正确启动。你可以尝试重启 `flanneld.exe`,或者将 Kubernetes 控制节点的 |
| 290 | + `/run/flannel/subnet.env` 文件手动拷贝到 Windows 工作节点上,放在 `C:\run\flannel\subnet.env`; |
| 291 | + 并且将 `FLANNEL_SUBNET` 行修改为不同取值。例如,如果期望节点子网为 10.244.4.1/24: |
| 292 | + |
| 293 | + ```env |
| 294 | + FLANNEL_NETWORK=10.244.0.0/16 |
| 295 | + FLANNEL_SUBNET=10.244.4.1/24 |
| 296 | + FLANNEL_MTU=1500 |
| 297 | + FLANNEL_IPMASQ=true |
| 298 | + ``` |
| 299 | +<!-- |
| 300 | +### Further investigation |
| 301 | +
|
| 302 | +If these steps don't resolve your problem, you can get help running Windows containers on Windows nodes in Kubernetes through: |
| 303 | +
|
| 304 | +* StackOverflow [Windows Server Container](https://stackoverflow.com/questions/tagged/windows-server-container) topic |
| 305 | +* Kubernetes Official Forum [discuss.kubernetes.io](https://discuss.kubernetes.io/) |
| 306 | +* Kubernetes Slack [#SIG-Windows Channel](https://kubernetes.slack.com/messages/sig-windows) |
| 307 | +--> |
| 308 | +### 进一步探查 {#further-investigation} |
| 309 | + |
| 310 | +如果这些步骤都不能解决你的问题,你可以通过以下方式获得关于在 Kubernetes 中运行 Windows 容器的帮助: |
| 311 | + |
| 312 | +* StackOverflow [Windows Server Container](https://stackoverflow.com/questions/tagged/windows-server-container) 主题 |
| 313 | +* Kubernetes 官方论坛 [discuss.kubernetes.io](https://discuss.kubernetes.io/) |
| 314 | +* Kubernetes Slack [#SIG-Windows Channel](https://kubernetes.slack.com/messages/sig-windows) |