Commit fd61ea3

german1608 (German Robayo Paz), chasewilson, przlplx
authored
AB#8189: asm: add external TSG for Istio Add-On CNI (#10103)
* asm: add external TSG for Istio Add-On CNI
* keep consistency for prerequisites
* add references
* reword overview section
* apply PR feedback
* Apply suggestions from code review of chase

Co-authored-by: Chase Wilson <[email protected]>

* link to istio CNI addon docs
* remove Overview section
* Remove v-weizhu
* Update istio-add-on-cni-troubleshooting.md

Edit Review per CI 8189 in progress

* Update istio-add-on-cni-troubleshooting.md

Edit review per CI 8189

* Update istio-add-on-cni-troubleshooting.md
* fix link
* Fix markdown formatting in Istio CNI troubleshooting guide

Updated markdown formatting for Istio CNI troubleshooting guide.

* Update istio-add-on-cni-troubleshooting.md
* Update istio-add-on-cni-troubleshooting.md

---------

Co-authored-by: German Robayo Paz <[email protected]>
Co-authored-by: Chase Wilson <[email protected]>
Co-authored-by: Jerry Sitser <[email protected]>
1 parent 2eeb0bd commit fd61ea3

2 files changed

+164
-22
lines changed

Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
1+
---
2+
title: Istio Service Mesh Add-on CNI Troubleshooting
3+
description: Learn how to troubleshoot the Istio CNI add-on for Azure Kubernetes Service (AKS).
4+
ms.date: 10/22/2025
5+
ms.reviewer: gerobayopaz, kochhars
6+
ms.service: azure-kubernetes-service
7+
ms.topic: troubleshooting-general
8+
ms.custom: sap:Extensions, Policies and Add-Ons
9+
#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the Istio CNI add-on so that I can use the Istio service mesh successfully.
10+
---
11+
# Istio service mesh add-on CNI troubleshooting
12+
13+
This article discusses how to troubleshoot issues that affect the [Istio CNI][istio-cni-addon] feature for the Istio service mesh add-on for Azure Kubernetes Service (AKS).
14+
15+
## Prerequisites
16+
17+
- [Azure CLI](/cli/azure/install-azure-cli)
18+
- The `aks-preview` Azure CLI extension version 19.0.0b5 or later:
19+
20+
```azurecli-interactive
21+
az extension add --name aks-preview
22+
```
23+
24+
If you already have the `aks-preview` extension installed, update it to the latest version:
25+
26+
```azurecli-interactive
27+
az extension update --name aks-preview
28+
```
29+
- The Kubernetes [kubectl](https://kubernetes.io/docs/reference/kubectl/overview/) CLI, or a similar tool to connect to the cluster
30+
31+
Alternatively, you can install kubectl by running the [az aks install-cli](/cli/azure/aks#az-aks-install-cli) command.
32+
- Make sure that your Istio service mesh uses revision `asm-1-25` or a later version. To check the current revision, run the following command:
33+
34+
```azurecli-interactive
35+
az aks show --resource-group <resource-group-name> --name <cluster-name> --query 'serviceMeshProfile.istio.revisions'
36+
```
37+
- Make sure that Istio CNI is enabled in your cluster:
38+
39+
```azurecli-interactive
40+
az aks show --resource-group <resource-group-name> --name <cluster-name> --query "serviceMeshProfile.istio.components.proxyRedirectionMechanism" -o table
41+
```
42+
43+
The command output should show `CNIChaining`. If it shows a different value, Istio CNI isn't enabled. To enable it, see [Istio CNI for the Istio service mesh add-on](/azure/aks/istio-cni).
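
The check above can also be scripted. The following is a minimal sketch, not an official tool: the function name `istio_cni_enabled` is hypothetical, and it assumes that the property is spelled `proxyRedirectionMechanism` and that you are already signed in to Azure CLI.

```bash
# Hypothetical helper: reports whether the cluster uses CNI chaining for traffic redirection.
# Arguments: $1 = resource group name, $2 = cluster name.
istio_cni_enabled() {
  # -o tsv prints the bare value without table headers or quotes.
  mode=$(az aks show --resource-group "$1" --name "$2" \
    --query "serviceMeshProfile.istio.components.proxyRedirectionMechanism" -o tsv)
  if [ "$mode" = "CNIChaining" ]; then
    echo "Istio CNI is enabled"
  else
    echo "Istio CNI is not enabled (got: ${mode:-none})"
    return 1
  fi
}
```

For example, run `istio_cni_enabled <resource-group-name> <cluster-name>` before you continue with the remaining steps.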
44+
45+
## CNI DaemonSet provisioning issues troubleshooting
46+
47+
### Step 1: Verify that the Istio CNI DaemonSet is provisioned and ready
48+
49+
Check that the CNI DaemonSet is deployed and that all pods are running:
50+
51+
```bash
52+
kubectl get daemonset azure-service-mesh-istio-cni-addon-node -n aks-istio-system
53+
kubectl get pods -n aks-istio-system -l k8s-app=azure-service-mesh-istio-cni-addon-node
54+
```
55+
56+
The DaemonSet should report the same number of desired and ready pods, and one CNI pod should be running on each node in your cluster.
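
To avoid reading the counts by eye, you can compare the DaemonSet status fields directly. This is a rough sketch that assumes the DaemonSet name and namespace shown in the previous commands. The function name `check_cni_daemonset` is hypothetical. `desiredNumberScheduled` and `numberReady` are standard DaemonSet status fields.

```bash
# Hypothetical helper: compares how many CNI pods the DaemonSet wants
# with how many are currently ready.
check_cni_daemonset() {
  ds="azure-service-mesh-istio-cni-addon-node"
  ns="aks-istio-system"
  desired=$(kubectl get daemonset "$ds" -n "$ns" -o jsonpath='{.status.desiredNumberScheduled}')
  ready=$(kubectl get daemonset "$ds" -n "$ns" -o jsonpath='{.status.numberReady}')
  # An empty value means that the DaemonSet could not be read at all.
  if [ -n "$desired" ] && [ "$desired" = "$ready" ]; then
    echo "OK: $ready/$desired CNI pods ready"
  else
    echo "NOT READY: ${ready:-0}/${desired:-0} CNI pods ready"
    return 1
  fi
}
```

If the function reports `NOT READY`, continue with the log and scheduling checks in the next steps.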
57+
58+
### Step 2: Check CNI DaemonSet pod logs
59+
60+
To identify any installation or configuration issues, examine the logs of the CNI DaemonSet pods:
61+
62+
```bash
63+
kubectl logs -n aks-istio-system -l k8s-app=azure-service-mesh-istio-cni-addon-node
64+
```
65+
66+
Look for error messages that are related to the following items:
67+
- CNI plugin installation failures
68+
- Network configuration errors
69+
- Permission or file system issues
70+
- Node-specific issues
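
When a label selector is used, `kubectl logs` returns only the last 10 lines per pod by default, so it can help to raise `--tail` and filter the output. The following sketch assumes the same label and namespace as the previous command. The function name `cni_log_errors` and the keyword list are illustrative guesses, not strings that the add-on is known to emit.

```bash
# Hypothetical helper: prints only the CNI log lines that look like errors.
# The keyword list is an assumption; adjust it to match what appears in your logs.
cni_log_errors() {
  kubectl logs -n aks-istio-system \
    -l k8s-app=azure-service-mesh-istio-cni-addon-node \
    --all-containers --prefix --tail=500 |
    grep -iE 'error|fail|denied|no such file' || echo "No matching log lines found"
}
```

The `--prefix` flag adds the pod and container name to each line, which helps you trace a message back to a specific node.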
71+
72+
### Step 3: Check for node taints and tolerations
73+
74+
Verify that the CNI DaemonSet can be scheduled on all nodes:
75+
76+
```bash
77+
kubectl describe daemonset azure-service-mesh-istio-cni-addon-node -n aks-istio-system
78+
kubectl get nodes -o wide
79+
kubectl describe nodes
80+
```
81+
82+
Look for node taints that might prevent CNI pod scheduling, and make sure that the DaemonSet has appropriate tolerations.
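
The `describe` output is long, so a narrower view can make the comparison easier. The following sketch assumes the DaemonSet name used in Step 1. Both function names are hypothetical.

```bash
# Hypothetical helper: lists each node together with the keys of its taints.
list_node_taints() {
  kubectl get nodes -o custom-columns='NODE:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Hypothetical helper: lists the key, operator, and effect of each toleration
# on the CNI DaemonSet pod template.
list_cni_tolerations() {
  kubectl get daemonset azure-service-mesh-istio-cni-addon-node -n aks-istio-system \
    -o jsonpath='{range .spec.template.spec.tolerations[*]}{.key}{"\t"}{.operator}{"\t"}{.effect}{"\n"}{end}'
}
```

A toleration that has the `Exists` operator and an empty key matches every taint, so a row that has an empty first column isn't necessarily a problem.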
83+
84+
## Init container injection issues troubleshooting
85+
### Step 1: Check whether istio-init containers are still injected
86+
87+
For newly created pods in the mesh, verify that `istio-init` containers are no longer present:
88+
89+
```bash
90+
# List each pod in an injected namespace together with its init container names
91+
kubectl get pods -n $NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.initContainers[*].name}{"\n"}{end}'
92+
```
93+
94+
If `istio-init` containers are still being injected, CNI isn't working correctly.
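
In a namespace that has many pods, you can filter the output down to the pods that still carry `istio-init`. This is a minimal sketch that reuses the jsonpath expression from the previous command. The function name `pods_with_istio_init` is hypothetical.

```bash
# Hypothetical helper: prints the names of pods that still have an istio-init container.
# Argument: $1 = namespace.
pods_with_istio_init() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.initContainers[*].name}{"\n"}{end}' |
    awk -F '\t' '{
      # The second field holds the init container names, separated by spaces.
      n = split($2, c, " ")
      for (i = 1; i <= n; i++) if (c[i] == "istio-init") { print $1; break }
    }'
}
```

Run it as `pods_with_istio_init "$NAMESPACE"`. Sidecar injection happens when a pod is created, so pods that were created before Istio CNI was enabled would be expected to keep `istio-init` until they're re-created.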
95+
96+
### Step 2: Inspect pod events for CNI-related errors
97+
98+
Check pod events for any CNI plugin failures during pod startup:
99+
100+
```bash
101+
kubectl describe pod $POD_NAME -n $NAMESPACE
102+
```
103+
104+
Look for events that are related to the following items:
105+
- Network setup failures
106+
- CNI plugin errors
107+
- Container creation issues
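
If the `describe` output is noisy, you can query the warning events for the pod directly. The following sketch uses the standard `involvedObject.name` and `type` field selectors for events. The function name `pod_warning_events` is hypothetical.

```bash
# Hypothetical helper: shows only Warning events for one pod, oldest first.
# Arguments: $1 = pod name, $2 = namespace.
pod_warning_events() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.name=$1,type=Warning" \
    --sort-by=.metadata.creationTimestamp
}
```

For example, run `pod_warning_events "$POD_NAME" "$NAMESPACE"`. Sandbox or network setup failures, if there are any, typically appear here as warnings.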
108+
109+
### Step 3: Verify istio-proxy sidecar startup
110+
111+
Check whether the `istio-proxy` sidecar container starts successfully without the init container:
112+
113+
```bash
114+
kubectl logs $POD_NAME -n $NAMESPACE -c istio-proxy
115+
```
116+
117+
If CNI is working correctly, the sidecar should start normally even without the `istio-init` container.
118+
119+
## Pod startup failure troubleshooting
120+
121+
If pods don't start, check for `istio-validation` init container errors:
122+
123+
```bash
124+
kubectl logs $POD_NAME -n $NAMESPACE -c istio-validation
125+
```
126+
127+
Look for "connection refused" error messages that indicate failures in traffic redirection setup.
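
The search for that message can be scripted as follows. This is a sketch under the assumption that the text "connection refused" appears verbatim in the `istio-validation` logs. The function name `check_validation_logs` is hypothetical.

```bash
# Hypothetical helper: reports whether the istio-validation logs contain "connection refused".
# Arguments: $1 = pod name, $2 = namespace.
check_validation_logs() {
  if kubectl logs "$1" -n "$2" -c istio-validation 2>&1 | grep -qi 'connection refused'; then
    echo "redirection check failed: 'connection refused' found in istio-validation logs"
    return 1
  fi
  echo "no 'connection refused' messages in istio-validation logs"
}
```

If the check fails, go back to the CNI DaemonSet steps and verify that a CNI pod is ready on the node where the failing pod is scheduled.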
128+
129+
## References
130+
131+
[Open-source Istio's CNI troubleshooting](https://istio.io/latest/docs/ops/diagnostic-tools/cni/)
132+
133+
[!INCLUDE [Third-party information disclaimer](../../../includes/third-party-disclaimer.md)]
134+
135+
[!INCLUDE [Third-party contact disclaimer](../../../includes/third-party-contact-disclaimer.md)]
136+
137+
[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
138+
139+
[istio-deploy-addon]: /azure/aks/istio-deploy-addon
140+
[istio-cni-addon]: /azure/aks/istio-cni

support/azure/azure-kubernetes/toc.yml

Lines changed: 24 additions & 22 deletions
@@ -64,14 +64,14 @@ items:
6464
- name: Identify nodes and containers utilizing high CPU
6565
href: availability-performance/identify-high-cpu-consuming-containers-aks.md
6666
- name: Improve container image pull performance in AKS
67-
href: availability-performance/container-image-pull-performance.md
67+
href: availability-performance/container-image-pull-performance.md
6868
- name: Troubleshoot cluster service health probe mode issues
6969
href: availability-performance/cluster-service-health-probe-mode-issues.md
7070
- name: Troubleshoot high memory consumption in disk-intensive applications
7171
href: availability-performance/high-memory-consumption-disk-intensive-applications.md
7272
- name: Troubleshoot OOMkilled in AKS clusters
73-
href: availability-performance/troubleshoot-oomkilled-aks-clusters.md
74-
- name: Troubleshoot pod scheduler errors
73+
href: availability-performance/troubleshoot-oomkilled-aks-clusters.md
74+
- name: Troubleshoot pod scheduler errors
7575
href: availability-performance/troubleshoot-pod-scheduler-errors.md
7676
- name: Troubleshoot node not ready
7777
items:
@@ -106,7 +106,7 @@ items:
106106
- name: Basic troubleshooting
107107
href: connectivity/troubleshoot-cluster-connection-issues-api-server.md
108108
- name: Can't access the cluster API server using authorized IP ranges
109-
href: connectivity/cannot-access-cluster-api-server-using-authorized-ip-ranges.md
109+
href: connectivity/cannot-access-cluster-api-server-using-authorized-ip-ranges.md
110110
- name: Client IP address can't access API server
111111
href: connectivity/client-ip-address-cannot-access-api-server.md
112112
- name: Config file isn't available when connecting
@@ -130,7 +130,7 @@ items:
130130
- name: Can't connect to endpoints outside virtual network (public internet)
131131
href: connectivity/troubleshoot-connections-endpoints-outside-virtual-network.md
132132
- name: Can't connect to pods and services in same cluster
133-
href: connectivity/troubleshoot-connection-pods-services-same-cluster.md
133+
href: connectivity/troubleshoot-connection-pods-services-same-cluster.md
134134
- name: Can't view resources in Kubernetes resource viewer on Azure portal
135135
href: connectivity/cannot-view-resources-kubernetes-resource-viewer-portal.md
136136
- name: TCP times out when kubectl or other third-party tools connect
@@ -169,12 +169,12 @@ items:
169169
href: storage/troubleshoot-common-bring-your-own-key-issues.md
170170
- name: Troubleshoot pods and namespaces stuck in the Terminating state
171171
href: storage/pods-namespaces-terminating-state.md
172-
- name: Mount failures
172+
- name: Mount failures
173173
items:
174174
- name: Can't set the uid and gid mounting options on an Azure Disk
175175
href: storage/failure-setting-azure-disk-mount-options-uid-gid.md
176176
- name: Errors when mounting Blob container
177-
href: storage/mounting-azure-blob-storage-container-fail.md
177+
href: storage/mounting-azure-blob-storage-container-fail.md
178178
- name: Errors when mounting Disk volumes
179179
href: storage/fail-to-mount-azure-disk-volume.md
180180
- name: Errors when mounting File share
@@ -233,12 +233,14 @@ items:
233233
href: extensions/istio-add-on-minor-revision-upgrade.md
234234
- name: Istio add-on plug-in CA certificate troubleshooting
235235
href: extensions/istio-add-on-plug-in-ca-certificate.md
236+
- name: Istio add-on CNI troubleshooting
237+
href: extensions/istio-add-on-cni-troubleshooting.md
236238
- name: Other extensions and add-ons
237239
items:
238240
- name: AKS Cost Analysis add-on issues
239241
href: extensions/aks-cost-analysis-add-on-issues.md
240242
- name: Can't pull images from container registry to cluster
241-
href: extensions/cannot-pull-image-from-acr-to-aks-cluster.md
243+
href: extensions/cannot-pull-image-from-acr-to-aks-cluster.md
242244
- name: Troubleshoot AI toolchain operator add-on errors
243245
href: extensions/troubleshoot-ai-toolchain-operator-addon-issues.md
244246
- name: Troubleshoot Azure Key Vault Provider for Secrets Store CSI Driver
@@ -248,7 +250,7 @@ items:
248250
- name: Troubleshoot Dapr extension installation errors
249251
href: extensions/troubleshoot-dapr-extension-installation-errors.md
250252
- name: Troubleshoot deployment failures of Azure Marketplace offers
251-
href: extensions/troubleshoot-failed-kubernetes-deployment-offer.md
253+
href: extensions/troubleshoot-failed-kubernetes-deployment-offer.md
252254
- name: Troubleshoot installation errors in the Azure App Configuration extension
253255
href: extensions/troubleshoot-app-configuration-extension-installation-errors.md
254256
- name: Troubleshoot managed namespaces
@@ -260,8 +262,8 @@ items:
260262
- name: Breaking changes in KEDA add-on 2.15 and 2.14
261263
href: extensions/changes-in-kubernetes-event-driven-autoscaling-add-on-214-215.md
262264
- name: Troubleshoot KEDA add-on
263-
href: extensions/troubleshoot-kubernetes-event-driven-autoscaling-add-on.md
264-
- name: Troubleshoot by error codes
265+
href: extensions/troubleshoot-kubernetes-event-driven-autoscaling-add-on.md
266+
- name: Troubleshoot error codes
265267
items:
266268
- name: Common error codes
267269
href: error-codes/aks-error-code-page.md
@@ -294,9 +296,9 @@ items:
294296
- name: CreateOrUpdateVirtualNetworkLinkFailed error
295297
href: error-codes/createorupdatevirtualnetworklinkfailed-error.md
296298
- name: CustomPrivateDNSZoneMissingPermissionError error
297-
href: error-codes/customprivatednszonemissingpermissions-error.md
299+
href: error-codes/customprivatednszonemissingpermissions-error.md
298300
- name: DnsServiceIpOutOfServiceCidr error
299-
href: error-codes/dnsserviceipoutofservicecidr-error.md
301+
href: error-codes/dnsserviceipoutofservicecidr-error.md
300302
- name: ERR_VHD_FILE_NOT_FOUND error (65)
301303
href: create-upgrade-delete/error-code-vhdfilenotfound.md
302304
- name: Error "CreateOrUpdateVirtualNetworkLinkFailed"
@@ -322,7 +324,7 @@ items:
322324
- name: Known issues - Custom kubelet configuration on Windows
323325
href: create-upgrade-delete/known-issues-custom-kubelet-configuration.md
324326
- name: L-S
325-
items:
327+
items:
326328
- name: LB/PvtLinkSvcWithPvtEndptConn deletion error
327329
href: create-upgrade-delete/cannot-delete-load-balancer-private-link-service.md
328330
- name: LinkedAuthorizationFailed error
@@ -332,23 +334,23 @@ items:
332334
- name: LoadBalancerInUseByVirtualMachineScaleSet or NetworkSecurityGroupInUseByVirtualMachineScaleSet error
333335
href: error-codes/networksecuritygroupinusebyvirtualmachinescaleset-error.md
334336
- name: Missing or invalid service principal
335-
href: create-upgrade-delete/missing-or-invalid-service-principal.md
337+
href: create-upgrade-delete/missing-or-invalid-service-principal.md
336338
- name: MissingSubscriptionRegistration error
337339
href: error-codes/missingsubscriptionregistration-error.md
338340
- name: NetworkSecurityGroupInUseByVirtualMachineScaleSet error
339341
href: create-upgrade-delete/load-balancer-or-nsg-in-use-by-vm-scale-set.md
340342
- name: NodePoolMcVersionIncompatible error
341-
href: error-codes/nodepoolmcversionincompatible-error.md
343+
href: error-codes/nodepoolmcversionincompatible-error.md
342344
- name: OperationIsNotAllowed errors
343345
href: create-upgrade-delete/operationnotallowed.md
344346
- name: OperationNotAllowed or PublicIPCountLimitReached error
345-
href: error-codes/operationnotallowed-publicipcountlimitreached-error.md
347+
href: error-codes/operationnotallowed-publicipcountlimitreached-error.md
346348
- name: OrasPullNetworkTimeoutVMExtensionError
347349
href: error-codes/vmextensionerror-oraspullnetworktimeout.md
348350
- name: OrasPullUnauthorizedVMExtensionError
349351
href: error-codes/vmextensionerror-oraspullunauthorized.md
350352
- name: OutboundConnFailVMExtensionError error (50)
351-
href: create-upgrade-delete/error-code-outboundconnfailvmextensionerror.md
353+
href: create-upgrade-delete/error-code-outboundconnfailvmextensionerror.md
352354
- name: PublicIPAddr/InUseSubnet/NetSecGrp deletion error
353355
href: create-upgrade-delete/cannot-delete-ip-subnet-nsg.md
354356
- name: PublicIPAddressCannotBeDeleted, InUseSubnetCannotBeDeleted, or InUseNetworkSecurityGroupCannotBeDeleted error
@@ -360,7 +362,7 @@ items:
360362
- name: QuotaExceeded or InsufficientVCPUQuota error during creation or upgrade
361363
href: create-upgrade-delete/quota-exceeded-during-creation-upgrade.md
362364
- name: RequestDisallowedByPolicy error
363-
href: error-codes/requestdisallowedbypolicy-error.md
365+
href: error-codes/requestdisallowedbypolicy-error.md
364366
- name: ServiceCidrOverlapExistingSubnetsCidr error
365367
href: error-codes/servicecidroverlapexistingsubnetscidr-error.md
366368
- name: ServicePrincipalValidationClientError error
@@ -377,8 +379,8 @@ items:
377379
href: error-codes/subscriptionrequeststhrottled.md
378380
- name: SubscriptionRequestsThrottled error (429)
379381
href: create-upgrade-delete/error-code-subscriptionrequeststhrottled.md
380-
- name: T-Z
381-
items:
382+
- name: T-Z
383+
items:
382384
- name: TCP time-outs such as 10250 I/O
383385
href: connectivity/tcp-timeouts-dial-tcp-nodeip-10250-io-timeout.md
384386
- name: Throttled error
@@ -395,7 +397,7 @@ items:
395397
href: error-codes/unsatisfiablepdb-error.md
396398
- name: Upgrade issues with Gen2 VMs on Windows AKS cluster
397399
href: create-upgrade-delete/nodepools-not-upgraded-to-gen2-during-node-image-upgrade.md
398-
- name: VirtualNetworkNotInSucceededState error
400+
- name: VirtualNetworkNotInSucceededState error
399401
href: error-codes/virtualnetworknotinsucceededstate-error.md
400402
- name: VMExtensionError_CniDownloadTimeout error
401403
href: error-codes/vmextensionerror-cnidownloadtimeout.md
