| 1 | +--- |
| 2 | +title: Azure Kubernetes Service (AKS) Diagnostics Overview |
| 3 | +description: Learn about self-diagnosing clusters in Azure Kubernetes Service. |
| 4 | +services: container-service |
| 5 | +ms.topic: conceptual |
| 6 | +ms.date: 11/15/2022 |
| 7 | +--- |
| 8 | + |
| 9 | +# Azure Kubernetes Service Diagnostics (preview) overview |
| 10 | + |
| 11 | +Troubleshooting Azure Kubernetes Service (AKS) cluster issues plays an important role in maintaining your cluster, especially if your cluster is running mission-critical workloads. AKS Diagnostics (preview) is an intelligent, self-diagnostic experience that: |
| 12 | + |
| 13 | +* Helps you identify and resolve problems in your cluster. |
| 14 | +* Is cloud-native. |
| 15 | +* Requires no extra configuration or billing cost. |
| 16 | + |
| 17 | +[!INCLUDE [preview features callout](./includes/preview/preview-callout.md)] |
| 18 | + |
| 19 | +## Open AKS Diagnostics |
| 20 | + |
| 21 | +To access AKS Diagnostics: |
| 22 | + |
| 23 | +1. Sign in to the [Azure portal](https://portal.azure.com). |
| 24 | +1. From **All services** in the Azure portal, select **Kubernetes Service**. |
| 25 | +1. Select **Diagnose and solve problems** in the left navigation, which opens AKS Diagnostics. |
| 26 | +1. Choose a category that best describes the issue with your cluster, like _Cluster Node Issues_, by: |
| 27 | + |
| 28 | + * Using the keywords in the homepage tile. |
| 29 | + * Typing a keyword that best describes your issue in the search bar. |
| 30 | + |
| 31 | + |
| 32 | + |
| 33 | +## View a diagnostic report |
| 34 | + |
| 35 | +After you select a category, you can view a diagnostic report specific to your cluster. Diagnostic reports use status icons to call out any issues in your cluster. You can drill down on each topic by selecting **More Info** to see a detailed description of: |
| 36 | + |
| 37 | +* Issues |
| 38 | +* Recommended actions |
| 39 | +* Links to helpful docs |
| 40 | +* Related metrics |
| 41 | +* Logging data |
| 42 | + |
| 43 | +Diagnostic reports are generated from the current state of your cluster after running various checks. They can help you pinpoint the problem with your cluster and understand the next steps to resolve it. |
| 44 | + |
| 45 | + |
| 46 | + |
| 47 | + |
| 48 | + |
| 49 | +## Cluster insights |
| 50 | + |
| 51 | +The following diagnostic checks are available in **Cluster Insights**. |
| 52 | + |
| 53 | +### Cluster Node Issues |
| 54 | + |
| 55 | +Cluster Node Issues checks for node-related issues that cause your cluster to behave unexpectedly. Specifically: |
| 56 | + |
| 57 | +- Node readiness issues |
| 58 | +- Node failures |
| 59 | +- Insufficient resources |
| 60 | +- Node missing IP configuration |
| 61 | +- Node CNI failures |
| 62 | +- Node not found |
| 63 | +- Node power off |
| 64 | +- Node authentication failure |
| 65 | +- Node kube-proxy stale |
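
As a rough illustration of the first item in the list above (node readiness), the following Python sketch scans a Kubernetes node list, in the JSON shape returned by `kubectl get nodes -o json`, and flags nodes whose `Ready` condition is not `"True"`. It is not how AKS Diagnostics implements its checks; the function name `not_ready_nodes` and the sample data are hypothetical and exist only for this example.

```python
import json


def not_ready_nodes(node_list: dict) -> list[str]:
    """Return the names of nodes whose Ready condition is missing or not "True"."""
    flagged = []
    for node in node_list.get("items", []):
        name = node.get("metadata", {}).get("name", "<unknown>")
        conditions = node.get("status", {}).get("conditions", [])
        # Each node reports a list of conditions; "Ready" is the one of interest here.
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            flagged.append(name)
    return flagged


# Hypothetical sample data, standing in for `kubectl get nodes -o json` output.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "node-a"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node-b"},
     "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    {"metadata": {"name": "node-c"},
     "status": {"conditions": []}}
  ]
}
""")

print(not_ready_nodes(sample))
```

A node with no `Ready` condition at all is treated as not ready here, which is a conservative choice for a triage script rather than a rule taken from AKS Diagnostics.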
| 66 | + |
| 67 | +### Create, read, update & delete (CRUD) operations |
| 68 | + |
| 69 | +CRUD Operations checks for any CRUD operations that cause issues in your cluster. Specifically: |
| 70 | + |
| 71 | +- In-use subnet delete operation error |
| 72 | +- Network security group delete operation error |
| 73 | +- In-use route table delete operation error |
| 74 | +- Referenced resource provisioning error |
| 75 | +- Public IP address delete operation error |
| 76 | +- Deployment failure due to deployment quota |
| 77 | +- Operation error due to organization policy |
| 78 | +- Missing subscription registration |
| 79 | +- VM extension provisioning error |
| 80 | +- Subnet capacity |
| 81 | +- Quota exceeded error |
| 82 | + |
| 83 | +### Identity and security management |
| 84 | + |
| 85 | +Identity and Security Management detects authentication and authorization errors that prevent communication with your cluster. Specifically: |
| 86 | + |
| 87 | +- Node authorization failures |
| 88 | +- 401 errors |
| 89 | +- 403 errors |
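
To make the distinction between the two status codes above concrete, here is a minimal Python sketch that groups HTTP status codes by the standard HTTP meaning of 401 (the request lacks valid authentication) and 403 (the request was understood but is not permitted). The helper names and the sample codes are hypothetical, and the grouping reflects general HTTP semantics, not necessarily how AKS Diagnostics classifies errors.

```python
from collections import Counter


def classify_status(code: int) -> str:
    """Map an HTTP status code to a coarse error category."""
    if code == 401:
        return "authentication"  # Unauthorized: credentials missing or invalid
    if code == 403:
        return "authorization"   # Forbidden: caller identified but not permitted
    return "other"


def summarize(codes: list[int]) -> dict[str, int]:
    """Count how many status codes fall into each category."""
    return dict(Counter(classify_status(c) for c in codes))


# Hypothetical status codes, for example collected from API server responses.
sample_codes = [200, 401, 403, 403, 500]
print(summarize(sample_codes))
```

A summary like this can help decide whether to look first at credentials (401) or at role assignments and permissions (403).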
| 90 | + |
| 91 | +## Next steps |
| 92 | + |
| 93 | +* Collect logs to help you further troubleshoot your cluster issues by using [AKS Periscope](https://aka.ms/aksperiscope). |
| 94 | + |
| 95 | +* Read the [triage practices section](/azure/architecture/operator-guides/aks/aks-triage-practices) of the AKS day-2 operations guide. |
| 96 | + |
| 97 | +* Post your questions or feedback at [UserVoice](https://feedback.azure.com/d365community/forum/aabe212a-f724-ec11-b6e6-000d3a4f0da0) by adding "[Diag]" in the title. |