
Commit 3304bfc

Updated with the correct error code for AksCapacityHeavyUsage
2 parents: 266d0ea + aa037a4

449 files changed: +10672 additions, -5740 deletions


.openpublishing.redirection.json

Lines changed: 400 additions & 0 deletions
Large diffs are not rendered by default.

support/azure/.openpublishing.redirection.azure.json

Lines changed: 12 additions & 0 deletions
@@ -6295,6 +6295,18 @@
     {
       "source_path": "azure-kubernetes/error-codes/vhdfilenotfound.md",
       "redirect_url": "/troubleshoot/azure/azure-kubernetes/error-codes/vmextensionerror-vhdfilenotfound"
+    },
+    {
+      "source_path": "virtual-machines/linux/linux-vm-no-boot-hyper-v-driver-issues.md",
+      "redirect_url": "/troubleshoot/azure/virtual-machines/linux/troubleshoot-lis-driver-issues-on-linux-vms"
+    },
+    {
+      "source_path": "azure-kubernetes/create-upgrade-delete/error-code-reservedresourcename.md",
+      "redirect_url": "/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-invalidparameter"
+    },
+    {
+      "source_path": "azure-kubernetes/create-upgrade-delete/error-using-feature-requiring-virtual-machine-scale-set.md",
+      "redirect_url": "/troubleshoot/azure/azure-kubernetes/welcome-azure-kubernetes"
     }
   ]
 }

support/azure/azure-kubernetes/availability-performance/cluster-node-virtual-machine-failed-state.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 ---
 title: Azure Kubernetes Service cluster/node is in a failed state
 description: Helps troubleshoot an issue where an Azure Kubernetes Service (AKS) cluster/node is in a failed state.
-ms.date: 04/01/2024
+ms.date: 03/10/2025
 ms.reviewer: chiragpa, nickoman, v-weizhu, v-six, aritraghosh
 ms.service: azure-kubernetes-service
 keywords:
@@ -114,7 +114,7 @@ If you prefer to use Azure CLI to view the activity log for a failed cluster, fo
 
 In the Azure portal, navigate to your AKS cluster resource and select **Diagnose and solve problems** from the left menu. You'll see a list of categories and scenarios that you can select to run diagnostic checks and get recommended solutions.
 
-In the Azure CLI, use the `az aks collect` command with the `--name` and `--resource-group` parameters to collect diagnostic data from your cluster nodes. You can also use the `--storage-account` and `--sas-token` parameters to specify an Azure Storage account where the data will be uploaded. The output will include a link to the **Diagnose and Solve Problems** blade where you can view the results and suggested actions.
+In the Azure CLI, use the `az aks kollect` command with the `--name` and `--resource-group` parameters to collect diagnostic data from your cluster nodes. You can also use the `--storage-account` and `--sas-token` parameters to specify an Azure Storage account where the data will be uploaded. The output will include a link to the **Diagnose and Solve Problems** blade where you can view the results and suggested actions.
 
 In the **Diagnose and Solve Problems** blade, you can select **Cluster Issues** as the category. If any issues are detected, you'll see a list of possible solutions that you can follow to fix them.

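For context, a quick sketch of the diagnostics-collection command that this hunk corrects to `az aks kollect` (the cluster, resource group, storage account, and SAS token values below are placeholders, not taken from the commit):

```bash
# Collect node-level diagnostic data and upload it to a storage account (placeholder values).
az aks kollect \
  --name myAKSCluster \
  --resource-group myResourceGroup \
  --storage-account mydiagstorage \
  --sas-token "<storage-account-sas-token>"
```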
support/azure/azure-kubernetes/availability-performance/node-not-ready-then-recovers.md

Lines changed: 10 additions & 5 deletions
@@ -1,19 +1,19 @@
 ---
 title: Node not ready but then recovers
 description: Troubleshoot scenarios in which the status of an AKS cluster node is Node Not Ready, but then the node recovers.
-ms.date: 12/09/2024
-ms.reviewer: rissing, chiragpa, momajed, v-leedennis
+ms.date: 2/25/2024
+ms.reviewer: rissing, chiragpa, momajed, v-leedennis, novictor
 ms.service: azure-kubernetes-service
 #Customer intent: As an Azure Kubernetes user, I want to prevent the Node Not Ready status for nodes that later recover so that I can avoid future errors within an AKS cluster.
 ms.custom: sap:Node/node pool availability and performance
 ---
 # Troubleshoot Node Not Ready failures that are followed by recoveries
 
-This article provides a guide to troubleshoot and resolve "Node Not Ready" issues in Azure Kubernetes Service (AKS) clusters. When a node enters a "Not Ready" state, it can disrupt the application's functionality and cause it to stop responding. Typically, the node recovers automatically after a short period. However, to prevent recurring issues and maintain a stable environment, it's important to understand the underlying causes to be able to implement effective resolutions.
+This article provides a guide to troubleshoot and resolve "Node Not Ready" issues in Azure Kubernetes Service (AKS) clusters. When a node enters a "NotReady" state, it can disrupt the application's functionality and cause it to stop responding. Typically, the node recovers automatically after a short period. However, to prevent recurring issues and maintain a stable environment, it's important to understand the underlying causes to be able to implement effective resolutions.
 
 ## Cause
 
-There are several scenarios that could cause a "Not Ready" state to occur:
+There are several scenarios that could cause a "NotReady" state to occur:
 
 - The unavailability of the API server. This causes the readiness probe to fail. This prevents the pod from being attached to the service so that traffic is no longer forwarded to the pod instance.
 
@@ -24,7 +24,12 @@ There are several scenarios that could cause a "Not Ready" state to occur:
 
 ## Resolution
 
-Check the API server availability by running the `kubectl get apiservices` command. Make sure that the readiness probe is correctly configured in the deployment YAML file.
+To resolve this issue, follow these steps:
+
+1. Run `kubectl describe node <node-name>` to review detailed information about the node's status. Look for any error messages or warnings that might indicate the root cause of the issue.
+2. Check the API server availability by running the `kubectl get apiservices` command. Make sure that the readiness probe is correctly configured in the deployment YAML file.
+3. Verify the node's network configuration to make sure that there are no connectivity issues.
+4. Check the node's resource usage, such as CPU, memory, and disk, to identify potential constraints. For more information, see [Monitor your Kubernetes cluster performance with Container insights](/azure/azure-monitor/containers/container-insights-analyze#view-performance-directly-from-a-cluster).
 
 For further steps, see [Basic troubleshooting of Node Not Ready failures](node-not-ready-basic-troubleshooting.md).

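As a quick illustration of steps 1 and 2 in the new resolution list above (the node name is a placeholder, not a value from the commit):

```bash
# Inspect the node's conditions and recent events for clues about the NotReady transitions.
kubectl describe node aks-nodepool1-12345678-vmss000000

# Confirm that every registered API service reports Available=True.
kubectl get apiservices
```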
support/azure/azure-kubernetes/connectivity/cannot-access-cluster-api-server-using-authorized-ip-ranges.md

Lines changed: 5 additions & 3 deletions
@@ -1,8 +1,8 @@
 ---
 title: Can't access the cluster API server using authorized IP ranges
 description: Troubleshoot problems accessing the cluster API server when you use authorized IP address ranges in Azure Kubernetes Service (AKS).
-ms.date: 11/18/2024
-ms.reviewer: chiragpa, nickoman, v-leedennis
+ms.date: 03/26/2025
+ms.reviewer: chiragpa, nickoman, wonkilee, v-leedennis
 ms.service: azure-kubernetes-service
 keywords:
 #Customer intent: As an Azure Kubernetes user, I want to troubleshoot access issues to the cluster API server when I use authorized IP address ranges so that I can work with my Azure Kubernetes Service (AKS) cluster successfully.
@@ -14,7 +14,9 @@ This article discusses how to resolve a scenario in which you can't use authoriz
 
 ## Symptoms
 
-If you try to create or manage an AKS cluster, you can't access the cluster API server.
+If you try to create or manage resources in an AKS cluster, you can't access the cluster API server. When you run `kubectl`, you receive the following error message:
+
+> Unable to connect to the server: dial tcp x.x.x.x:443: i/o timeout
 
 ## Cause

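The timeout shown in this hunk usually means the client's public IP address isn't in the cluster's authorized range. A minimal sketch of how you might check and update that range (cluster name, resource group, and CIDR are placeholders):

```bash
# Show the IP ranges that are currently authorized to reach the API server.
az aks show --name myAKSCluster --resource-group myResourceGroup \
  --query apiServerAccessProfile.authorizedIpRanges

# Add your current egress CIDR to the authorized list.
az aks update --name myAKSCluster --resource-group myResourceGroup \
  --api-server-authorized-ip-ranges 203.0.113.0/24
```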
support/azure/azure-kubernetes/connectivity/error-from-server-error-dialing-backend-dial-tcp.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,8 @@
11
---
22
title: 'Error from server: error dialing backend: dial tcp'
33
description: 'Troubleshoot the error dialing backend: dial tcp error that blocks you from using kubectl commands or other tools when you connect to the API server.'
4-
ms.date: 10/21/2024
5-
ms.reviewer: chiragpa, nickoman, v-leedennis, pihe
4+
ms.date: 03/05/2025
5+
ms.reviewer: chiragpa, nickoman, v-leedennis, pihe, mariusbutuc
66
ms.service: azure-kubernetes-service
77
keywords:
88
#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the "Error from server: error dialing backend: dial tcp" error so that I can connect to the API server or use the `kubectl logs` command to get logs.

support/azure/azure-kubernetes/connectivity/errors-arfter-restricting-egress-traffic.md

Lines changed: 8 additions & 6 deletions
@@ -1,8 +1,8 @@
 ---
 title: Errors after restricting egress traffic
 description: Troubleshoot errors that occur after you restrict egress traffic from an Azure Kubernetes Service (AKS) cluster.
-ms.date: 11/12/2024
-ms.reviewer: chiragpa, nickoman, v-leedennis
+ms.date: 03/20/2025
+ms.reviewer: chiragpa, nickoman, jaewonpark, v-leedennis
 ms.service: azure-kubernetes-service
 keywords:
 #Customer intent: As an Azure Kubernetes user, I want to troubleshoot errors that occur after I restrict egress traffic so that I can access my AKS cluster successfully.
@@ -18,19 +18,21 @@ Certain commands of the [kubectl](https://kubernetes.io/docs/reference/kubectl/)
 
 ## Cause
 
-When you restrict egress traffic from an AKS cluster, your settings must comply with [required Outbound network and FQDN rules for AKS clusters](/azure/aks/outbound-rules-control-egress). If your settings are in conflict with any of these rules, the symptoms of egress traffic restriction issues occur.
+When you restrict egress traffic from an AKS cluster, your settings must comply with the required Outbound network and fully qualified domain name (FQDN) rules for AKS clusters. If your settings conflict with any of these rules, egress traffic restriction issues occur.
 
 ## Solution
 
-Verify that your configuration doesn't conflict with any of the [required Outbound network and FQDN rules for AKS clusters](/azure/aks/outbound-rules-control-egress) for the following items:
+Verify that your configuration doesn't conflict with any of the [required Outbound network and FQDN rules for AKS clusters](/azure/aks/outbound-rules-control-egress) for the following items:
 
 - Outbound ports
 - Network rules
-- Fully qualified domain names (FQDNs)
+- FQDNs
 - Application rules
 
+Also check for rule conflicts in the network security group (NSG), firewall, or appliance that AKS traffic passes through, depending on your configuration.
+
 > [!NOTE]
-> The AKS outbound dependencies are almost entirely defined by using FQDNs. These FQDNs don't have static addresses behind them. The lack of static addresses means that you can't use network security groups (NSGs) to restrict outbound traffic from an AKS cluster.
+> The AKS outbound dependencies are almost entirely defined by using FQDNs. These FQDNs don't have static addresses behind them. The lack of static addresses means that you can't use NSGs to restrict outbound traffic from an AKS cluster. Additionally, allowing only the IP addresses that are currently resolved from the required FQDNs (after a deny-all NSG rule) isn't enough to restrict outbound traffic. Because the IPs aren't static, issues might occur later.
 
 ## More information

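To look for an NSG conflict like the one the new paragraph describes, you could list the outbound rules on the NSG attached to the node subnet; a sketch with placeholder names (the node resource group and NSG name vary per cluster):

```bash
# List outbound rules on the NSG in the cluster's node resource group (placeholder names).
az network nsg rule list \
  --resource-group MC_myResourceGroup_myAKSCluster_eastus \
  --nsg-name aks-agentpool-12345678-nsg \
  --query "[?direction=='Outbound'].{name:name, access:access, priority:priority, destination:destinationAddressPrefix}" \
  --output table
```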
support/azure/azure-kubernetes/connectivity/tunnel-connectivity-issues.md

Lines changed: 4 additions & 2 deletions
@@ -1,8 +1,8 @@
 ---
 title: Tunnel connectivity issues
 description: Resolve communication issues that are related to tunnel connectivity in an Azure Kubernetes Service (AKS) cluster.
-ms.date: 09/26/2024
-ms.reviewer: chiragpa, andbar, v-leedennis, v-weizhu
+ms.date: 03/23/2025
+ms.reviewer: chiragpa, andbar, v-leedennis, v-weizhu, albarqaw
 ms.service: azure-kubernetes-service
 keywords: Azure Kubernetes Service, AKS cluster, Kubernetes cluster, tunnels, connectivity, tunnel-front, aks-link
 #Customer intent: As an Azure Kubernetes user, I want to avoid tunnel connectivity issues so that I can use an Azure Kubernetes Service (AKS) cluster successfully.
@@ -29,6 +29,8 @@ You receive an error message that resembles the following examples about port 10
 
 > Error from server: error dialing backend: dial tcp \<aks-node-ip>:10250: i/o timeout
 
+> Error from server: Get "https\://\<aks-node-name>:10250/containerLogs/\<namespace>/\<pod-name>/\<container-name>": http: server gave HTTP response to HTTPS client
+
 The Kubernetes API server uses port 10250 to connect to a node's kubelet to retrieve the logs. If port 10250 is blocked, the kubectl logs and other features will only work for pods that run on the nodes in which the tunnel component is scheduled. For more information, see [Kubernetes ports and protocols: Worker nodes](https://kubernetes.io/docs/reference/ports-and-protocols/#node).
 
 Because the tunnel components or the connectivity between the server and client can't be established, functionality such as the following won't work as expected:

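To see which of the two symptoms in this hunk you're hitting, requesting logs for a pod on an affected node is usually enough (pod and namespace names are placeholders):

```bash
# If the tunnel or port 10250 is blocked, this command surfaces the "error dialing backend" message.
kubectl logs my-pod --namespace my-namespace

# Check which node the pod runs on; only nodes that host the tunnel component may still return logs.
kubectl get pod my-pod --namespace my-namespace -o wide
```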
support/azure/azure-kubernetes/create-upgrade-delete/aks-common-issues-faq.yml

Lines changed: 7 additions & 5 deletions
@@ -3,8 +3,8 @@ metadata:
   title: Azure Kubernetes Service (AKS) common issues FAQ
   description: Review a list of frequently asked questions (FAQ) about common issues when you're working with an Azure Kubernetes Service (AKS) cluster.
   ms.topic: faq
-  ms.date: 11/14/2023
-  ms.reviewer: chiragpa, nickoman, v-leedennis
+  ms.date: 03/06/2025
+  ms.reviewer: chiragpa, nickoman, jotavar, v-leedennis, v-weizhu
   ms.service: azure-kubernetes-service
   ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
 
@@ -26,8 +26,7 @@ sections:
       - question: |
           Can I move my cluster to a different subscription, or move my subscription with my cluster to a new tenant?
         answer: |
-          If you've moved your AKS cluster to a different subscription or the cluster's subscription to a new tenant, the cluster won't function because of missing cluster identity permissions. AKS doesn't support moving clusters across subscriptions or tenants because of this constraint.
-
+          No. If you've moved your AKS cluster to a different subscription or the cluster's subscription to a new tenant, the cluster won't function because of missing cluster identity permissions. AKS doesn't support moving clusters across subscriptions or tenants because of this constraint. For more information, see [Operations FAQ](/azure/aks/faq#operations).
       - question: |
          What naming restrictions are enforced for AKS resources and parameters?
        answer: |
@@ -42,7 +41,10 @@ sections:
          - AKS node pool names must be all lowercase. The names must be 1-12 characters in length for Linux node pools and 1-6 characters for Windows node pools. A name must start with a letter, and the only allowed characters are letters and numbers.
 
          - The *admin-username*, which sets the administrator user name for Linux nodes, must start with a letter. This user name may only contain letters, numbers, hyphens, and underscores. It has a maximum length of 32 characters.
-
+
+           For more information about naming conventions, see the following resources:
+           - [Naming rules and restrictions for Azure resources](/azure/azure-resource-manager/management/resource-name-rules#microsoftcontainerservice)
+           - [Abbreviation recommendations for Azure resources](/azure/cloud-adoption-framework/ready/azure-best-practices/resource-abbreviations#containers)
 additionalContent: |
   [!INCLUDE [Third-party disclaimer](../../../includes/third-party-disclaimer.md)]

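As a concrete illustration of the node pool naming rules in this FAQ entry, a compliant (hypothetical) node pool name would be accepted by a command along these lines:

```bash
# "gpupool1" follows the Linux node pool rules: all lowercase, 1-12 characters, starts with a letter.
az aks nodepool add \
  --cluster-name myAKSCluster \
  --resource-group myResourceGroup \
  --name gpupool1 \
  --node-count 1
```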
support/azure/azure-kubernetes/create-upgrade-delete/aks-increased-memory-usage-cgroup-v2.md

Lines changed: 31 additions & 5 deletions
@@ -1,8 +1,8 @@
 ---
 title: Increased memory usage reported in Kubernetes 1.25 or later versions
 description: Resolve an increase in memory usage that's reported after you upgrade an Azure Kubernetes Service (AKS) cluster to Kubernetes 1.25.x.
-ms.date: 07/13/2023
-editor: v-jsitser
+ms.date: 03/03/2025
+editor: momajed
 ms.reviewer: aritraghosh, cssakscic, v-leedennis
 ms.service: azure-kubernetes-service
 ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
@@ -23,23 +23,49 @@ You experience one or more of the following symptoms:
 
 ## Cause
 
-This increase is caused by a change in memory accounting within version 2 of the Linux control group (cgroup) API. [Cgroup v2](https://kubernetes.io/docs/concepts/architecture/cgroups/) is now the default cgroup version for Kubernetes 1.25 on AKS.
+This increase is caused by a change in memory accounting within version 2 of the Linux control group (`cgroup`) API. [Cgroup v2](https://kubernetes.io/docs/concepts/architecture/cgroups/) is now the default `cgroup` version for Kubernetes 1.25 on AKS.
 
 > [!NOTE]
-> This issue is distinct from the memory saturation in nodes that's caused by applications or frameworks that aren't aware of cgroup v2. For more information, see [Memory saturation occurs in pods after cluster upgrade to Kubernetes 1.25](./aks-memory-saturation-after-upgrade.md).
+> This issue is distinct from the memory saturation in nodes that's caused by applications or frameworks that aren't aware of `cgroup` v2. For more information, see [Memory saturation occurs in pods after cluster upgrade to Kubernetes 1.25](./aks-memory-saturation-after-upgrade.md).
 
 ## Solution
 
 - If you observe frequent memory pressure on the nodes, upgrade your subscription to increase the amount of memory that's available to your virtual machines (VMs).
 
 - If you see a higher eviction rate on the pods, [use higher limits and requests for pods](/azure/aks/developer-best-practices-resource-management#define-pod-resource-requests-and-limits).
 
+- `cgroup` v2 uses a different API than `cgroup` v1. If there are any applications that directly access the `cgroup` file system, update them to later versions that support `cgroup` v2. For example:
+
+  - **Third-party monitoring and security agents**:
+
+    Some monitoring and security agents depend on the `cgroup` file system. Update these agents to versions that support `cgroup` v2.
+
+  - **Java applications**:
+
+    Use versions that fully support `cgroup` v2:
+    - OpenJDK/HotSpot: `jdk8u372`, `11.0.16`, `15`, and later versions.
+    - IBM Semeru Runtimes: `8.0.382.0`, `11.0.20.0`, `17.0.8.0`, and later versions.
+    - IBM Java: `8.0.8.6` and later versions.
+
+  - **uber-go/automaxprocs**:
+    If you're using the `uber-go/automaxprocs` package, ensure the version is `v1.5.1` or later.
+
+- An alternative temporary solution is to revert the `cgroup` version on your nodes by using the DaemonSet. For more information, see [Revert to cgroup v1 DaemonSet](https://github.com/Azure/AKS/blob/master/examples/cgroups/revert-cgroup-v1.yaml).
+
+  > [!IMPORTANT]
+  > - Use the DaemonSet cautiously. Test it in a lower environment before applying to production to ensure compatibility and prevent disruptions.
+  > - By default, the DaemonSet applies to all nodes in the cluster and reboots them to implement the `cgroup` change.
+  > - To control how the DaemonSet is applied, configure a `nodeSelector` to target specific nodes.
+
+
 > [!NOTE]
 > If you experience only an increase in memory use without any of the other symptoms that are mentioned in the "Symptoms" section, you don't have to take any action.
 
 ## Status
 
-We're actively working with the Kubernetes community to fix the underlying issue, and we'll keep you updated on our progress. We also plan to change the eviction thresholds or [resource reservations](/azure/aks/concepts-clusters-workloads#resource-reservations), depending on the outcome of the fix.
+We're actively working with the Kubernetes community to resolve the underlying issue. Progress on this effort can be tracked at [Azure/AKS Issue #3443](https://github.com/kubernetes/kubernetes/issues/118916).
+
+As part of the resolution, we plan to adjust the eviction thresholds or update [resource reservations](/azure/aks/concepts-clusters-workloads#resource-reservations), depending on the outcome of the fix.
 
 ## Reference

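If you need to confirm which `cgroup` version a node is actually running before applying the revert DaemonSet, a common check (not part of this commit; the node name and debug image are placeholders) is:

```bash
# Start a debug pod on the node and check the cgroup filesystem type it sees.
kubectl debug node/aks-nodepool1-12345678-vmss000000 -it --image=ubuntu -- \
  stat -fc %T /sys/fs/cgroup/
# "cgroup2fs" indicates cgroup v2; "tmpfs" indicates cgroup v1.
```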