Commit 665a0c4

Merge branch 'rl-dev01' of https://github.com/ReginaLin24/SupportArticles-docs-pr into rl-dev01

2 parents: d763cf9 + 4dc9121

156 files changed (+2995 -789 lines)


.openpublishing.redirection.json

Lines changed: 4 additions & 0 deletions
@@ -13267,6 +13267,10 @@
     {
       "source_path": "support/dynamics-365/sales/the-record-could-not-be-deleted.md",
       "redirect_url": "/troubleshoot/power-platform/dataverse/working-with-solutions/the-record-could-not-be-deleted"
+    },
+    {
+      "source_path": "support/power-platform/power-automate/dataverse-cds/cds-user-cannot-access-power-automate-business-process-flows-on-demand-workflows.md",
+      "redirect_url": "/previous-versions/troubleshoot/power-platform/power-automate/cloud-flows/cds-user-cannot-access-power-automate-business-process-flows-on-demand-workflows"
     }
   ]
 }

support/azure/azure-kubernetes/connectivity/tunnel-connectivity-issues.md

Lines changed: 76 additions & 2 deletions
@@ -1,10 +1,10 @@
 ---
-title: Tunnel connectivity issues
+title: Tunnel Connectivity Issues
 description: Resolve communication issues that are related to tunnel connectivity in an Azure Kubernetes Service (AKS) cluster.
 ms.date: 03/23/2025
 ms.reviewer: chiragpa, andbar, v-leedennis, v-weizhu, albarqaw
 ms.service: azure-kubernetes-service
-keywords: Azure Kubernetes Service, AKS cluster, Kubernetes cluster, tunnels, connectivity, tunnel-front, aks-link
+keywords: Azure Kubernetes Service, AKS cluster, Kubernetes cluster, tunnels, connectivity, tunnel-front, aks-link, Konnectivity agent, Cluster Proportional Autoscaler, CPA, Resource allocation, Performance bottlenecks, Networking reliability, Azure Kubernetes troubleshooting, AKS performance issues
 #Customer intent: As an Azure Kubernetes user, I want to avoid tunnel connectivity issues so that I can use an Azure Kubernetes Service (AKS) cluster successfully.
 ms.custom: sap:Connectivity
 ---
@@ -251,6 +251,80 @@ If everything is OK within the application, you'll have to adjust the allocated
 
 You can set up a new cluster to use a Managed Network Address Translation (NAT) Gateway for outbound connections. For more information, see [Create an AKS cluster with a Managed NAT Gateway](/azure/aks/nat-gateway#create-an-aks-cluster-with-a-managed-nat-gateway).
 
+## Cause 6: Konnectivity agent performance issues with cluster growth
+
+As the cluster grows, the performance of the Konnectivity agents might degrade because of increased network traffic, more requests, or resource constraints.
+
+> [!NOTE]
+> This cause applies only to the `konnectivity-agent` pods.
+
+### Solution 6: Use the Cluster Proportional Autoscaler for the Konnectivity agent
+
+To manage scalability challenges in large clusters, we implement the Cluster Proportional Autoscaler for the Konnectivity agents. This approach aligns with industry standards and best practices, and it ensures optimal resource usage and enhanced performance.
+
+**Why this change was made**
+
+Previously, the Konnectivity agent had a fixed replica count that could create a bottleneck as the cluster grew. By implementing the Cluster Proportional Autoscaler, we enable the replica count to adjust dynamically, based on node-scaling rules, to provide optimal performance and resource usage.
+
+**How the Cluster Proportional Autoscaler works**
+
+The Cluster Proportional Autoscaler uses a ladder configuration to determine the number of Konnectivity agent replicas based on the cluster size. The ladder configuration is defined in the `konnectivity-agent-autoscaler` configmap in the `kube-system` namespace. Here's an example of the ladder configuration:
+
+```json
+"nodesToReplicas": [
+    [1, 2],
+    [100, 3],
+    [250, 4],
+    [500, 5],
+    [1000, 6],
+    [5000, 10]
+]
+```
+
+This configuration makes sure that the number of replicas scales appropriately with the number of nodes in the cluster to provide optimal resource allocation and improved networking reliability.
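
To see the ladder that's currently in effect before you override anything, a read-only check like the following sketch works; the configmap name comes from the text above:

```bash
# Print the current konnectivity-agent-autoscaler ladder (read-only).
kubectl get configmap konnectivity-agent-autoscaler -n kube-system -o yaml
```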
+
+**How to use the Cluster Proportional Autoscaler**
+
+You can override the default values by updating the `konnectivity-agent-autoscaler` configmap in the `kube-system` namespace. Here's a sample command to update the configmap:
+
+```bash
+kubectl edit configmap konnectivity-agent-autoscaler -n kube-system
+```
+
+This command opens the configmap in an editor so that you can make the necessary changes.
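
A non-interactive alternative is a sketch like the following, assuming the ladder JSON lives under a data key named `ladder` (data keys can vary, so verify with the read-only command shown earlier before patching):

```bash
# Patch the ladder in place; the "ladder" key name and the values are assumptions.
kubectl patch configmap konnectivity-agent-autoscaler -n kube-system \
  --type merge -p '{"data":{"ladder":"{\"nodesToReplicas\":[[1,2],[100,3]]}"}}'
```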
+
+**What you should check**
+
+You have to monitor for out-of-memory (OOM) kills on the nodes, because a misconfigured Cluster Proportional Autoscaler can cause insufficient memory allocation for the Konnectivity agents. This misconfiguration occurs for the following key reasons:
+
+- **High memory usage:** As the cluster grows, the memory usage of the Konnectivity agents can increase significantly, especially during peak loads or when the agents handle large numbers of connections. If the Cluster Proportional Autoscaler configuration doesn't scale the replicas appropriately, the agents might run out of memory.
+
+- **Fixed resource limits:** If the resource requests and limits for the Konnectivity agents are set too low, the agents might not have enough memory to handle the workload, which leads to OOM kills. Misconfigured Cluster Proportional Autoscaler settings can exacerbate this issue by not providing enough replicas to distribute the load.
+
+- **Cluster size and workload variability:** The CPU and memory that the Konnectivity agents need can vary widely depending on the size of the cluster and the workload. If the Cluster Proportional Autoscaler ladder configuration isn't right-sized and adaptively resized for the cluster's usage patterns, it can cause memory overcommitment and OOM kills.
+
+To identify and troubleshoot OOM kills, follow these steps:
+
+1. Check for OOM kills on the nodes:
+
+   ```bash
+   kubectl get events --all-namespaces | grep -i 'oomkill'
+   ```
+
+2. Inspect node resource usage to make sure that the nodes aren't running out of memory:
+
+   ```bash
+   kubectl top nodes
+   ```
+
+3. Review pod resource requests and limits to make sure that the Konnectivity agent pods have appropriate values set to prevent OOM kills:
+
+   ```bash
+   kubectl get pod <pod-name> -n kube-system -o yaml | grep -A5 "resources:"
+   ```
+
+4. If necessary, adjust the resource requests and limits for the Konnectivity agent pods by editing the deployment:
+
+   ```bash
+   kubectl edit deployment konnectivity-agent -n kube-system
+   ```
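
For step 4, the values to adjust live under the container's `resources` block. A minimal sketch with illustrative numbers (not AKS defaults):

```yaml
# Inside the konnectivity-agent Deployment's container spec; values are illustrative only.
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    memory: 512Mi
```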
+
 [!INCLUDE [Third-party contact disclaimer](../../../includes/third-party-contact-disclaimer.md)]
 
 [!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]

support/azure/azure-kubernetes/create-upgrade-delete/error-code-operationnotallowed-publicipcountlimitreached.md

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
 ---
 title: Troubleshoot OperationNotAllowed or PublicIPCountLimitReached
 description: Learn how to troubleshoot the OperationNotAllowed or PublicIPCountLimitReached quota error when you try to create and deploy an Azure Kubernetes Service (AKS) cluster.
-ms.date: 10/28/2024
+ms.date: 04/03/2024
 editor: v-jsitser
-ms.reviewer: rissing, chiragpa, erbookbi, v-leedennis
+ms.reviewer: rissing, chiragpa, erbookbi, v-leedennis, dorinalecu
 ms.service: azure-kubernetes-service
 #Customer intent: As an Azure Kubernetes user, I want to troubleshoot the OperationNotAllowed or PublicIPCountLimitReached quota error code so that I can successfully create and deploy an Azure Kubernetes Service (AKS) cluster.
 ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)

support/azure/azure-kubernetes/create-upgrade-delete/error-code-requestdisallowedbypolicy.md

Lines changed: 18 additions & 13 deletions
@@ -1,11 +1,11 @@
 ---
-title: RequestDisallowedByPolicy error when deploying an AKS cluster
+title: RequestDisallowedByPolicy Error When Deploying an AKS Cluster
 description: Learn how to fix the RequestDisallowedByPolicy error when you try to create and deploy an Azure Kubernetes Service (AKS) cluster.
-ms.date: 10/12/2024
+ms.date: 03/13/2025
 editor: v-jsitser
-ms.reviewer: rissing, chiragpa, erbookbi, albarqaw, v-leedennis, v-weizhu
+ms.reviewer: rissing, chiragpa, erbookbi, albarqaw, jacobbaek, v-leedennis, v-weizhu
 ms.service: azure-kubernetes-service
-#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the RequestDisallowedByPolicy error code so that I can successfully create and deploy an Azure Kubernetes Service (AKS) cluster.
+#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the RequestDisallowedByPolicy error so that I can successfully create and deploy an Azure Kubernetes Service (AKS) cluster.
 ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
 ---
 # RequestDisallowedByPolicy error when deploying an AKS cluster
@@ -22,24 +22,29 @@ When you try to deploy an AKS cluster, you receive the following error message:
 
 ## Cause
 
-For security or compliance, your subscription administrators might assign policies that limit how resources are deployed. For example, your subscription might have a policy that prevents creating public IP addresses, network security groups, user-defined routes, or route tables. The error message includes the specific reason why the cluster creation was blocked. Only you can manage the policies in your environment. Microsoft can't disable or bypass those policies.
+For security or compliance, your subscription administrators might assign policies that limit how resources are deployed. For example, your subscription might have a policy that prevents you from creating public IP addresses, network security groups, user-defined routes, or route tables. The error message includes the specific reason why the cluster creation was blocked.
+
+> [!NOTE]
+> Only you can manage the policies in your environment. Microsoft can't disable or bypass those policies.
 
 ## Solution
 
 To fix this issue, follow these steps:
 
-1. Find the policy that blocks the action. These policies are listed in the error message. The name of a policy assignment or definition is the last segment of the `id` string shown in the error message.
-
-1. If possible, change your deployment to meet the limitations of the policy, and then retry the deploy operation.
-
-1. Add an [exception to the policy](/azure/governance/policy/concepts/exemption-structure).
+1. Find the policy that blocks the action. These policies are listed in the error message.
+   The name of a policy assignment or definition is the last segment of the `id` string that's shown in the error message.
+   ```
+   # Example
+   Code: RequestDisallowedByPolicy
+   Message: Resource 'resourcegroup' was disallowed by policy. Policy identifiers: '[{"policyAssignment":{"name":"Not allowed resource types","id":"/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/policyAssignments/00000000000000000000000"},"policyDefinition":{"name":"Not allowed resource types","id":"/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/policyDefinitions/not-allowed-resourcetypes","version":"1.0.0"}}]'.
+   ```
 
-1. [Disable the policy](/azure/defender-for-cloud/tutorial-security-policy#disable-security-policies-and-disable-recommendations).
+1. If possible, update your deployment to comply with the policy restrictions, and then retry the deployment. Alternatively, if you have permission to update policy, [add an exemption](/azure/governance/policy/tutorials/disallowed-resources#create-an-exemption) to the policy.
 
-To get details about the policy that blocked your cluster deployment operation, see [RequestDisallowedByPolicy error with Azure resource policy](/azure/azure-resource-manager/troubleshooting/error-policy-requestdisallowedbypolicy).
+To get details about the policy that blocked your cluster deployment, see [RequestDisallowedByPolicy error with Azure resource policy](/azure/azure-resource-manager/troubleshooting/error-policy-requestdisallowedbypolicy).
 
 > [!NOTE]
-> After fixing the policy that blocks the AKS cluster creation, run the `az aks update -g MyResourceGroup -n MyManagedCluster` command to change the cluster from a failed to a success state. This will reconcile the cluster and retry the last failed operation. For more information about clusters in a failed state, see [Troubleshoot Azure Kubernetes Service clusters or nodes in a failed state](../availability-performance/cluster-node-virtual-machine-failed-state.md).
+> After you fix the policy that blocks the AKS cluster creation, run the `az aks update -g MyResourceGroup -n MyManagedCluster` command to change the cluster from a failed state to a successful state. This change reconciles the cluster and retries the last failed operation. For more information about clusters in a failed state, see [Troubleshoot Azure Kubernetes Service clusters or nodes in a failed state](../availability-performance/cluster-node-virtual-machine-failed-state.md).
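
As a companion to step 1 in the updated text: after you extract the assignment name from the `id` string, a sketch like the following looks up the blocking assignment (the all-zeros name mirrors the placeholder in the example message):

```bash
# Show the policy assignment that blocked the deployment; replace the name
# with the last segment of your own assignment "id".
az policy assignment show --name "00000000000000000000000"
```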
 
 ## More information
 
support/azure/azure-kubernetes/create-upgrade-delete/error-code-toomanyrequestsreceived-subscriptionrequeststhrottled.md

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
 ---
 title: Troubleshoot the TooManyRequestsReceived or SubscriptionRequestsThrottled error code
 description: Learn how to troubleshoot the TooManyRequestsReceived or SubscriptionRequestsThrottled error when you try to delete an Azure Kubernetes Service (AKS) cluster.
-ms.date: 11/18/2024
+ms.date: 04/03/2025
 editor: v-jsitser
-ms.reviewer: rissing, chiragpa, edneto, v-leedennis
+ms.reviewer: rissing, chiragpa, edneto, v-leedennis, dorinalecu
 ms.service: azure-kubernetes-service
 #Customer intent: As an Azure Kubernetes user, I want to troubleshoot the TooManyRequestsReceived or SubscriptionRequestsThrottled error code so that I can successfully delete an Azure Kubernetes Service (AKS) cluster.
 ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)

support/azure/azure-kubernetes/create-upgrade-delete/pod-stuck-crashloopbackoff-mode.md

Lines changed: 18 additions & 4 deletions
@@ -1,17 +1,31 @@
 ---
 title: Pod is stuck in CrashLoopBackOff mode
 description: Troubleshoot a scenario in which a pod is stuck in CrashLoopBackOff mode on an Azure Kubernetes Service (AKS) cluster.
-ms.date: 09/07/2023
+ms.date: 04/07/2025
 author: VikasPullagura-MSFT
 ms.author: vipullag
-editor: v-jsitser
-ms.reviewer: chiragpa, nickoman, cssakscic, v-leedennis
+editor: v-jsitser, addobres
+ms.reviewer: chiragpa, nickoman, cssakscic, v-leedennis, addobres
 ms.service: azure-kubernetes-service
 ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
 ---
 # Pod is stuck in CrashLoopBackOff mode
 
-If a pod has a `CrashLoopBackOff` status, then the pod probably failed or exited unexpectedly, and the log contains an exit code that isn't zero. There are several possible reasons why your pod is stuck in `CrashLoopBackOff` mode. Consider the following options and their associated [kubectl](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands) commands.
+If a pod has a `CrashLoopBackOff` status, the pod probably failed or exited unexpectedly, and the log contains an exit code that isn't zero. Here are several possible reasons why your pod is stuck in `CrashLoopBackOff` mode:
+
+1. **Application failure**: The application inside the container crashes shortly after it starts, often because of misconfigurations, missing dependencies, or incorrect environment variables.
+2. **Incorrect resource limits**: If the pod exceeds its CPU or memory resource limits, Kubernetes might kill the container. This issue can occur if resource requests or limits are set too low.
+3. **Missing or misconfigured ConfigMaps/Secrets**: If the application relies on configuration files or environment variables that are stored in ConfigMaps or Secrets, but those objects are missing or misconfigured, the application might crash.
+4. **Image pull issues**: If there's an issue with the image (for example, it's corrupted or has an incorrect tag), the container might not start properly and might fail repeatedly.
+5. **Init containers failing**: If the pod has init containers and one or more of them fail to run properly, the pod restarts.
+6. **Liveness/Readiness probe failures**: If liveness or readiness probes are misconfigured, Kubernetes might detect the container as unhealthy and restart it.
+7. **Application dependencies not ready**: The application might depend on services that aren't yet ready, such as databases, message queues, or other APIs.
+8. **Networking issues**: Network misconfigurations can prevent the application from communicating with necessary services, causing it to fail.
+9. **Invalid commands or arguments**: The container might be started by using an invalid `ENTRYPOINT`, command, or argument, which leads to a crash.
+
+For more information about the container status, see [Pod Lifecycle - Container states](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-states).
+
+Consider the following options and their associated [kubectl](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands) commands, as shown in the sketch after this table.
 
 | Option | kubectl command |
 |--|--|
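
Whichever cause in the new list applies, the usual first diagnostic step is to read the pod's events and the previous container instance's logs; a minimal sketch with placeholder names:

```bash
# Show events and the container's last state, including its exit code.
kubectl describe pod <pod-name> -n <namespace>

# Read logs from the previous (crashed) container instance.
kubectl logs <pod-name> -n <namespace> --previous
```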

support/azure/azure-kubernetes/create-upgrade-delete/troubleshoot-common-azure-linux-aks.md

Lines changed: 4 additions & 2 deletions
@@ -1,11 +1,11 @@
 ---
 title: Troubleshoot common issues for Azure Linux Container Host for AKS
 description: Troubleshoot commonly reported issues for Azure Linux container hosts on Azure Kubernetes Service (AKS).
-ms.date: 09/08/2023
+ms.date: 04/02/2025
 author: suhuruli
 ms.author: suhuruli
 editor: v-jsitser
-ms.reviewer: v-leedennis
+ms.reviewer: mnasser, v-weizhu
 ms.service: azure-kubernetes-service
 ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool), linux-related-content
 ---
@@ -77,6 +77,8 @@ Most commands in the Azure Linux OS, such as the process status (`ps`) command,
 | `apt-mark auto` | `tdnf install dnf mark remove` |
 | `apt-mark manual` | `dnf mark install` |
 | `apt-mark showmanual` | `dnf history userinstalled` |
+| `add-apt-repository` | Edit `/etc/yum.repos.d/*.repo` files |
+| `apt-key add` | `rpm --import` |
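
A usage sketch for the two added rows; the URL, file name, and repository details are placeholders:

```bash
# Equivalent of `apt-key add`: import a repository signing key.
sudo rpm --import https://example.com/repo-signing-key.asc

# Equivalent of `add-apt-repository`: define the repository in a .repo file.
sudo tee /etc/yum.repos.d/example.repo <<'EOF'
[example]
name=Example repository
baseurl=https://example.com/packages/
enabled=1
gpgcheck=1
EOF
```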
 
 ### Step 2: Check the Azure Linux version
 