
Commit d9fc887

Authored by Simonx Xu
Merge pull request #8676 from MicrosoftDocs/repo_sync_working_branch
Confirm merge from repo_sync_working_branch to main to sync with https://github.com/MicrosoftDocs/SupportArticles-docs (branch main)
2 parents bc07d5a + aeb9039 commit d9fc887

4 files changed: +120 −22 lines changed

support/azure/azure-kubernetes/connectivity/tunnel-connectivity-issues.md

Lines changed: 76 additions & 2 deletions
@@ -1,10 +1,10 @@
---
-title: Tunnel connectivity issues
+title: Tunnel Connectivity Issues
description: Resolve communication issues that are related to tunnel connectivity in an Azure Kubernetes Service (AKS) cluster.
ms.date: 03/23/2025
ms.reviewer: chiragpa, andbar, v-leedennis, v-weizhu, albarqaw
ms.service: azure-kubernetes-service
-keywords: Azure Kubernetes Service, AKS cluster, Kubernetes cluster, tunnels, connectivity, tunnel-front, aks-link
+keywords: Azure Kubernetes Service, AKS cluster, Kubernetes cluster, tunnels, connectivity, tunnel-front, aks-link, Konnectivity agent, Cluster Proportional Autoscaler, CPA, Resource allocation, Performance bottlenecks, Networking reliability, Azure Kubernetes troubleshooting, AKS performance issues
#Customer intent: As an Azure Kubernetes user, I want to avoid tunnel connectivity issues so that I can use an Azure Kubernetes Service (AKS) cluster successfully.
ms.custom: sap:Connectivity
---
@@ -251,6 +251,80 @@ If everything is OK within the application, you'll have to adjust the allocated
You can set up a new cluster to use a Managed Network Address Translation (NAT) Gateway for outbound connections. For more information, see [Create an AKS cluster with a Managed NAT Gateway](/azure/aks/nat-gateway#create-an-aks-cluster-with-a-managed-nat-gateway).

## Cause 6: Konnectivity agent performance issues as the cluster grows

As the cluster grows, the performance of the Konnectivity agents might degrade because of increased network traffic, more requests, or resource constraints.
> [!NOTE]
> This cause applies only to the `konnectivity-agent` pods.

### Solution 6: Cluster Proportional Autoscaler for the Konnectivity agent

To manage scalability challenges in large clusters, AKS implements the Cluster Proportional Autoscaler for the Konnectivity agents. This approach aligns with industry standards and best practices, and it ensures optimal resource usage and enhanced performance.
**Why this change was made**

Previously, the Konnectivity agent had a fixed replica count that could create a bottleneck as the cluster grew. By implementing the Cluster Proportional Autoscaler, we enable the replica count to adjust dynamically, based on node-scaling rules, to provide optimal performance and resource usage.

**How the Cluster Proportional Autoscaler works**

The Cluster Proportional Autoscaler uses a ladder configuration to determine the number of Konnectivity agent replicas based on the cluster size. The ladder configuration is defined in the `konnectivity-agent-autoscaler` ConfigMap in the `kube-system` namespace. Here's an example of the ladder configuration:
```
"nodesToReplicas": [
    [1, 2],
    [100, 3],
    [250, 4],
    [500, 5],
    [1000, 6],
    [5000, 10]
]
```
This configuration ensures that the number of replicas scales appropriately with the number of nodes in the cluster, which provides optimal resource allocation and improved networking reliability.
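To see the ladder values that are currently in effect on your cluster, you can read the ConfigMap directly. The following command is a minimal sketch that assumes the default ConfigMap name and namespace described earlier:

```bash
# Print the konnectivity-agent-autoscaler ConfigMap, including its ladder configuration.
kubectl get configmap konnectivity-agent-autoscaler -n kube-system -o yaml
```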
**How to use the Cluster Proportional Autoscaler**

You can override the default values by updating the `konnectivity-agent-autoscaler` ConfigMap in the `kube-system` namespace. Here's a sample command to update the ConfigMap:

```bash
kubectl edit configmap konnectivity-agent-autoscaler -n kube-system
```

This command opens the ConfigMap in an editor so that you can make the necessary changes.
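After you save your changes, you can verify that the autoscaler produces the replica count that you expect for the current node count. This is an illustrative check only, and it assumes the default `konnectivity-agent` deployment in the `kube-system` namespace:

```bash
# Compare the node count with the current Konnectivity agent replica count.
kubectl get nodes --no-headers | wc -l
kubectl get deployment konnectivity-agent -n kube-system -o jsonpath='{.spec.replicas}{"\n"}'
```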
**What you should check**

You have to monitor for out-of-memory (OOM) kills on the nodes because a misconfigured Cluster Proportional Autoscaler can cause insufficient memory allocation for the Konnectivity agents. These OOM kills occur for the following key reasons:
**High memory usage:** As the cluster grows, the memory usage of the Konnectivity agents can increase significantly, especially during peak loads or when the agents handle large numbers of connections. If the Cluster Proportional Autoscaler configuration doesn't scale the replicas appropriately, the agents might run out of memory.

**Fixed resource limits:** If the resource requests and limits for the Konnectivity agents are set too low, the agents might not have enough memory to handle the workload, which leads to OOM kills. Misconfigured Cluster Proportional Autoscaler settings can exacerbate this issue by not providing enough replicas to distribute the load.

**Cluster size and workload variability:** The CPU and memory that the Konnectivity agents need can vary widely depending on the size of the cluster and the workload. If the Cluster Proportional Autoscaler ladder configuration isn't right-sized and adaptively resized for the cluster's usage patterns, it can cause memory overcommitment and OOM kills.
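To spot-check memory pressure on the agents themselves, you can compare their live consumption against their configured limits. The following commands are a sketch; the `app=konnectivity-agent` label selector is an assumption, so verify the labels that your cluster actually uses:

```bash
# Current CPU and memory consumption of the Konnectivity agent pods (requires metrics-server).
kubectl top pods -n kube-system -l app=konnectivity-agent

# Check whether any agent container was previously terminated by the OOM killer.
kubectl get pods -n kube-system -l app=konnectivity-agent \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
```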
To identify and troubleshoot OOM kills, follow these steps:

1. Check for OOM kills on the nodes by using the following command:

    ```bash
    kubectl get events --all-namespaces | grep -i 'oomkill'
    ```

2. Inspect node resource usage. Verify the resource usage on your nodes to make sure that they aren't running out of memory:

    ```bash
    kubectl top nodes
    ```

3. Review pod resource requests and limits. Make sure that the Konnectivity agent pods have appropriate resource requests and limits set to prevent OOM kills:

    ```bash
    kubectl get pod <pod-name> -n kube-system -o yaml | grep -A5 "resources:"
    ```

4. Adjust resource requests and limits. If necessary, adjust the resource requests and limits for the Konnectivity agent pods by editing the deployment:

    ```bash
    kubectl edit deployment konnectivity-agent -n kube-system
    ```
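As an alternative to the interactive edit in step 4, you can set requests and limits in a single non-interactive command. The values here are placeholders rather than recommendations, so size them for your own cluster:

```bash
# Example only: apply explicit requests and limits to the konnectivity-agent deployment.
kubectl set resources deployment konnectivity-agent -n kube-system \
  --requests=cpu=100m,memory=256Mi \
  --limits=memory=512Mi
```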
[!INCLUDE [Third-party contact disclaimer](../../../includes/third-party-contact-disclaimer.md)]

[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]

support/azure/azure-kubernetes/create-upgrade-delete/error-code-requestdisallowedbypolicy.md

Lines changed: 18 additions & 13 deletions
@@ -1,11 +1,11 @@
---
-title: RequestDisallowedByPolicy error when deploying an AKS cluster
+title: RequestDisallowedByPolicy Error When Deploying an AKS Cluster
description: Learn how to fix the RequestDisallowedByPolicy error when you try to create and deploy an Azure Kubernetes Service (AKS) cluster.
-ms.date: 10/12/2024
+ms.date: 03/13/2025
editor: v-jsitser
-ms.reviewer: rissing, chiragpa, erbookbi, albarqaw, v-leedennis, v-weizhu
+ms.reviewer: rissing, chiragpa, erbookbi, albarqaw, jacobbaek, v-leedennis, v-weizhu
ms.service: azure-kubernetes-service
-#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the RequestDisallowedByPolicy error code so that I can successfully create and deploy an Azure Kubernetes Service (AKS) cluster.
+#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the RequestDisallowedByPolicy error so that I can successfully create and deploy an Azure Kubernetes Service (AKS) cluster.
ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
---

# RequestDisallowedByPolicy error when deploying an AKS cluster
@@ -22,24 +22,29 @@ When you try to deploy an AKS cluster, you receive the following error message:
## Cause

-For security or compliance, your subscription administrators might assign policies that limit how resources are deployed. For example, your subscription might have a policy that prevents creating public IP addresses, network security groups, user-defined routes, or route tables. The error message includes the specific reason why the cluster creation was blocked. Only you can manage the policies in your environment. Microsoft can't disable or bypass those policies.
+For security or compliance, your subscription administrators might assign policies that limit how resources are deployed. For example, your subscription might have a policy that prevents you from creating public IP addresses, network security groups, user-defined routes, or route tables. The error message includes the specific reason why the cluster creation was blocked.
+
+> [!NOTE]
+> Only you can manage the policies in your environment. Microsoft can't disable or bypass those policies.

## Solution

To fix this issue, follow these steps:

-1. Find the policy that blocks the action. These policies are listed in the error message. The name of a policy assignment or definition is the last segment of the `id` string shown in the error message.
-
-1. If possible, change your deployment to meet the limitations of the policy, and then retry the deploy operation.
-
-1. Add an [exception to the policy](/azure/governance/policy/concepts/exemption-structure).
-
-1. [Disable the policy](/azure/defender-for-cloud/tutorial-security-policy#disable-security-policies-and-disable-recommendations).
+1. Find the policy that blocks the action. These policies are listed in the error message.
+   The name of a policy assignment or definition is the last segment of the `id` string that's shown in the error message.
+
+   ```
+   # Example
+   Code: RequestDisallowedByPolicy
+   Message: Resource 'resourcegroup' was disallowed by policy. Policy identifiers: '[{"policyAssignment":{"name":"Not allowed resource types","id":"/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/policyAssignments/00000000000000000000000"},"policyDefinition":{"name":"Not allowed resource types","id":"/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/policyDefinitions/not-allowed-resourcetypes","version":"1.0.0"}}]'.
+   ```
+
+1. If possible, update your deployment to comply with the policy restrictions, and then retry the deployment. Alternatively, if you have permission to update policy, [add an exemption](/azure/governance/policy/tutorials/disallowed-resources#create-an-exemption) to the policy.

-To get details about the policy that blocked your cluster deployment operation, see [RequestDisallowedByPolicy error with Azure resource policy](/azure/azure-resource-manager/troubleshooting/error-policy-requestdisallowedbypolicy).
+To get details about the policy that blocked your cluster deployment, see [RequestDisallowedByPolicy error with Azure resource policy](/azure/azure-resource-manager/troubleshooting/error-policy-requestdisallowedbypolicy).

> [!NOTE]
-> After fixing the policy that blocks the AKS cluster creation, run the `az aks update -g MyResourceGroup -n MyManagedCluster` command to change the cluster from a failed to a success state. This will reconcile the cluster and retry the last failed operation. For more information about clusters in a failed state, see [Troubleshoot Azure Kubernetes Service clusters or nodes in a failed state](../availability-performance/cluster-node-virtual-machine-failed-state.md).
+> After you fix the policy that blocks the AKS cluster creation, run the `az aks update -g MyResourceGroup -n MyManagedCluster` command to change the cluster from a failed state to a successful state. This change reconciles the cluster and retries the last failed operation. For more information about clusters in a failed state, see [Troubleshoot Azure Kubernetes Service clusters or nodes in a failed state](../availability-performance/cluster-node-virtual-machine-failed-state.md).

## More information
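For reference, you can inspect the policy assignment that's named in the error message and, if your role allows it, create a scoped exemption by using Azure CLI commands along the following lines. This is an illustrative sketch only; the IDs and names are placeholders taken from the example error message shown earlier:

```bash
# Inspect the policy assignment that blocked the deployment (placeholder ID).
az policy assignment show --name "00000000000000000000000" \
    --scope "/subscriptions/00000000-0000-0000-0000-000000000000"

# Create a resource group-scoped exemption for that assignment (placeholder names).
az policy exemption create --name "aks-deployment-exemption" \
    --resource-group "MyResourceGroup" \
    --policy-assignment "/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/policyAssignments/00000000000000000000000" \
    --exemption-category "Waiver"
```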
support/azure/azure-storage/files/performance/files-troubleshoot-performance.md

Lines changed: 20 additions & 3 deletions
@@ -239,17 +239,34 @@ It's possible you're experiencing throttling, and your requests are being sent t
Ensure your app is within the [Azure Files scale targets](/azure/storage/files/storage-files-scale-targets#azure-files-scale-targets). If you're using standard Azure file shares, consider switching to premium.

### Cause 3: Azure file share reaches capacity

If the Azure file share is close to reaching its capacity, identify the largest files and directories so that you can optimize storage. This step helps you to understand which files and folders are using the most space.

### Workaround

To get a comprehensive view of storage usage across the entire share, mount the root of the share. This action enables a thorough inspection of file and directory sizes. At the root of the file share, run the following commands to identify the largest files and directories:

```bash
cd /path/to/mount/point
du -ah --max-depth=1 | sort -rh | head -n 20
```

This command displays the 20 largest files and directories in descending order of size, which provides a clear overview of storage consumption.
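You can also check how much of the share's capacity is already in use before you drill into individual directories. This is an illustrative command, and the mount path is a placeholder:

```bash
# Show used and available space for the mounted file share.
df -h /path/to/mount/point
```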
If you can't mount the root of the share, use Azure Storage Explorer or a third-party tool to analyze storage usage. These tools provide similar insights into file and directory sizes without requiring you to mount the share.
## Throughput on Linux clients is lower than that of Windows clients

### Cause

-This is a known issue with implementing the SMB client on Linux.
+This is a known issue that affects the SMB client implementation on Linux.

### Workaround
- Spread the load across multiple VMs.
-- On the same VM, use multiple mount points with a `nosharesock` option, and spread the load across these mount points.
-- On Linux, try mounting with a `nostrictsync` option to avoid forcing an SMB flush on every `fsync` call. For Azure Files, this option doesn't interfere with data consistency, but it might result in stale file metadata on directory listings (`ls -l` command). Directly querying file metadata by using the `stat` command will return the most up-to-date file metadata.
+- On the same VM, use multiple mount points that have a `nosharesock` option, and spread the load across these mount points.
+- On Linux, try mounting by using a `nostrictsync` option to avoid forcing an SMB flush on every `fsync` call. For Azure Files, this option doesn't interfere with data consistency, but it might cause stale file metadata on directory listings (`ls -l` command). Directly querying file metadata by using the `stat` command returns the most up-to-date file metadata.
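For reference, a pair of mounts that combines these options might look like the following sketch. The storage account, share name, and credentials file are placeholders, and you should keep whatever mount options you already use:

```bash
# Two mount points for the same share; nosharesock gives each mount its own TCP connection.
sudo mkdir -p /mnt/share1 /mnt/share2
sudo mount -t cifs //<storage-account>.file.core.windows.net/<share-name> /mnt/share1 \
    -o vers=3.1.1,credentials=/etc/smbcredentials/<storage-account>.cred,serverino,nosharesock
sudo mount -t cifs //<storage-account>.file.core.windows.net/<share-name> /mnt/share2 \
    -o vers=3.1.1,credentials=/etc/smbcredentials/<storage-account>.cred,serverino,nosharesock,nostrictsync
```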
## High latencies for metadata-heavy workloads involving extensive open/close operations

support/windows-server/active-directory/transfer-or-seize-operation-master-roles-in-ad-ds.md

Lines changed: 6 additions & 4 deletions
@@ -1,7 +1,7 @@
---
title: Transfer or seize Operation Master roles
description: Describes how you can use the Ntdsutil.exe utility to move or to seize Operation Master roles, formerly known as Flexible Single Master Operations (FSMO) roles.
-ms.date: 01/15/2025
+ms.date: 04/07/2025
manager: dcscontentpm
audience: ITPro
ms.topic: troubleshooting
@@ -35,7 +35,9 @@ For more information about the Operation Master role holders and recommendations
When a DC that has been acting as a role holder starts to run (for example, after a failure or a shutdown), it doesn't immediately resume behaving as the role holder. The DC waits until it receives inbound replication for its naming context (for example, the Schema master role owner waits to receive inbound replication of the Schema partition).

-The information that the DCs pass as part of Active Directory replication includes the identities of the current Operation Master role holders. When the newly started DC receives the inbound replication information, it verifies whether it's still the role holder. If it is, it resumes typical operations. If the replicated information indicates that another DC is acting as the role holder, the newly started DC relinquishes its role ownership. This behavior reduces the chance that the domain or forest will have duplicate Operation Master role holders.
+The information that the DCs pass as part of Active Directory replication includes the identities of the current Operation Master role holders. When the newly started DC receives the inbound replication information, it verifies whether it's still the role holder. The Active Directory Replication Engine resolves any potentially conflicting changes. For more information, see [Resolving conflicting changes](/previous-versions/windows/it-pro/windows-server-2003/cc736978(v=ws.10)#resolving-conflicting-changes).
+
+If the DC is the current Operations Master, it resumes typical operations. If the replicated information indicates that another DC is acting as the role holder, the newly started DC relinquishes its role ownership. This behavior reduces the chance that the domain or forest will have duplicate Operation Master role holders.

> [!IMPORTANT]
> AD FS operations fail if they require a role holder and if the newly started role holder is, in fact, the role holder and it doesn't receive inbound replication.
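To check which DCs the directory currently records as the Operation Master role holders (for example, if you suspect duplicates), you can query them from any DC. This is an illustrative command that you run in an elevated Command Prompt:

```console
netdom query fsmo
```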
@@ -199,8 +201,8 @@ If it's possible, and if you're able to transfer the roles instead of seizing th
When part of a domain or forest can't communicate with the rest of the domain or forest for an extended time, the isolated sections of the domain or forest are known as replication islands. DCs in one island can't replicate with the DCs in other islands. Over multiple replication cycles, the replication islands fall out of sync. If each island has its own Operation Master role holders, you may have problems when you restore communication between the islands.

> [!IMPORTANT]
-> In most cases, you can take advantage of the initial replication requirement (as described in this article) to weed out duplicate role holders. A restarted role holder should relinquish the role if it detects a duplicate role-holder.
-> You may encounter circumstances that this behavior does not resolve. In such cases, the information in this section may be helpful.
+> In most cases, you can take advantage of the initial replication requirement (as described in this article) to weed out duplicate role holders. A restarted role holder will relinquish the role if it detects a duplicate role holder through updates that it receives on inbound replication.
+> You may encounter circumstances in which this behavior doesn't resolve the Operations Master conflict. In such cases, the information in this section may be helpful.
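If you do have to consolidate role holders after connectivity is restored, you transfer the role (or, only when the current holder is permanently offline, seize it) by using the Ntdsutil.exe utility that this article describes. The following is an illustrative sketch that transfers the PDC emulator role to a DC named `DC02`; substitute your own server and role:

```console
ntdsutil
roles
connections
connect to server DC02
quit
transfer pdc
quit
quit
```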
The following table identifies the Operation Master roles that can cause problems if a forest or domain has multiple role-holders for that role: