
Commit d9fc887

Authored by Simonx Xu
Merge pull request #8676 from MicrosoftDocs/repo_sync_working_branch
Confirm merge from repo_sync_working_branch to main to sync with https://github.com/MicrosoftDocs/SupportArticles-docs (branch main)
2 parents bc07d5a + aeb9039 commit d9fc887

4 files changed: +120 −22 lines changed

support/azure/azure-kubernetes/connectivity/tunnel-connectivity-issues.md

Lines changed: 76 additions & 2 deletions
@@ -1,10 +1,10 @@
---
-title: Tunnel connectivity issues
+title: Tunnel Connectivity Issues
description: Resolve communication issues that are related to tunnel connectivity in an Azure Kubernetes Service (AKS) cluster.
ms.date: 03/23/2025
ms.reviewer: chiragpa, andbar, v-leedennis, v-weizhu, albarqaw
ms.service: azure-kubernetes-service
-keywords: Azure Kubernetes Service, AKS cluster, Kubernetes cluster, tunnels, connectivity, tunnel-front, aks-link
+keywords: Azure Kubernetes Service, AKS cluster, Kubernetes cluster, tunnels, connectivity, tunnel-front, aks-link, Konnectivity agent, Cluster Proportional Autoscaler, CPA, Resource allocation, Performance bottlenecks, Networking reliability, Azure Kubernetes troubleshooting, AKS performance issues
#Customer intent: As an Azure Kubernetes user, I want to avoid tunnel connectivity issues so that I can use an Azure Kubernetes Service (AKS) cluster successfully.
ms.custom: sap:Connectivity
---
@@ -251,6 +251,80 @@ If everything is OK within the application, you'll have to adjust the allocated
You can set up a new cluster to use a Managed Network Address Translation (NAT) Gateway for outbound connections. For more information, see [Create an AKS cluster with a Managed NAT Gateway](/azure/aks/nat-gateway#create-an-aks-cluster-with-a-managed-nat-gateway).

## Cause 6: Konnectivity agent performance issues as the cluster grows

As the cluster grows, the performance of the Konnectivity agents might degrade because of increased network traffic, more requests, or resource constraints.
> [!NOTE]
> This cause applies only to the `konnectivity-agent` pods.

### Solution 6: Cluster Proportional Autoscaler for the Konnectivity agent

To manage scalability challenges in large clusters, AKS implements the Cluster Proportional Autoscaler for the Konnectivity agents. This approach aligns with industry standards and best practices, and it ensures optimal resource usage and enhanced performance.
**Why this change was made**

Previously, the Konnectivity agent had a fixed replica count that could create a bottleneck as the cluster grew. By implementing the Cluster Proportional Autoscaler, we enable the replica count to adjust dynamically, based on node-scaling rules, to provide optimal performance and resource usage.

**How the Cluster Proportional Autoscaler works**

The Cluster Proportional Autoscaler uses a ladder configuration to determine the number of Konnectivity agent replicas based on the cluster size. The ladder configuration is defined in the `konnectivity-agent-autoscaler` ConfigMap in the `kube-system` namespace. Here's an example of the ladder configuration:
```
"nodesToReplicas": [
    [1, 2],
    [100, 3],
    [250, 4],
    [500, 5],
    [1000, 6],
    [5000, 10]
]
```
This configuration ensures that the number of replicas scales appropriately with the number of nodes in the cluster, which provides optimal resource allocation and improved networking reliability.
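To see the ladder values that are currently in effect on your cluster, you can read the ConfigMap directly. The following command is a minimal sketch that assumes the default ConfigMap name and namespace described earlier:

```bash
# Print the konnectivity-agent-autoscaler ConfigMap, including its ladder configuration.
kubectl get configmap konnectivity-agent-autoscaler -n kube-system -o yaml
```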
**How to use the Cluster Proportional Autoscaler**

You can override the default values by updating the `konnectivity-agent-autoscaler` ConfigMap in the `kube-system` namespace. Here's a sample command to update the ConfigMap:

```bash
kubectl edit configmap konnectivity-agent-autoscaler -n kube-system
```

This command opens the ConfigMap in an editor so that you can make the necessary changes.
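After you save your changes, you can verify that the autoscaler produces the replica count that you expect for the current node count. This is an illustrative check only, and it assumes the default `konnectivity-agent` deployment in the `kube-system` namespace:

```bash
# Compare the node count with the current Konnectivity agent replica count.
kubectl get nodes --no-headers | wc -l
kubectl get deployment konnectivity-agent -n kube-system -o jsonpath='{.spec.replicas}{"\n"}'
```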
**What you should check**

You have to monitor for out-of-memory (OOM) kills on the nodes because a misconfigured Cluster Proportional Autoscaler can cause insufficient memory allocation for the Konnectivity agents. These OOM kills occur for the following key reasons:
**High memory usage:** As the cluster grows, the memory usage of the Konnectivity agents can increase significantly, especially during peak loads or when the agents handle large numbers of connections. If the Cluster Proportional Autoscaler configuration doesn't scale the replicas appropriately, the agents might run out of memory.

**Fixed resource limits:** If the resource requests and limits for the Konnectivity agents are set too low, the agents might not have enough memory to handle the workload, which leads to OOM kills. Misconfigured Cluster Proportional Autoscaler settings can exacerbate this issue by not providing enough replicas to distribute the load.

**Cluster size and workload variability:** The CPU and memory that the Konnectivity agents need can vary widely depending on the size of the cluster and the workload. If the Cluster Proportional Autoscaler ladder configuration isn't right-sized and adaptively resized for the cluster's usage patterns, it can cause memory overcommitment and OOM kills.
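To spot-check memory pressure on the agents themselves, you can compare their live consumption against their configured limits. The following commands are a sketch; the `app=konnectivity-agent` label selector is an assumption, so verify the labels that your cluster actually uses:

```bash
# Current CPU and memory consumption of the Konnectivity agent pods (requires metrics-server).
kubectl top pods -n kube-system -l app=konnectivity-agent

# Check whether any agent container was previously terminated by the OOM killer.
kubectl get pods -n kube-system -l app=konnectivity-agent \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
```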
To identify and troubleshoot OOM kills, follow these steps:

1. Check for OOM kills on the nodes by using the following command:

    ```bash
    kubectl get events --all-namespaces | grep -i 'oomkill'
    ```

2. Inspect node resource usage. Verify the resource usage on your nodes to make sure that they aren't running out of memory:

    ```bash
    kubectl top nodes
    ```

3. Review pod resource requests and limits. Make sure that the Konnectivity agent pods have appropriate resource requests and limits set to prevent OOM kills:

    ```bash
    kubectl get pod <pod-name> -n kube-system -o yaml | grep -A5 "resources:"
    ```

4. Adjust resource requests and limits. If necessary, adjust the resource requests and limits for the Konnectivity agent pods by editing the deployment:

    ```bash
    kubectl edit deployment konnectivity-agent -n kube-system
    ```
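As an alternative to the interactive edit in step 4, you can set requests and limits in a single non-interactive command. The values here are placeholders rather than recommendations, so size them for your own cluster:

```bash
# Example only: apply explicit requests and limits to the konnectivity-agent deployment.
kubectl set resources deployment konnectivity-agent -n kube-system \
  --requests=cpu=100m,memory=256Mi \
  --limits=memory=512Mi
```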
[!INCLUDE [Third-party contact disclaimer](../../../includes/third-party-contact-disclaimer.md)]

[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]

support/azure/azure-kubernetes/create-upgrade-delete/error-code-requestdisallowedbypolicy.md

Lines changed: 18 additions & 13 deletions
@@ -1,11 +1,11 @@
---
-title: RequestDisallowedByPolicy error when deploying an AKS cluster
+title: RequestDisallowedByPolicy Error When Deploying an AKS Cluster
description: Learn how to fix the RequestDisallowedByPolicy error when you try to create and deploy an Azure Kubernetes Service (AKS) cluster.
-ms.date: 10/12/2024
+ms.date: 03/13/2025
editor: v-jsitser
-ms.reviewer: rissing, chiragpa, erbookbi, albarqaw, v-leedennis, v-weizhu
+ms.reviewer: rissing, chiragpa, erbookbi, albarqaw, jacobbaek, v-leedennis, v-weizhu
ms.service: azure-kubernetes-service
-#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the RequestDisallowedByPolicy error code so that I can successfully create and deploy an Azure Kubernetes Service (AKS) cluster.
+#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the RequestDisallowedByPolicy error so that I can successfully create and deploy an Azure Kubernetes Service (AKS) cluster.
ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
---

# RequestDisallowedByPolicy error when deploying an AKS cluster
@@ -22,24 +22,29 @@ When you try to deploy an AKS cluster, you receive the following error message:
## Cause

-For security or compliance, your subscription administrators might assign policies that limit how resources are deployed. For example, your subscription might have a policy that prevents creating public IP addresses, network security groups, user-defined routes, or route tables. The error message includes the specific reason why the cluster creation was blocked. Only you can manage the policies in your environment. Microsoft can't disable or bypass those policies.
+For security or compliance, your subscription administrators might assign policies that limit how resources are deployed. For example, your subscription might have a policy that prevents you from creating public IP addresses, network security groups, user-defined routes, or route tables. The error message includes the specific reason why the cluster creation was blocked.
+
+> [!NOTE]
+> Only you can manage the policies in your environment. Microsoft can't disable or bypass those policies.

## Solution

To fix this issue, follow these steps:

-1. Find the policy that blocks the action. These policies are listed in the error message. The name of a policy assignment or definition is the last segment of the `id` string shown in the error message.
-
-1. If possible, change your deployment to meet the limitations of the policy, and then retry the deploy operation.
-
-1. Add an [exception to the policy](/azure/governance/policy/concepts/exemption-structure).
-
-1. [Disable the policy](/azure/defender-for-cloud/tutorial-security-policy#disable-security-policies-and-disable-recommendations).
+1. Find the policy that blocks the action. These policies are listed in the error message.
+   The name of a policy assignment or definition is the last segment of the `id` string that's shown in the error message.
+
+   ```
+   # Example
+   Code: RequestDisallowedByPolicy
+   Message: Resource 'resourcegroup' was disallowed by policy. Policy identifiers: '[{"policyAssignment":{"name":"Not allowed resource types","id":"/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/policyAssignments/00000000000000000000000"},"policyDefinition":{"name":"Not allowed resource types","id":"/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/policyDefinitions/not-allowed-resourcetypes","version":"1.0.0"}}]'.
+   ```
+
+1. If possible, update your deployment to comply with the policy restrictions, and then retry the deployment. Alternatively, if you have permission to update policy, [add an exemption](/azure/governance/policy/tutorials/disallowed-resources#create-an-exemption) to the policy.

-To get details about the policy that blocked your cluster deployment operation, see [RequestDisallowedByPolicy error with Azure resource policy](/azure/azure-resource-manager/troubleshooting/error-policy-requestdisallowedbypolicy).
+To get details about the policy that blocked your cluster deployment, see [RequestDisallowedByPolicy error with Azure resource policy](/azure/azure-resource-manager/troubleshooting/error-policy-requestdisallowedbypolicy).

> [!NOTE]
-> After fixing the policy that blocks the AKS cluster creation, run the `az aks update -g MyResourceGroup -n MyManagedCluster` command to change the cluster from a failed to a success state. This will reconcile the cluster and retry the last failed operation. For more information about clusters in a failed state, see [Troubleshoot Azure Kubernetes Service clusters or nodes in a failed state](../availability-performance/cluster-node-virtual-machine-failed-state.md).
+> After you fix the policy that blocks the AKS cluster creation, run the `az aks update -g MyResourceGroup -n MyManagedCluster` command to change the cluster from a failed state to a successful state. This change reconciles the cluster and retries the last failed operation. For more information about clusters in a failed state, see [Troubleshoot Azure Kubernetes Service clusters or nodes in a failed state](../availability-performance/cluster-node-virtual-machine-failed-state.md).

## More information
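For reference, you can inspect the policy assignment that's named in the error message and, if your role allows it, create a scoped exemption by using Azure CLI commands along the following lines. This is an illustrative sketch only; the IDs and names are placeholders taken from the example error message shown earlier:

```bash
# Inspect the policy assignment that blocked the deployment (placeholder ID).
az policy assignment show --name "00000000000000000000000" \
    --scope "/subscriptions/00000000-0000-0000-0000-000000000000"

# Create a resource group-scoped exemption for that assignment (placeholder names).
az policy exemption create --name "aks-deployment-exemption" \
    --resource-group "MyResourceGroup" \
    --policy-assignment "/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/policyAssignments/00000000000000000000000" \
    --exemption-category "Waiver"
```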
support/azure/azure-storage/files/performance/files-troubleshoot-performance.md

Lines changed: 20 additions & 3 deletions
@@ -239,17 +239,34 @@ It's possible you're experiencing throttling, and your requests are being sent t
Ensure your app is within the [Azure Files scale targets](/azure/storage/files/storage-files-scale-targets#azure-files-scale-targets). If you're using standard Azure file shares, consider switching to premium.

### Cause 3: Azure file share reaches capacity

If the Azure file share is close to reaching its capacity, identify the largest files and directories so that you can optimize storage. This step helps you to understand which files and folders are using the most space.

### Workaround

To get a comprehensive view of storage usage across the entire share, mount the root of the share. This action enables a thorough inspection of file and directory sizes. At the root of the file share, run the following commands to identify the largest files and directories:

```bash
cd /path/to/mount/point
du -ah --max-depth=1 | sort -rh | head -n 20
```

This command displays the 20 largest files and directories in descending order of size, which provides a clear overview of storage consumption.
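You can also check how much of the share's capacity is already in use before you drill into individual directories. This is an illustrative command, and the mount path is a placeholder:

```bash
# Show used and available space for the mounted file share.
df -h /path/to/mount/point
```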
If you can't mount the root of the share, use Azure Storage Explorer or a third-party tool to analyze storage usage. These tools provide similar insights into file and directory sizes without requiring you to mount the share.
## Throughput on Linux clients is lower than that of Windows clients

### Cause

-This is a known issue with implementing the SMB client on Linux.
+This is a known issue that affects the SMB client implementation on Linux.

### Workaround
- Spread the load across multiple VMs.
-- On the same VM, use multiple mount points with a `nosharesock` option, and spread the load across these mount points.
-- On Linux, try mounting with a `nostrictsync` option to avoid forcing an SMB flush on every `fsync` call. For Azure Files, this option doesn't interfere with data consistency, but it might result in stale file metadata on directory listings (`ls -l` command). Directly querying file metadata by using the `stat` command will return the most up-to-date file metadata.
+- On the same VM, use multiple mount points that have a `nosharesock` option, and spread the load across these mount points.
+- On Linux, try mounting by using a `nostrictsync` option to avoid forcing an SMB flush on every `fsync` call. For Azure Files, this option doesn't interfere with data consistency, but it might cause stale file metadata on directory listings (`ls -l` command). Directly querying file metadata by using the `stat` command returns the most up-to-date file metadata.
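For reference, a pair of mounts that combines these options might look like the following sketch. The storage account, share name, and credentials file are placeholders, and you should keep whatever mount options you already use:

```bash
# Two mount points for the same share; nosharesock gives each mount its own TCP connection.
sudo mkdir -p /mnt/share1 /mnt/share2
sudo mount -t cifs //<storage-account>.file.core.windows.net/<share-name> /mnt/share1 \
    -o vers=3.1.1,credentials=/etc/smbcredentials/<storage-account>.cred,serverino,nosharesock
sudo mount -t cifs //<storage-account>.file.core.windows.net/<share-name> /mnt/share2 \
    -o vers=3.1.1,credentials=/etc/smbcredentials/<storage-account>.cred,serverino,nosharesock,nostrictsync
```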
## High latencies for metadata-heavy workloads involving extensive open/close operations

support/windows-server/active-directory/transfer-or-seize-operation-master-roles-in-ad-ds.md

Lines changed: 6 additions & 4 deletions
@@ -1,7 +1,7 @@
---
title: Transfer or seize Operation Master roles
description: Describes how you can use the Ntdsutil.exe utility to move or to seize Operation Master roles, formerly known as Flexible Single Master Operations (FSMO) roles.
-ms.date: 01/15/2025
+ms.date: 04/07/2025
manager: dcscontentpm
audience: ITPro
ms.topic: troubleshooting
@@ -35,7 +35,9 @@ For more information about the Operation Master role holders and recommendations
When a DC that has been acting as a role holder starts to run (for example, after a failure or a shutdown), it doesn't immediately resume behaving as the role holder. The DC waits until it receives inbound replication for its naming context (for example, the Schema master role owner waits to receive inbound replication of the Schema partition).

-The information that the DCs pass as part of Active Directory replication includes the identities of the current Operation Master role holders. When the newly started DC receives the inbound replication information, it verifies whether it's still the role holder. If it is, it resumes typical operations. If the replicated information indicates that another DC is acting as the role holder, the newly started DC relinquishes its role ownership. This behavior reduces the chance that the domain or forest will have duplicate Operation Master role holders.
+The information that the DCs pass as part of Active Directory replication includes the identities of the current Operation Master role holders. When the newly started DC receives the inbound replication information, it verifies whether it's still the role holder. The Active Directory Replication Engine resolves any potentially conflicting changes. For more information, see [Resolving conflicting changes](/previous-versions/windows/it-pro/windows-server-2003/cc736978(v=ws.10)#resolving-conflicting-changes).
+
+If the DC is the current Operations Master, it resumes typical operations. If the replicated information indicates that another DC is acting as the role holder, the newly started DC relinquishes its role ownership. This behavior reduces the chance that the domain or forest will have duplicate Operation Master role holders.

> [!IMPORTANT]
> AD FS operations fail if they require a role holder and if the newly started role holder is, in fact, the role holder and it doesn't receive inbound replication.
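To check which DCs the directory currently records as the Operation Master role holders (for example, if you suspect duplicates), you can query them from any DC. This is an illustrative command that you run in an elevated Command Prompt:

```console
netdom query fsmo
```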
@@ -199,8 +201,8 @@ If it's possible, and if you're able to transfer the roles instead of seizing th
When part of a domain or forest can't communicate with the rest of the domain or forest for an extended time, the isolated sections of the domain or forest are known as replication islands. DCs in one island can't replicate with the DCs in other islands. Over multiple replication cycles, the replication islands fall out of sync. If each island has its own Operation Master role holders, you may have problems when you restore communication between the islands.

> [!IMPORTANT]
-> In most cases, you can take advantage of the initial replication requirement (as described in this article) to weed out duplicate role holders. A restarted role holder should relinquish the role if it detects a duplicate role-holder.
-> You may encounter circumstances that this behavior does not resolve. In such cases, the information in this section may be helpful.
+> In most cases, you can take advantage of the initial replication requirement (as described in this article) to weed out duplicate role holders. A restarted role holder will relinquish the role if it detects a duplicate role holder through updates that it receives on inbound replication.
+> You may encounter circumstances in which this behavior doesn't resolve the Operations Master conflict. In such cases, the information in this section may be helpful.
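If you do have to consolidate role holders after connectivity is restored, you transfer the role (or, only when the current holder is permanently offline, seize it) by using the Ntdsutil.exe utility that this article describes. The following is an illustrative sketch that transfers the PDC emulator role to a DC named `DC02`; substitute your own server and role:

```console
ntdsutil
roles
connections
connect to server DC02
quit
transfer pdc
quit
quit
```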
The following table identifies the Operation Master roles that can cause problems if a forest or domain has multiple role-holders for that role: