
Commit 8fb6b7a

Merge pull request #3588 from MicrosoftDocs/main638793147027039486sync_temp
For protected branch, push strategy should use PR and merge to target branch method to work around git push error
2 parents ef8d7e9 + 4093912 commit 8fb6b7a

8 files changed: +253 −13 lines


AKS-Arc/TOC.yml

Lines changed: 13 additions & 6 deletions

```diff
@@ -157,18 +157,25 @@
     href: aks-arc-diagnostic-checker.md
   - name: KubeAPIServer unreachable error
     href: kube-api-server-unreachable.md
-  - name: Deleted AKS Arc cluster still visible on Azure portal
-    href: deleted-cluster-visible.md
+  - name: Can't create/scale AKS cluster due to image issues
+    href: gallery-image-not-usable.md
+  - name: Disk space exhaustion on control plane VMs
+    href: kube-apiserver-log-overflow.md
+  - name: Telemetry pod consumes too much memory and CPU
+    href: telemetry-pod-resources.md
+  - name: Issues after deleting storage volumes
+    href: delete-storage-volume.md
   - name: Can't fully delete AKS Arc cluster with PodDisruptionBudget (PDB) resources
     href: delete-cluster-pdb.md
+  - name: Azure Advisor upgrade recommendation
+    href: azure-advisor-upgrade.md
+  - name: Deleted AKS Arc cluster still visible on Azure portal
+    href: deleted-cluster-visible.md
   - name: Can't see VM SKUs on Azure portal
     href: check-vm-sku.md
   - name: Connectivity issues with MetalLB
     href: load-balancer-issues.md
-  - name: Azure Advisor upgrade recommendation
-    href: azure-advisor-upgrade.md
-  - name: Issues after deleting storage volumes
-    href: delete-storage-volume.md
+
 - name: Reference
   items:
   - name: Azure CLI
```

AKS-Arc/aks-troubleshoot.md

Lines changed: 5 additions & 2 deletions

```diff
@@ -6,7 +6,7 @@ author: sethmanheim
 ms.date: 04/01/2025
 ms.author: sethm
 ms.lastreviewed: 04/01/2025
-ms.reviewer: guanghu
+ms.reviewer: abha

 ---

@@ -24,6 +24,9 @@ The following sections describe known issues for AKS enabled by Azure Arc:

 | AKS Arc CRUD operation | Issue | Fix status |
 |------------------------|-------|------------|
+| AKS cluster create | [Can't create AKS cluster or scale node pool because of issues with AKS Arc images](gallery-image-not-usable.md) | Partially fixed in 2503 release |
+| AKS steady state | [AKS Arc telemetry pod consumes too much memory and CPU](telemetry-pod-resources.md) | Active |
+| AKS steady state | [Disk space exhaustion on control plane VMs due to accumulation of kube-apiserver audit logs](kube-apiserver-log-overflow.md) | Active |
 | AKS cluster delete | [Deleted AKS Arc cluster still visible on Azure portal](deleted-cluster-visible.md) | Active |
 | AKS cluster delete | [Can't fully delete AKS Arc cluster with PodDisruptionBudget (PDB) resources](delete-cluster-pdb.md) | Fixed in 2503 release |
 | Azure portal | [Can't see VM SKUs on Azure portal](check-vm-sku.md) | Fixed in 2411 release |
@@ -38,7 +41,7 @@ The following sections describe known issues for AKS enabled by Azure Arc:
 | Create validation | [K8sVersionValidation error](cluster-k8s-version.md)
 | Create validation | [KubeAPIServer unreachable error](kube-api-server-unreachable.md)
 | Network configuration issues | [Use diagnostic checker](aks-arc-diagnostic-checker.md)
-| Kubernetes steady state | [Issues after deleting storage volume](delete-storage-volume.md)
+| Kubernetes steady state | [Resolve issues due to out-of-band deletion of storage volumes](delete-storage-volume.md)
 | Release validation | [Azure Advisor upgrade recommendation message](azure-advisor-upgrade.md)

 ## Next steps
```

AKS-Arc/delete-cluster-pdb.md

Lines changed: 4 additions & 2 deletions

```diff
@@ -19,9 +19,11 @@ When you delete an AKS Arc cluster that has [PodDisruptionBudget](https://kubern

 This issue was fixed in [AKS on Azure Local, version 2503](aks-whats-new-23h2.md#release-2503).

-If you're on an older build, please update to Azure Local, version 2503. Once you update to 2503, you can retry deleting the AKS cluster. If the retry doesn't work, follow this workaround. File a support case if the retry does not delete the AKS cluster.
+- **For deleting an AKS cluster** with a PodDisruptionBudget: If you're on an older build, update to Azure Local, version 2503, and then retry deleting the AKS cluster. File a support case if you're on the 2503 release and your AKS cluster isn't deleted after at least one retry.
+- **For deleting a node pool** with a PodDisruptionBudget: By design, the node pool isn't deleted if a PodDisruptionBudget exists, to protect applications. Use the following workaround to delete the PDB resources, and then retry deleting the node pool.

-## Workaround for AKS Edge Essentials and prior versions of AKS on Azure Local.
+## Workaround for AKS Edge Essentials and older versions of AKS on Azure Local

 Before you delete the AKS Arc cluster, access the AKS Arc cluster's **kubeconfig** and delete all PDBs:
```
AKS-Arc/gallery-image-not-usable.md

Lines changed: 54 additions & 0 deletions (new file)

---
title: Kubernetes cluster create or nodepool scale failing due to AKS Arc image issues
description: Learn about a known issue with Kubernetes cluster create or nodepool scale failing due to AKS Arc VHD image download issues.
ms.topic: troubleshooting
author: sethmanheim
ms.author: sethm
ms.date: 04/01/2025
ms.reviewer: abha
---

# Can't create AKS cluster or scale node pool because of issues with AKS Arc images

[!INCLUDE [hci-applies-to-23h2](includes/hci-applies-to-23h2.md)]

## Symptoms

You see the following error when you try to create an AKS cluster:

```output
Kubernetes version 1.29.4 is not ready for use on Linux. Please go to https://aka.ms/aksarccheckk8sversions for details of how to check the readiness of Kubernetes versions.
```

You might also see the following error when you try to scale a node pool:

```output
error with code NodepoolPrecheckFailed occured: AksHci nodepool creation precheck failed. Detailed message: 1 error occurred:\n\t* rpc error: code = Unknown desc = GalleryImage not usable, health state degraded: Degraded
```

When you run `az aksarc get-versions`, you see errors such as:

```output
...
              {
                "errorMessage": "failed cloud-side provisioning image linux-cblmariner-0.4.1.11203 to cloud gallery: {\n  \"code\": \"ImageProvisionError\",\n  \"message\": \"force failed to deprovision existing gallery image: failed to delete gallery image linux-cblmariner-0.4.1.11203: rpc error: code = Unknown desc = sa659p1012: rpc error: code = Unavailable desc = connection error: desc = \\\"transport: Error while dialing: dial tcp 10.202.244.4:45000: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.\\\"\",\n  \"additionalInfo\": [\n   {\n    \"type\": \"providerImageProvisionInfo\",\n    \"info\": {\n     \"ProviderDownload\": \"True\"\n    }\n   }\n  ],\n  \"category\": \"\"\n }",
                "osSku": "CBLMariner",
                "osType": "Linux",
                "ready": false
              },
...
```

## Mitigation

This issue was fixed in [AKS on Azure Local, version 2503](aks-whats-new-23h2.md#release-2503). If you're on an older build:

- Upgrade your Azure Local deployment to the 2503 build.
- Once updated, confirm that the images downloaded successfully by running the `az aksarc get-versions` command.
- New AKS clusters should now be created successfully.
- Scaling existing AKS clusters might still encounter issues; if so, file a support case.

## Next steps

[Known issues in AKS enabled by Azure Arc](aks-known-issues.md)
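If you want to script the post-upgrade readiness check described above, a small parser over the `az aksarc get-versions -o json` output can flag unready images. This is an editor's sketch, not part of the article: the exact JSON nesting is an assumption based on the snippet shown in the Symptoms section, so the walker scans the document generically instead of relying on a documented schema.

```python
import json

def unready_images(get_versions_json: str):
    """Return (osSku, truncated errorMessage) pairs for every image
    entry whose 'ready' flag is false. The structure is assumed from
    the article's output snippet, so we walk the JSON generically."""
    doc = json.loads(get_versions_json)
    bad = []

    def walk(node):
        if isinstance(node, dict):
            if node.get("ready") is False:
                bad.append((node.get("osSku"), (node.get("errorMessage") or "")[:80]))
            for v in node.values():
                walk(v)
        elif isinstance(node, list):
            for v in node:
                walk(v)

    walk(doc)
    return bad

# Hypothetical sample shaped like the snippet in the Symptoms section.
sample = json.dumps({
    "values": [{
        "patchVersions": [
            {"osSku": "CBLMariner", "osType": "Linux", "ready": False,
             "errorMessage": "failed cloud-side provisioning image linux-cblmariner-0.4.1.11203 ..."},
            {"osSku": "Windows2022", "osType": "Windows", "ready": True},
        ]
    }]
})
print(unready_images(sample))
```

An empty list means every image reported ready; anything else lists the SKUs to investigate before retrying cluster create or node pool scale.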
AKS-Arc/kube-apiserver-log-overflow.md

Lines changed: 73 additions & 0 deletions (new file)

---
title: Disk space exhaustion on the control plane VMs due to accumulation of kube-apiserver audit logs
description: Learn about a known issue with disk space exhaustion on the control plane VMs due to accumulation of kube-apiserver audit logs.
ms.topic: troubleshooting
author: sethmanheim
ms.author: sethm
ms.date: 04/01/2025
ms.reviewer: abha
---

# Disk space exhaustion on control plane VMs due to accumulation of kube-apiserver audit logs

[!INCLUDE [hci-applies-to-23h2](includes/hci-applies-to-23h2.md)]

## Symptoms

If kubectl commands fail, you might see errors such as:

```output
kubectl get ns
Error from server (InternalError): an error on the server ("Internal Server Error: \"/api/v1/namespaces?limit=500\": unknown") has prevented the request from succeeding (get namespaces)
```

When you SSH into the control plane VM, you might find that it ran out of disk space, specifically on the **/dev/sda2** partition. This is due to the accumulation of kube-apiserver audit logs in the **/var/log/kube-apiserver** directory, which can consume approximately 90 GB of disk space:

```output
clouduser@moc-laiwyj6tly6 [ /var/log/kube-apiserver ]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        4.0M     0  4.0M   0% /dev
tmpfs           3.8G   84K  3.8G   1% /dev/shm
tmpfs           1.6G  179M  1.4G  12% /run
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
/dev/sda2        99G   99G     0 100% /
tmpfs           3.8G     0  3.8G   0% /tmp
tmpfs           769M     0  769M   0% /run/user/1002
clouduser@moc-laiwyj6tly6 [ /var/log/kube-apiserver ]$ sudo ls -l /var/log/kube-apiserver|wc -l
890
clouduser@moc-laiwyj6tly6 [ /var/log/kube-apiserver ]$ sudo du -h /var/log/kube-apiserver
87G     /var/log/kube-apiserver
```

The issue occurs because the `--audit-log-maxbackup` value is set to 0. This setting allows the audit logs to accumulate without any limit, eventually filling up the disk.

## Mitigation

To resolve the issue temporarily, manually clean up the old audit logs:

- SSH into the control plane virtual machine (VM) of your AKS Arc cluster.
- Remove the old audit logs from the **/var/log/kube-apiserver** folder.
- If you have multiple control plane nodes, repeat this process on each control plane VM.

[SSH into the control plane VM](ssh-connect-to-windows-and-linux-worker-nodes.md) and navigate to the kube-apiserver logs directory:

```bash
cd /var/log/kube-apiserver
```

Remove the old audit log files:

```bash
sudo rm audit-*.log
```

Exit the SSH session:

```bash
exit
```

## Next steps

[Known issues in AKS enabled by Azure Arc](aks-known-issues.md)
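As an editor's sketch of what a working `--audit-log-maxbackup` rotation would do, the following illustrative Python prunes all but the newest N audit logs. The `audit-*.log` naming matches the listing above, but the function name and the keep count are hypothetical; the manual `rm` steps above remain the documented mitigation, and you should not run ad-hoc scripts against a real control plane VM without understanding the risk.

```python
from pathlib import Path

def prune_audit_logs(log_dir: str, keep: int = 10) -> list[str]:
    """Delete all but the `keep` most recently modified audit-*.log
    files in `log_dir`, mirroring what --audit-log-maxbackup would
    enforce if it were set to `keep`. Returns deleted file names."""
    logs = sorted(
        Path(log_dir).glob("audit-*.log"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first; everything past index `keep` goes
    )
    deleted = []
    for stale in logs[keep:]:
        stale.unlink()
        deleted.append(stale.name)
    return deleted
```

Keeping a small fixed number of backups bounds disk usage the same way the missing kube-apiserver flag would, while preserving the most recent audit history.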

AKS-Arc/scale-requirements.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -64,7 +64,7 @@ This article describes the maximum and minimum supported scale count for AKS on
 | Standard_D4s_v3 | 4 | 16 |
 | Standard_D8s_v3 | 8 | 32 |
 | Standard_D16s_v3 | 16 | 64 |
-| Standard_D8s_v3 | 32 | 128 |
+| Standard_D32s_v3 | 32 | 128 |

 For more worker node sizes with GPU support, see the next section.
```
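To illustrate reading the corrected SKU table, here is a minimal helper that picks the smallest supported worker node SKU for a requested cores/memory footprint. The SKU data is copied from the table above; the helper itself is illustrative and not part of the article.

```python
# Worker node SKUs from the corrected table: name -> (vCPU cores, memory GB).
SUPPORTED_SKUS = {
    "Standard_D4s_v3": (4, 16),
    "Standard_D8s_v3": (8, 32),
    "Standard_D16s_v3": (16, 64),
    "Standard_D32s_v3": (32, 128),
}

def smallest_sku(min_cores: int, min_memory_gb: int):
    """Return the smallest supported SKU that satisfies the requested
    cores and memory, or None if nothing in the table fits."""
    fits = [
        (cores, mem, name)
        for name, (cores, mem) in SUPPORTED_SKUS.items()
        if cores >= min_cores and mem >= min_memory_gb
    ]
    return min(fits)[2] if fits else None

print(smallest_sku(6, 20))  # -> Standard_D8s_v3
```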

AKS-Arc/telemetry-pod-resources.md

Lines changed: 101 additions & 0 deletions (new file)

---
title: AKS Arc telemetry pod consumes too much memory and CPU
description: Learn how to troubleshoot when AKS Arc telemetry pod consumes too much memory and CPU.
ms.topic: troubleshooting
author: sethmanheim
ms.author: sethm
ms.date: 04/01/2025
ms.reviewer: abha
---

# AKS Arc telemetry pod consumes too much memory and CPU

## Symptoms

The **akshci-telemetry** pod in an AKS Arc cluster can, over time, consume a lot of CPU and memory resources. If metrics are enabled, you can verify the CPU and memory usage using the following `kubectl` command:

```bash
kubectl -n kube-system top pod -l app=akshci-telemetry
```

You might see output similar to this:

```output
NAME                              CPU(cores)   MEMORY(bytes)
akshci-telemetry-5df56fd5-rjqk4   996m         152Mi
```

## Mitigation

To resolve this issue, set default **resource limits** for the pods in the `kube-system` namespace.

### Important notes

- Verify whether any pods in the **kube-system** namespace might require more memory than the default limit setting. If so, adjustments might be needed.
- The **LimitRange** is applied to the **namespace**; in this case, the `kube-system` namespace. The default resource limits also apply to new pods that don't specify their own limits.
- **Existing pods**, including those that already have resource limits, aren't affected.
- **New pods** that don't specify their own resource limits are constrained by the limits set in the next section.
- After you set the resource limits and delete the telemetry pod, the new pod might eventually hit the memory limit and generate **OOM (Out-Of-Memory)** errors. This is a temporary mitigation.

To proceed with setting the resource limits, run the following script. While the script uses `az aksarc get-credentials`, you can also use `az connectedk8s proxy` to get the proxy kubeconfig and access the Kubernetes cluster.

### Define the LimitRange YAML to set default CPU and memory limits

```powershell
# Set the $cluster_name and $resource_group of the aksarc cluster
$cluster_name = ""
$resource_group = ""

# Connect to the aksarc cluster
az aksarc get-credentials -n $cluster_name -g $resource_group --admin -f "./kubeconfig-$cluster_name"

$limitRangeYaml = @'
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-mem-resource-constraint
  namespace: kube-system
spec:
  limits:
  - default: # this section defines default limits for containers that haven't specified any limits
      cpu: 250m
      memory: 250Mi
    defaultRequest: # this section defines default requests for containers that haven't specified any requests
      cpu: 10m
      memory: 20Mi
    type: Container
'@

$limitRangeYaml | kubectl apply --kubeconfig "./kubeconfig-$cluster_name" -f -

kubectl get pods -l app=akshci-telemetry -n kube-system --kubeconfig "./kubeconfig-$cluster_name"
kubectl delete pods -l app=akshci-telemetry -n kube-system --kubeconfig "./kubeconfig-$cluster_name"

sleep 5
kubectl get pods -l app=akshci-telemetry -n kube-system --kubeconfig "./kubeconfig-$cluster_name"
```

### Validate that the resource limits were applied correctly

1. Check the resource limits in the pod's YAML configuration:

   ```powershell
   kubectl get pods -l app=akshci-telemetry -n kube-system --kubeconfig "./kubeconfig-$cluster_name" -o yaml
   ```

1. In the output, verify that the `resources` section includes the limits:

   ```yaml
   resources:
     limits:
       cpu: 250m
       memory: 250Mi
     requests:
       cpu: 10m
       memory: 20Mi
   ```

## Next steps

[Known issues in AKS enabled by Azure Arc](aks-known-issues.md)
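Because the LimitRange only constrains pods created after it's applied, the validation step above can be worth scripting across many pods. This editor's sketch (not part of the article) assumes the standard `kubectl get pods -o json` shape and lists containers that still lack a CPU or memory limit, meaning they pick up the namespace defaults only after re-creation:

```python
def containers_missing_limits(pod_json: dict):
    """Given parsed `kubectl get pods -o json` output, return
    (pod, container) pairs whose spec lacks a CPU or memory limit."""
    missing = []
    for pod in pod_json.get("items", []):
        pod_name = pod["metadata"]["name"]
        for container in pod["spec"].get("containers", []):
            limits = container.get("resources", {}).get("limits", {})
            if "cpu" not in limits or "memory" not in limits:
                missing.append((pod_name, container["name"]))
    return missing

# Hypothetical sample mirroring the article's telemetry pod scenario.
sample = {
    "items": [
        {"metadata": {"name": "akshci-telemetry-5df56fd5-rjqk4"},
         "spec": {"containers": [
             {"name": "telemetry", "resources": {}}]}},
        {"metadata": {"name": "coredns-abc"},
         "spec": {"containers": [
             {"name": "coredns",
              "resources": {"limits": {"cpu": "250m", "memory": "250Mi"}}}]}},
    ]
}
print(containers_missing_limits(sample))
```

Any pod it reports (like the telemetry pod before deletion) still runs without limits; deleting it lets the LimitRange defaults take effect on the replacement.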

azure-local/whats-new.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ ms.topic: overview
55
author: alkohli
66
ms.author: alkohli
77
ms.service: azure-local
8-
ms.date: 03/31/2025
8+
ms.date: 04/03/2025
99
---
1010

1111
# What's new in Azure Local?
@@ -23,7 +23,7 @@ This is a baseline release with the following features and improvements:
2323

2424
- **Registration and deployment changes**
2525
- **Extension installation**: Extensions are no longer installed during the registration of Azure Local machines. Instead, the extensions are installed in the machine validation step during the Azure Local instance deployment. For more information, see [Register with Arc via console](./deploy/deployment-arc-register-server-permissions.md) and [Deploy via Azure portal](./deploy/deploy-via-portal.md).
26-
- **Register via app**: You can bootstrap your Azure Local machines using the Configurator app. The local UI is now deprecated. For more information, see [Register Azure Local machines using Configurator app](./index.yml).
26+
- **Register via app**: You can bootstrap your Azure Local machines using the Configurator app. The local UI is now deprecated. For more information, see [Register Azure Local machines using Configurator app](./deploy/deployment-arc-register-configurator-app.md).
2727
- Composed image is now supported for Other Equipment Manufacturers (OEMs).
2828
- Several security enhancements were done for the Bootstrap service.
2929
- Service Principal Name (SPN) is deprecated for Arc registration.
