After running `az aks create`, you might see the message *Azure Container Storage failed to install. AKS cluster is created. Please run `az aks update` along with `--enable-azure-container-storage` to enable Azure Container Storage*.
This message means that Azure Container Storage wasn't installed, but your Azure Kubernetes Service (AKS) cluster was created properly.
To install Azure Container Storage on the cluster and create a storage pool, run the following command. Replace `<cluster-name>` and `<resource-group>` with your own values. Replace `<storage-pool-type>` with `azureDisk`, `ephemeralDisk`, or `elasticSan`.
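For example, the command takes roughly this form (a sketch; confirm the exact syntax against the current `az aks update` reference):

```bash
az aks update -n <cluster-name> -g <resource-group> --enable-azure-container-storage <storage-pool-type>
```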
### Azure Container Storage fails to install due to Azure Policy restrictions
Azure Container Storage might fail to install if Azure Policy restrictions are in place. Specifically, Azure Container Storage relies on privileged containers, which can be blocked by Azure Policy. When they are blocked, the installation of Azure Container Storage might time out or fail, and you might see errors in the `gatekeeper-controller` logs such as:
{"level":"info","ts":1722622449.2412128,"logger":"webhook","msg":"denied admission: Privileged container is not allowed: ndm, securityContext: {\"privileged\": true}","hookType":"validation","process":"admission","details":{},"event_type":"violation","constraint_name":"azurepolicy-k8sazurev2noprivilege-686dd8b209a774ba977c","constraint_group":"constraints.gatekeeper.sh","constraint_api_version":"v1beta1","constraint_kind":"K8sAzureV2NoPrivilege","constraint_action":"deny","resource_group":"","resource_api_version":"v1","resource_kind":"Pod","resource_namespace":"acstor","resource_name":"azurecontainerstorage-ndm-b5nfg","request_username":"system:serviceaccount:kube-system:daemon-set-controller"}
```
To resolve this issue, add the `acstor` namespace to the exclusion list of your Azure Policy assignment. Azure Policy creates and enforces rules for managing resources within Azure, including AKS clusters. In some cases, policies can block the creation of Azure Container Storage pods and components. For more details on working with Azure Policy for Kubernetes, see [Azure Policy for Kubernetes](/azure/governance/policy/concepts/policy-for-kubernetes).
To add the `acstor` namespace to the exclusion list, follow these steps:
### Can't install and enable Azure Container Storage in node pools with taints
You might have configured [node taints](/azure/aks/use-node-taints) on your node pools to restrict pods from being scheduled on them. When you install and enable Azure Container Storage on these node pools, the operation is blocked because the required pods can't be created there. This behavior applies to the system node pool when installing and to user node pools when enabling.
You can check the node taints with the following example:
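The following is a sketch that lists each node pool with its taints (the exact `--query` expression is an assumption and might need adjusting for your environment):

```bash
az aks nodepool list -g $resourceGroup --cluster-name $clusterName \
    --query "[].{Name:name, Taints:nodeTaints}"
```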
Retry the installation or enablement after you remove the node taints. After it completes successfully, you can reapply the node taints to resume the pod scheduling restrictions.
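For example, here's a sketch that clears the taints on a hypothetical node pool named `nodepool1` and restores them afterward (the taint value shown is only an illustration):

```bash
# Remove all taints so the Azure Container Storage pods can be scheduled
az aks nodepool update -g $resourceGroup --cluster-name $clusterName \
    -n nodepool1 --node-taints ""

# After installation or enablement completes, reapply the original taints
az aks nodepool update -g $resourceGroup --cluster-name $clusterName \
    -n nodepool1 --node-taints "sku=gpu:NoSchedule"
```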
### Can't set storage pool type to NVMe
### Error when trying to expand an Azure Disks storage pool
If your existing storage pool is smaller than 4 TiB (4,096 GiB), you can only expand it up to 4,095 GiB. If you try to expand beyond that, the internal persistent volume claim (PVC) gets an error message such as "Only Disk CachingType 'None' is supported for disk with size greater than 4095 GB" or "Disk 'xxx' of size 4096 GB (<=4096 GB) cannot be resized to 16384 GB (>4096 GB) while it is attached to a running VM. Please stop your VM or detach the disk and retry the operation."
To avoid errors, don't attempt to expand your current storage pool beyond 4,095 GiB if it is initially smaller than 4 TiB (4,096 GiB). Storage pools larger than 4 TiB can be expanded up to the maximum storage capacity available.
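To check the current size of a storage pool before expanding it, you can describe the storage pool resource (a sketch, assuming the default `acstor` namespace):

```bash
kubectl describe sp <storage-pool-name> -n acstor
```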
This limitation applies only to the `Premium_LRS`, `Standard_LRS`, `StandardSSD_LRS`, `Premium_ZRS`, and `StandardSSD_ZRS` disk SKUs.
### Elastic SAN creation fails
If you're trying to create an Elastic SAN storage pool, you might see the message *Azure Elastic SAN creation failed: Maximum possible number of Elastic SAN for the Subscription created already*. This means that you've reached the limit on the number of Elastic SAN resources that can be deployed in a region per subscription. You can check the limit in [Elastic SAN scalability and performance targets](../elastic-san/elastic-san-scale-targets.md#elastic-san-scale-targets). Consider deleting any existing Elastic SAN resources in the subscription that are no longer being used, or try creating the storage pool in a different region.
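To find Elastic SAN resources that are no longer needed, you can list them with the Azure CLI (a sketch; the `elastic-san` command group is provided by a CLI extension):

```bash
# List all Elastic SAN resources in the current subscription
az elastic-san list -o table
```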
### No block devices found
When you disable a storage pool type by running `az aks update` with `--disable-azure-container-storage`, you're asked to confirm. If you select Y, an automatic validation runs to ensure that there are no persistent volumes created from the storage pool. Selecting n bypasses this validation and disables the storage pool type, deleting any existing storage pools and potentially affecting your application.
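For example, here's a sketch of disabling the `azureDisk` storage pool type (replace the value with the pool type you want to disable):

```bash
# You're prompted to confirm (Y/n); Y runs the persistent volume validation
az aks update -n <cluster-name> -g <resource-group> --disable-azure-container-storage azureDisk
```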
### Can't delete resource group containing AKS cluster
If you created an Elastic SAN storage pool, you might not be able to delete the resource group in which your AKS cluster is located.
To resolve this, sign in to the [Azure portal](https://portal.azure.com?azure-portal=true) and select **Resource groups**. Locate the resource group that AKS created (the resource group name starts with **MC_**). Select the SAN resource object within that resource group. Manually remove all volumes and volume groups. Then retry deleting the resource group that includes your AKS cluster.
## Troubleshoot volume issues
### Pod pending creation due to ephemeral volume size above available capacity
In this example, the available capacity of the temp disk for a single node is `75031990272` bytes, or 69 GiB.
Adjust the volume storage size to below the available capacity and redeploy your pod. See [Deploy a pod with a generic ephemeral volume](use-container-storage-with-temp-ssd.md#3-deploy-a-pod-with-a-generic-ephemeral-volume).
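For example, here's a sketch of the redeploy step, assuming the pod was created from a local manifest (the file name `acstor-pod.yaml` is hypothetical):

```bash
# Delete the pending pod, lower the ephemeral volume size in the manifest,
# then redeploy it
kubectl delete pod <pod-name>
kubectl apply -f acstor-pod.yaml
```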
### Volume fails to attach due to metadata store offline
Azure Container Storage uses `etcd`, a distributed, reliable key-value store, to store and manage volume metadata and support volume orchestration operations. For high availability and resiliency, `etcd` runs in three pods. When fewer than two `etcd` instances are running, Azure Container Storage halts volume orchestration operations while still allowing data access to the volumes. Azure Container Storage automatically detects when an `etcd` instance is offline and recovers it. However, if you notice volume orchestration errors after restarting an AKS cluster, it's possible that an `etcd` instance failed to autorecover. Follow the instructions in this section to determine the health status of the `etcd` instances.
Run the following command to get a list of pods.
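For example (a sketch; Azure Container Storage components, including the `etcd` pods, run in the `acstor` namespace):

```bash
kubectl get pods -n acstor
```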
Describe the pod:

```
kubectl describe pod fiopod
```
Typically, you see volume failure messages if the metadata store is offline. In this example, **fiopod** is in **ContainerCreating** status, and the **FailedAttachVolume** warning indicates that the creation is pending due to volume attach failure.
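To count the running `etcd` instances, you can filter the pod list (a sketch, assuming the pod names contain `etcd`):

```bash
kubectl get pods -n acstor | grep etcd
```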
If fewer than two instances are shown in the Running state, you can conclude that the volume is failing to attach due to the metadata store being offline, and the automated recovery wasn't successful. If so, file a support ticket with [Azure Support](https://azure.microsoft.com/support/).