articles/storage/container-storage/troubleshoot-container-storage.md
Lines changed: 76 additions & 14 deletions
@@ -16,7 +16,7 @@ ms.topic: how-to
### Azure Container Storage fails to install due to missing configuration

-After running `az aks create`, you might see the message *Azure Container Storage failed to install. AKS cluster is created. Please run`az aks update` along with `--enable-azure-container-storage` to enable Azure Container Storage*.
+After running `az aks create`, you might see the message *Azure Container Storage failed to install. AKS cluster is created. Run `az aks update` along with `--enable-azure-container-storage` to enable Azure Container Storage*.

This message means that Azure Container Storage wasn't installed, but your AKS (Azure Kubernetes Service) cluster was created properly.
@@ -28,7 +28,7 @@ az aks update -n <cluster-name> -g <resource-group> --enable-azure-container-sto
### Azure Container Storage fails to install due to Azure Policy restrictions

-Azure Container Storage might fail to install if Azure Policy restrictions are in place. Specifically, Azure Container Storage relies on privileged containers, which can be blocked by Azure Policy. When they are blocked, the installation of Azure Container Storage might time out or fail, and you might see errors in the `gatekeeper-controller` logs such as:
+Azure Container Storage might fail to install if Azure Policy restrictions are in place. Specifically, Azure Container Storage relies on privileged containers, which Azure Policy can be configured to block. When they're blocked, the installation of Azure Container Storage might time out or fail, and you might see errors in the `gatekeeper-controller` logs such as:
@@ -55,7 +55,7 @@ To add the `acstor` namespace to the exclusion list, follow these steps:
### Can't install and enable Azure Container Storage in node pools with taints

-You may have configured [node taints](/azure/aks/use-node-taints) on the node pools to restrict pods from being scheduled on these node pools. When you install and enable Azure Container Storage on these noode pools, it will be blocked because the required pods can't be created in these node pools. The behavior applies to both the system node pool when installing and the user node pools when enabling.
+You might have configured [node taints](/azure/aks/use-node-taints) on the node pools to restrict pods from being scheduled on these node pools. Installing or enabling Azure Container Storage on these node pools might be blocked because the required pods can't be created there. This behavior applies to both the system node pool when installing and the user node pools when enabling.

You can check the node taints with the following example:
@@ -89,7 +89,7 @@ $ az aks nodepool list -g $resourceGroup --cluster-name $clusterName --query "[]
```

-Retry the installing or enabling after you remove node taints successfully. After it's completed successfully, you can configure node taints back to resume the pod scheduling restaints.
+Retry the installation or enablement after you remove the node taints. After it completes successfully, you can reapply the node taints to resume the pod scheduling restraints.

### Can't set storage pool type to NVMe
@@ -101,11 +101,73 @@ To remediate, create a node pool with a VM SKU that has NVMe drives and try agai
To check the status of your storage pools, run `kubectl describe sp <storage-pool-name> -n acstor`. Here are some issues you might encounter.

+### Ephemeral storage pool doesn’t claim the capacity when the ephemeral disks are used by other daemonsets
+
+Enabling an ephemeral storage pool on a node pool with temp SSD or local NVMe disks might not claim capacity from these disks if other daemonsets are using them.
+
+Run the following steps to enable Azure Container Storage to manage these local disks exclusively:
+
+1. Run the following command to see the capacity claimed by the ephemeral storage pool:
+
+```bash
+$ kubectl get sp -A
+NAMESPACE NAME CAPACITY AVAILABLE USED RESERVED READY AGE
+acstor ephemeraldisk-nvme 0 0 0 0 False 82s
+```
+This example shows zero capacity claimed by the `ephemeraldisk-nvme` storage pool.
+
+1. Run the following command to confirm the unclaimed state of these local block devices and check for existing file systems on the disks:
+```bash
+$ kubectl get bd -A
+NAMESPACE NAME NODENAME SIZE CLAIMSTATE STATUS AGE
+acstor blockdevice-1f7ad8fa32a448eb9768ad8e261312ff aks-nodepoolnvme-38618677-vmss000001 1920383410176 Unclaimed Active 22m
+acstor blockdevice-9c8096fc47cc2b41a2ed07ec17a83527 aks-nodepoolnvme-38618677-vmss000000 1920383410176 Unclaimed Active 23m
This example shows that the `ephemeraldisk-nvme` storage pool successfully claimed the capacity from the local NVMe disks on the nodes.
+
### Error when trying to expand an Azure Disks storage pool

-If your existing storage pool is less than 4 TiB (4,096 GiB), you can only expand it up to 4,095 GiB. If you try to expand beyond that, the internal PVC will get an error message like "Only Disk CachingType 'None' is supported for disk with size greater than 4095 GB" or "Disk 'xxx' of size 4096 GB (<=4096 GB) cannot be resized to 16384 GB (>4096 GB) while it is attached to a running VM. Please stop your VM or detach the disk and retry the operation."
+If your existing storage pool is less than 4 TiB (4,096 GiB), you can only expand it up to 4,095 GiB. If you try to expand beyond the limit, the internal PVC shows an error message about disk size or caching type limitations. Stop your VM or detach the disk and retry the operation.

-To avoid errors, don't attempt to expand your current storage pool beyond 4,095 GiB if it is initially smaller than 4 TiB (4,096 GiB). Storage pools larger than 4 TiB can be expanded up to the maximum storage capacity available.
+To avoid errors, don't attempt to expand your current storage pool beyond 4,095 GiB if it's initially smaller than 4 TiB (4,096 GiB). Storage pools larger than 4 TiB can be expanded up to the maximum storage capacity available.
This limitation only applies when using `Premium_LRS`, `Standard_LRS`, `StandardSSD_LRS`, `Premium_ZRS`, and `StandardSSD_ZRS` Disk SKUs.
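
The expansion rule described in this section can be expressed as a small pre-check before you request a resize. The following is a minimal sketch in bash, not part of Azure Container Storage or the Azure CLI: `can_expand_pool` is a hypothetical helper that takes the current and requested pool sizes in GiB and encodes only the 4,095 GiB ceiling. It doesn't check the Disk SKU or the maximum capacity available to larger pools, and it assumes a pool of exactly 4,096 GiB is treated like a larger pool.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not an Azure CLI or kubectl command).
# Usage: can_expand_pool <current-size-GiB> <requested-size-GiB>
# Returns 0 if the requested size respects the limit described above, 1 otherwise.
can_expand_pool() {
  local current_gib=$1
  local requested_gib=$2
  local four_tib_gib=4096

  # Pools that start below 4 TiB can only grow up to 4,095 GiB.
  # Assumption: a pool of exactly 4,096 GiB isn't subject to this ceiling.
  if (( current_gib < four_tib_gib && requested_gib > four_tib_gib - 1 )); then
    return 1
  fi
  return 0
}

if can_expand_pool 2048 4095; then echo "2048 -> 4095 GiB: allowed"; fi
if ! can_expand_pool 2048 8192; then echo "2048 -> 8192 GiB: blocked"; fi
```

Running a check like this first avoids submitting a resize that the internal PVC would reject.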
@@ -121,21 +183,21 @@ To remediate, create a node pool with a VM SKU that has NVMe drives and try agai
### Storage pool type already enabled

-If you try to enable a storage pool type that's already enabled, you get the following message: *Invalid `--enable-azure-container-storage` value. Azure Container Storage is already enabled for storage pool type `<storage-pool-type>` in the cluster*. You can check if you have any existing storage pools created by running `kubectl get sp -n acstor`.
+If you try to enable a storage pool type that exists, you get the following message: *Invalid `--enable-azure-container-storage` value. Azure Container Storage is already enabled for storage pool type `<storage-pool-type>` in the cluster*. You can check if you have any existing storage pools created by running `kubectl get sp -n acstor`.

### Disabling a storage pool type
When disabling a storage pool type via `az aks update --disable-azure-container-storage <storage-pool-type>` or uninstalling Azure Container Storage via `az aks update --disable-azure-container-storage all`, if there's an existing storage pool of that type, you get the following message:

-*Disabling Azure Container Storage for storage pool type `<storage-pool-type>`will forcefully delete all the storage pools of the same type and affect the applications using these storage pools. Forceful deletion of storage pools can also lead to leaking of storage resources which are being consumed. Do you want to validate whether any of the storage pools of type `<storage-pool-type>` are being used before disabling Azure Container Storage? (Y/n)*
+*Disabling Azure Container Storage for storage pool type `<storage-pool-type>` forcefully deletes all the storage pools of the same type and it affects the applications using these storage pools. Forceful deletion of storage pools can also lead to leaking of storage resources which are being consumed. Do you want to validate whether any of the storage pools of type `<storage-pool-type>` are being used before disabling Azure Container Storage? (Y/n)*

If you select Y, an automatic validation runs to ensure that there are no persistent volumes created from the storage pool. Selecting n bypasses this validation and disables the storage pool type, deleting any existing storage pools and potentially affecting your application.
## Troubleshoot volume issues

-### Pod pending creation due to ephemeral volume size above available capacity
+### Pod pending creation due to ephemeral volume size beyond available capacity

-An ephemeral volume is allocated on a single node. When you configure the size of ephemeral volumes for your pods, the size should be less than the available capacity of a single node's ephemeral disk. Otherwise, the pod creation will be in pending status.
+An ephemeral volume is allocated on a single node. When you configure the size of ephemeral volumes for your pods, the size should be less than the available capacity of a single node's ephemeral disk. Otherwise, pod creation stays in pending status.

Use the following command to check if your pod creation is in pending status.
In this example, the available capacity of temp disk for a single node is `75031990272` bytes or 69 GiB.
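
To compare the reported capacity with a volume size expressed in GiB, you can convert the byte count. The following is a minimal sketch, assuming a bash shell; `bytes_to_gib` is a hypothetical helper name, not part of any CLI. Bash integer division discards the fractional part, so the result is rounded down to whole GiB.

```bash
#!/usr/bin/env bash

# Hypothetical helper: convert a byte count to whole GiB (1 GiB = 1024^3 bytes).
# Integer division rounds down, so the result never overstates the capacity.
bytes_to_gib() {
  local bytes=$1
  echo $(( bytes / (1024 * 1024 * 1024) ))
}

bytes_to_gib 75031990272   # prints 69
```

Rounding down is the safer direction here, because the requested volume size has to stay below the available capacity.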

-Adjust the volume storage size below available capacity and redeploy your pod. See [Deploy a pod with a generic ephemeral volume](use-container-storage-with-temp-ssd.md#3-deploy-a-pod-with-a-generic-ephemeral-volume).
+Adjust the volume storage size to less than the available capacity and redeploy your pod. See [Deploy a pod with a generic ephemeral volume](use-container-storage-with-temp-ssd.md#3-deploy-a-pod-with-a-generic-ephemeral-volume).

### Volume fails to attach due to metadata store offline

-Azure Container Storage uses `etcd`, a distributed, reliable key-value store, to store and manage metadata of volumes to support volume orchestration operations. For high availability and resiliency, `etcd` runs in three pods. When there are less than two `etcd` instances running, Azure Container Storage will halt volume orchestration operations while still allowing data access to the volumes. Azure Container Storage automatically detects when an `etcd` instance is offline and recovers it. However, if you notice volume orchestration errors after restarting an AKS cluster, it's possible that an `etcd` instance failed to autorecover. Follow the instructions in this section to determine the health status of the `etcd` instances.
+Azure Container Storage uses `etcd`, a distributed, reliable key-value store, to store and manage metadata of volumes to support volume orchestration operations. For high availability and resiliency, `etcd` runs in three pods. When fewer than two `etcd` instances are running, Azure Container Storage halts volume orchestration operations while still allowing data access to the volumes. Azure Container Storage automatically detects when an `etcd` instance is offline and recovers it. However, if you notice volume orchestration errors after restarting an AKS cluster, it's possible that an `etcd` instance failed to autorecover. Follow the instructions in this section to determine the health status of the `etcd` instances.
-If fewer than two instances are shown in the Running state, you can conclude that the volume is failing to attach due to the metadata store being offline, and the automated recovery wasn't successful. If so, file a support ticket with [Azure Support](https://azure.microsoft.com/support/).
+If fewer than two instances are running, the volume isn't attaching because the metadata store is offline, and automated recovery failed. If so, file a support ticket with [Azure Support](https://azure.microsoft.com/support/).
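
The "fewer than two running" check can also be scripted. The following is a rough sketch, assuming you pipe in pod listing output in the standard `kubectl get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE) for a single namespace. `count_running` is a hypothetical helper, and the pod names in the sample input are made up for illustration; they aren't the names Azure Container Storage uses.

```bash
#!/usr/bin/env bash

# Hypothetical helper: count pods whose STATUS column (third field) is "Running".
# Reads `kubectl get pods`-style output on stdin and skips the header row.
count_running() {
  awk 'NR > 1 && $3 == "Running" { n++ } END { print n + 0 }'
}

# Sample input with made-up pod names, for illustration only.
sample_output='NAME READY STATUS RESTARTS AGE
example-etcd-0 1/1 Running 0 4d
example-etcd-1 0/1 Pending 0 4d
example-etcd-2 0/1 Pending 0 4d'

running=$(printf '%s\n' "$sample_output" | count_running)

# The threshold of two comes from the text above.
if (( running < 2 )); then
  echo "Only ${running} etcd instance(s) running: metadata store is likely offline"
fi
```

In practice you would replace the sample text with the output of the pod listing command from this section.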