
Commit 93a4433

Update troubleshoot-container-storage.md
Adding TSG "Ephemeral storage pool doesn’t claim the capacity when the ephemeral disks are used by other daemonsets"

articles/storage/container-storage/troubleshoot-container-storage.md

Lines changed: 62 additions & 0 deletions

@@ -101,6 +101,68 @@ To remediate, create a node pool with a VM SKU that has NVMe drives and try agai

To check the status of your storage pools, run `kubectl describe sp <storage-pool-name> -n acstor`. Here are some issues you might encounter.

### Ephemeral storage pool doesn’t claim the capacity when the ephemeral disks are used by other daemonsets

When you enable an ephemeral storage pool on a node pool whose nodes have temp SSD or local NVMe disks, the ephemeral storage pool might not claim the capacity from these disks because other daemonsets are already using them.

You can follow the guidance below to enable Azure Container Storage to manage these local disks exclusively:

1. Run the following command to check the capacity claimed by the ephemeral storage pool:

```bash
$ kubectl get sp -A
NAMESPACE   NAME                 CAPACITY   AVAILABLE   USED   RESERVED   READY   AGE
acstor      ephemeraldisk-nvme   0          0           0      0          False   82s
```

The example above shows zero capacity claimed by the `ephemeraldisk-nvme` storage pool.

1. Run the following command to confirm the unclaimed state of these local block devices and to check for an existing file system on the disks:

```bash
$ kubectl get bd -A
NAMESPACE   NAME                                           NODENAME                               SIZE            CLAIMSTATE   STATUS   AGE
acstor      blockdevice-1f7ad8fa32a448eb9768ad8e261312ff   aks-nodepoolnvme-38618677-vmss000001   1920383410176   Unclaimed    Active   22m
acstor      blockdevice-9c8096fc47cc2b41a2ed07ec17a83527   aks-nodepoolnvme-38618677-vmss000000   1920383410176   Unclaimed    Active   23m
$ kubectl describe bd -n acstor blockdevice-1f7ad8fa32a448eb9768ad8e261312ff
Name:         blockdevice-1f7ad8fa32a448eb9768ad8e261312ff

Filesystem:
  Fs Type:    ext4

```

The example above shows that the block devices are in `Unclaimed` status and that an existing file system is present on the disk.

1. Confirm that you want to use Azure Container Storage to manage the local data disks exclusively before proceeding.

1. Stop and remove the daemonsets or components that manage the local data disks.
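
For example, if another storage daemonset is consuming the local disks, you can find and delete it with `kubectl`. This is only a sketch; the daemonset name `local-disk-provisioner` and namespace `kube-system` below are placeholders, not values from this article:

```bash
# List daemonsets in all namespaces to identify the one that manages the local disks.
kubectl get daemonset -A

# Delete the daemonset you identified. The name and namespace below are placeholders;
# replace them with the actual daemonset that owns the local data disks.
kubectl delete daemonset local-disk-provisioner -n kube-system
```
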
1. Log in to each node that has local data disks.
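
One way to get a shell on a node is to run a `kubectl debug` pod against it and chroot into the host. This is a sketch, not the only option: the node name is taken from the `kubectl get bd` output above, and the `ubuntu` image is just a placeholder image that has a shell.

```bash
# Start an interactive debug pod on the node; the node's filesystem is mounted at /host.
kubectl debug node/aks-nodepoolnvme-38618677-vmss000001 -it --image=ubuntu

# Inside the debug pod, switch into the host filesystem so commands run against the node itself.
chroot /host
```
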
1. Remove existing file systems from all local data disks.
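
For example, you can list the block devices on the node and wipe the file system signatures from the local disks. This is a sketch only; `/dev/nvme0n1` is an assumed device path, and wiping a disk destroys the data on it, so double-check the device names before running it:

```bash
# Show block devices and any file systems currently on them.
lsblk -f

# Erase the file system signatures from a local NVMe disk.
# /dev/nvme0n1 is an example path; repeat for each local data disk identified above.
sudo wipefs --all /dev/nvme0n1
```
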
1. Restart the ndm daemonset to discover the unused local data disks:

```bash
$ kubectl rollout restart daemonset -l app=ndm -n acstor
daemonset.apps/azurecontainerstorage-ndm restarted
$ kubectl rollout status daemonset -l app=ndm -n acstor --watch

daemon set "azurecontainerstorage-ndm" successfully rolled out
```

1. Wait a few seconds, then check whether the ephemeral storage pool has claimed the capacity from the local data disks:

```bash
$ kubectl wait -n acstor sp --all --for condition=ready
storagepool.containerstorage.azure.com/ephemeraldisk-nvme condition met
$ kubectl get bd -A
NAMESPACE   NAME                                           NODENAME                               SIZE            CLAIMSTATE   STATUS   AGE
acstor      blockdevice-1f7ad8fa32a448eb9768ad8e261312ff   aks-nodepoolnvme-38618677-vmss000001   1920383410176   Claimed      Active   4d16h
acstor      blockdevice-9c8096fc47cc2b41a2ed07ec17a83527   aks-nodepoolnvme-38618677-vmss000000   1920383410176   Claimed      Active   4d16h
$ kubectl get sp -A
NAMESPACE   NAME                 CAPACITY        AVAILABLE       USED          RESERVED      READY   AGE
acstor      ephemeraldisk-nvme   3840766820352   3812058578944   28708241408   26832871424   True    4d16h
```

The example above shows that the `ephemeraldisk-nvme` storage pool successfully claimed the capacity from the local NVMe disks on the nodes.

### Error when trying to expand an Azure Disks storage pool

If your existing storage pool is less than 4 TiB (4,096 GiB), you can only expand it up to 4,095 GiB. If you try to expand beyond that, the internal PVC will get an error message like "Only Disk CachingType 'None' is supported for disk with size greater than 4095 GB" or "Disk 'xxx' of size 4096 GB (<=4096 GB) cannot be resized to 16384 GB (>4096 GB) while it is attached to a running VM. Please stop your VM or detach the disk and retry the operation."
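
Since the error message appears on the internal PVC, one minimal way to look for it is to inspect the claim's events, assuming the internal PVCs live in the `acstor` namespace (an assumption, not something stated in this section):

```bash
# List the internal persistent volume claims.
kubectl get pvc -n acstor

# Inspect a specific PVC; the resize error appears in its events.
# <internal-pvc-name> is a placeholder for the claim you want to check.
kubectl describe pvc <internal-pvc-name> -n acstor
```
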
