Commit afd0cdc

docs: also test storage capacity tracking with 20 pods/node
This is a more "normal" load for a node.
1 parent f053a7b commit afd0cdc

1 file changed, +97 -5 lines changed

docs/storage-capacity-tracking.md

Lines changed: 97 additions & 5 deletions
@@ -56,7 +56,8 @@ aks-nodepool1-15818640-vmss000009 721m 37% 1890Mi 41%
5656
```
5757

5858
Test results were:
59-
```
59+
60+
```xml
6061
<?xml version="1.0" encoding="UTF-8"?>
6162
<testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="446.487">
6263
<testcase name="storage overall (testing/experimental/storage/pod-startup/config.yaml)" classname="ClusterLoaderV2" time="446.483938942"></testcase>
@@ -130,7 +131,7 @@ The number of namespaces and pods is the same, but now they have to be
130131
distributed among all nodes because each node has storage for exactly 100
131132
volumes (`--capacity=fast=100Gi`).
132133

133-
```
134+
```xml
134135
<?xml version="1.0" encoding="UTF-8"?>
135136
<testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="806.468">
136137
<testcase name="storage overall (testing/experimental/storage/pod-startup/config.yaml)" classname="ClusterLoaderV2" time="806.464585136"></testcase>
@@ -217,7 +218,7 @@ index ce9abc40..88983120 100644
217218
Starting pods was more than twice as fast as without storage capacity tracking
218219
(193 seconds instead of 414 seconds):
219220

220-
```
221+
```xml
221222
<?xml version="1.0" encoding="UTF-8"?>
222223
<testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="544.772">
223224
<testcase name="storage overall (testing/experimental/storage/pod-startup/config.yaml)" classname="ClusterLoaderV2" time="544.769501842"></testcase>
@@ -253,7 +254,7 @@ number of pods gets scaled up to 10000 automatically.
253254

254255
The baseline without volumes turned out to be this:
255256

256-
```
257+
```xml
257258
<?xml version="1.0" encoding="UTF-8"?>
258259
<testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="3208.062">
259260
<testcase name="storage overall (testing/experimental/storage/pod-startup/config.yaml)" classname="ClusterLoaderV2" time="3208.059435154"></testcase>
@@ -296,7 +297,7 @@ There were other intermittent problems accessing the apiserver. Doing the
296297
limits](https://github.com/kubernetes-csi/external-provisioner/pull/711) solved
297298
this problem and the same test passed all three times that it was run:
298299

299-
```
300+
```xml
300301
<?xml version="1.0" encoding="UTF-8"?>
301302
<testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="3989.135">
302303
<testcase name="storage overall (testing/experimental/storage/pod-startup/config.yaml)" classname="ClusterLoaderV2" time="3989.131860537"></testcase>
@@ -310,3 +311,94 @@ In this run there were 573 failed provisioning attempts.
310311

311312
The ratio between "with volumes" and "no volumes" is 1.58. That is even better
312313
than for 10 nodes where that ratio was 1.68.
314+
315+
## 20 pods per node
316+
317+
Creating 100 pods per node was meant to stress the kube-apiserver. 100 pods per
318+
node is close to the kubelet's default limit of 110 pods per node. To ensure that storage
319+
capacity really is the limiting factor, the test was repeated with 5Gi per
320+
volume. With 100Gi of capacity per node, 20 pods per node are then needed to exhaust storage capacity
321+
completely. Only 10 nodes were tested.
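
The sizing in this paragraph can be checked with a few lines of arithmetic. This is only a sketch: it assumes one volume per pod (as the `-vol-0` claim names suggest) and takes the per-node capacity from the `--capacity=fast=100Gi` setting mentioned earlier in the document.

```python
# Sketch of the sizing arithmetic; values are taken from the surrounding text.
GIB = 2**30

node_capacity = 100 * GIB  # --capacity=fast=100Gi per node
volume_size = 5 * GIB      # 5Gi per volume (5368709120 bytes)
nodes = 10                 # only 10 nodes were tested

# Assumption: each pod uses exactly one volume.
pods_per_node = node_capacity // volume_size
total_pods = pods_per_node * nodes

print(pods_per_node, total_pods)
```

Under these assumptions the test creates 200 pods in total, which matches the 193 running plus 7 remaining pods reported for the run without storage capacity tracking.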
322+
323+
The baseline became:
324+
325+
```xml
326+
<?xml version="1.0" encoding="UTF-8"?>
327+
<testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="94.942">
328+
<testcase name="storage overall (testing/experimental/storage/pod-startup/config.yaml)" classname="ClusterLoaderV2" time="94.93896563"></testcase>
329+
<testcase name="storage: [step: 01] Starting measurement for waiting for deployments" classname="ClusterLoaderV2" time="0.100413253"></testcase>
330+
<testcase name="storage: [step: 02] Creating deployments" classname="ClusterLoaderV2" time="20.124369388"></testcase>
331+
<testcase name="storage: [step: 03] Waiting for deployments to be running" classname="ClusterLoaderV2" time="29.383043774"></testcase>
332+
<testcase name="storage: [step: 04] Deleting deployments" classname="ClusterLoaderV2" time="20.124994532"></testcase>
333+
</testsuite>
```
334+
335+
Without storage capacity tracking, scheduling only got 193 pods running and then got
336+
stuck, with unsuccessful retries for the remaining 7 volumes:
337+
338+
```console
339+
$ kubectl get pvc --all-namespaces | grep -v Bound
340+
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
341+
test-1a4kwd-1 deployment-176-bb8cc865b-9gz7b-vol-0 Pending csi-hostpath-fast 8m30s
342+
test-1a4kwd-1 deployment-185-78879f766b-h5m6x-vol-0 Pending csi-hostpath-fast 8m28s
343+
test-1a4kwd-1 deployment-191-6888db84cf-k9cpd-vol-0 Pending csi-hostpath-fast 8m26s
344+
test-1a4kwd-1 deployment-192-78f546fcf8-v44f4-vol-0 Pending csi-hostpath-fast 8m26s
345+
test-1a4kwd-1 deployment-193-56f9d79877-qcfms-vol-0 Pending csi-hostpath-fast 8m26s
346+
test-1a4kwd-1 deployment-196-8cdb49946-9clb6-vol-0 Pending csi-hostpath-fast 8m25s
347+
test-1a4kwd-1 deployment-198-5f9657f9d8-whxtc-vol-0 Pending csi-hostpath-fast 8m25s
348+
test-1a4kwd-1 deployment-199-75445c6c6c-qzmvz-vol-0 Pending csi-hostpath-fast 8m25s
349+
350+
$ kubectl describe -n test-1a4kwd-1 pvc/deployment-176-bb8cc865b-9gz7b-vol-0
351+
Name: deployment-176-bb8cc865b-9gz7b-vol-0
352+
Namespace: test-1a4kwd-1
353+
StorageClass: csi-hostpath-fast
354+
Status: Pending
355+
Volume:
356+
Labels: app=deployment-176
357+
group=volume-test
358+
Annotations: volume.beta.kubernetes.io/storage-provisioner: hostpath.csi.k8s.io
359+
Finalizers: [kubernetes.io/pvc-protection]
360+
Capacity:
361+
Access Modes:
362+
VolumeMode: Filesystem
363+
Used By: <none>
364+
Events:
365+
Type Reason Age From Message
366+
---- ------ ---- ---- -------
367+
Normal WaitForPodScheduled 7m8s (x16 over 8m39s) persistentvolume-controller waiting for pod deployment-176-bb8cc865b-9gz7b to be scheduled
368+
Normal Provisioning 6m13s (x2 over 8m24s) hostpath.csi.k8s.io_csi-hostpathplugin-4lh5p_1d18e99a-ce86-4517-9619-a109ea7c33d9 External provisioner is provisioning volume for claim "test-1a4kwd-1/deployment-176-bb8cc865b-9gz7b-vol-0"
369+
Warning ProvisioningFailed 6m13s (x2 over 8m24s) hostpath.csi.k8s.io_csi-hostpathplugin-4lh5p_1d18e99a-ce86-4517-9619-a109ea7c33d9 failed to provision volume with StorageClass "csi-hostpath-fast": rpc error: code = ResourceExhausted desc = requested capacity 5368709120 exceeds remaining capacity for "fast", 100Gi out of 100Gi already used
370+
Normal Provisioning 5m38s (x5 over 8m44s) hostpath.csi.k8s.io_csi-hostpathplugin-s5vhk_dc0ecff1-8f7c-47a2-9cf6-add7b6216d74 External provisioner is provisioning volume for claim "test-1a4kwd-1/deployment-176-bb8cc865b-9gz7b-vol-0"
371+
Warning ProvisioningFailed 5m38s (x5 over 8m44s) hostpath.csi.k8s.io_csi-hostpathplugin-s5vhk_dc0ecff1-8f7c-47a2-9cf6-add7b6216d74 failed to provision volume with StorageClass "csi-hostpath-fast": rpc error: code = ResourceExhausted desc = requested capacity 5368709120 exceeds remaining capacity for "fast", 100Gi out of 100Gi already used
372+
Normal ExternalProvisioning 3m42s (x27 over 8m44s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "hostpath.csi.k8s.io" or manually created by system administrator
373+
Normal Provisioning 115s (x6 over 6m48s) hostpath.csi.k8s.io_csi-hostpathplugin-2zmtd_7dff6f7c-1d47-4269-8db3-a5c3ec74b39a External provisioner is provisioning volume for claim "test-1a4kwd-1/deployment-176-bb8cc865b-9gz7b-vol-0"
374+
Warning ProvisioningFailed 115s (x6 over 6m48s) hostpath.csi.k8s.io_csi-hostpathplugin-2zmtd_7dff6f7c-1d47-4269-8db3-a5c3ec74b39a failed to provision volume with StorageClass "csi-hostpath-fast": rpc error: code = ResourceExhausted desc = requested capacity 5368709120 exceeds remaining capacity for "fast", 100Gi out of 100Gi already used
375+
Normal Provisioning 67s (x5 over 7m) hostpath.csi.k8s.io_csi-hostpathplugin-8mqk7_c982557d-0a17-406a-aa06-ca3e97715ad8 External provisioner is provisioning volume for claim "test-1a4kwd-1/deployment-176-bb8cc865b-9gz7b-vol-0"
376+
Warning ProvisioningFailed 67s (x5 over 7m) hostpath.csi.k8s.io_csi-hostpathplugin-8mqk7_c982557d-0a17-406a-aa06-ca3e97715ad8 failed to provision volume with StorageClass "csi-hostpath-fast": rpc error: code = ResourceExhausted desc = requested capacity 5368709120 exceeds remaining capacity for "fast", 100Gi out of 100Gi already used
377+
Normal Provisioning 55s (x4 over 4m5s) hostpath.csi.k8s.io_csi-hostpathplugin-5dhxm_18b26463-08bd-4b35-9d4e-3b4565381432 External provisioner is provisioning volume for claim "test-1a4kwd-1/deployment-176-bb8cc865b-9gz7b-vol-0"
378+
Warning ProvisioningFailed 55s (x4 over 4m5s) hostpath.csi.k8s.io_csi-hostpathplugin-5dhxm_18b26463-08bd-4b35-9d4e-3b4565381432 failed to provision volume with StorageClass "csi-hostpath-fast": rpc error: code = ResourceExhausted desc = requested capacity 5368709120 exceeds remaining capacity for "fast", 100Gi out of 100Gi already used
379+
Warning ProvisioningFailed 44s (x5 over 8m34s) hostpath.csi.k8s.io_csi-hostpathplugin-kg52t_75dcbf6c-afa5-4e84-9dfd-b10c20f51e3a failed to provision volume with StorageClass "csi-hostpath-fast": rpc error: code = ResourceExhausted desc = requested capacity 5368709120 exceeds remaining capacity for "fast", 100Gi out of 100Gi already used
380+
Normal Provisioning 44s (x5 over 8m34s) hostpath.csi.k8s.io_csi-hostpathplugin-kg52t_75dcbf6c-afa5-4e84-9dfd-b10c20f51e3a External provisioner is provisioning volume for claim "test-1a4kwd-1/deployment-176-bb8cc865b-9gz7b-vol-0"
381+
Warning ProvisioningFailed 32s (x3 over 7m48s) hostpath.csi.k8s.io_csi-hostpathplugin-4gg76_f9f1f670-6765-4722-9265-7d9971501874 failed to provision volume with StorageClass "csi-hostpath-fast": rpc error: code = ResourceExhausted desc = requested capacity 5368709120 exceeds remaining capacity for "fast", 100Gi out of 100Gi already used
382+
Normal Provisioning 32s (x3 over 7m48s) hostpath.csi.k8s.io_csi-hostpathplugin-4gg76_f9f1f670-6765-4722-9265-7d9971501874 External provisioner is provisioning volume for claim "test-1a4kwd-1/deployment-176-bb8cc865b-9gz7b-vol-0"
383+
Warning ProvisioningFailed 20s (x10 over 8m40s) hostpath.csi.k8s.io_csi-hostpathplugin-p8h4z_ba4c2251-f7e9-4497-bb10-c8597fbe1d32 failed to provision volume with StorageClass "csi-hostpath-fast": rpc error: code = ResourceExhausted desc = requested capacity 5368709120 exceeds remaining capacity for "fast", 100Gi out of 100Gi already used
384+
Normal Provisioning 20s (x10 over 8m40s) hostpath.csi.k8s.io_csi-hostpathplugin-p8h4z_ba4c2251-f7e9-4497-bb10-c8597fbe1d32 External provisioner is provisioning volume for claim "test-1a4kwd-1/deployment-176-bb8cc865b-9gz7b-vol-0"
385+
Normal Provisioning 8s (x6 over 6m36s) hostpath.csi.k8s.io_csi-hostpathplugin-h9hrq_51e8adfd-bbf4-474e-ae6e-4f7d7f65809c External provisioner is provisioning volume for claim "test-1a4kwd-1/deployment-176-bb8cc865b-9gz7b-vol-0"
386+
Warning ProvisioningFailed 8s (x6 over 6m36s) hostpath.csi.k8s.io_csi-hostpathplugin-h9hrq_51e8adfd-bbf4-474e-ae6e-4f7d7f65809c failed to provision volume with StorageClass "csi-hostpath-fast": rpc error: code = ResourceExhausted desc = requested capacity 5368709120 exceeds remaining capacity for "fast", 100Gi out of 100Gi already used
387+
388+
$ for i in `kubectl get pods | grep csi-hostpathplugin- | sed -e 's/ .*//'`; do kubectl logs $i hostpath ; done | grep '^E.*code = ResourceExhausted desc = requested capacity .*exceeds remaining capacity for "fast"' | wc -l
389+
1822
390+
```
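
The last command in the transcript counts error lines across all driver logs. The same filter can be sketched in Python for logs that were saved to files first. The sample log lines below are hypothetical and only illustrate the shape that the pattern expects; real driver output will differ in detail.

```python
import re

# Same pattern as the grep in the transcript: error lines start with "E".
PATTERN = re.compile(
    r'^E.*code = ResourceExhausted desc = requested capacity .*'
    r'exceeds remaining capacity for "fast"'
)


def count_exhausted(log_lines):
    """Count log lines that report a provisioning attempt failing for lack of capacity."""
    return sum(1 for line in log_lines if PATTERN.search(line))


# Hypothetical log lines, for illustration only.
sample = [
    'E0301 10:00:00.000000 1 server.go:1] GRPC error: rpc error: '
    'code = ResourceExhausted desc = requested capacity 5368709120 '
    'exceeds remaining capacity for "fast", 100Gi out of 100Gi already used',
    'I0301 10:00:01.000000 1 server.go:1] GRPC call: /csi.v1.Controller/CreateVolume',
]

print(count_exhausted(sample))
```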
391+
392+
With storage capacity tracking, the test passed with a similar slowdown of 1.69
393+
compared to the baseline:
394+
395+
```xml
396+
<?xml version="1.0" encoding="UTF-8"?>
397+
<testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="125.341">
398+
<testcase name="storage overall (testing/experimental/storage/pod-startup/config.yaml)" classname="ClusterLoaderV2" time="125.338287417"></testcase>
399+
<testcase name="storage: [step: 01] Starting measurement for waiting for deployments" classname="ClusterLoaderV2" time="0.100360119"></testcase>
400+
<testcase name="storage: [step: 02] Creating deployments" classname="ClusterLoaderV2" time="20.11793562"></testcase>
401+
<testcase name="storage: [step: 03] Waiting for deployments to be running" classname="ClusterLoaderV2" time="49.791449632"></testcase>
402+
<testcase name="storage: [step: 04] Deleting deployments" classname="ClusterLoaderV2" time="20.12373259"></testcase>
403+
</testsuite>
404+
```
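
The 1.69 slowdown appears to be the ratio of the "Waiting for deployments to be running" step in the two reports above, not of the overall suite times (125.34 / 94.94 is roughly 1.32). The following sketch recomputes it with Python's standard `xml.etree.ElementTree`. The reports are trimmed to the relevant step and the helper name `step_time` is made up for this example.

```python
import xml.etree.ElementTree as ET

STEP = "storage: [step: 03] Waiting for deployments to be running"

# Reports trimmed to the one step of interest; times copied from the results above.
BASELINE = """<testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="94.942">
<testcase name="storage: [step: 03] Waiting for deployments to be running" classname="ClusterLoaderV2" time="29.383043774"></testcase>
</testsuite>"""

WITH_TRACKING = """<testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="125.341">
<testcase name="storage: [step: 03] Waiting for deployments to be running" classname="ClusterLoaderV2" time="49.791449632"></testcase>
</testsuite>"""


def step_time(report, step_name):
    """Return the duration in seconds of the named testcase in a JUnit-style report."""
    root = ET.fromstring(report)
    for case in root.iter("testcase"):
        if case.get("name") == step_name:
            return float(case.get("time"))
    raise KeyError(step_name)


slowdown = step_time(WITH_TRACKING, STEP) / step_time(BASELINE, STEP)
print(f"{slowdown:.2f}")  # prints 1.69
```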
