---
title: Storage Capacity
content_type: concept
weight: 45
---

<!-- overview -->

Storage capacity is limited and may vary depending on the node on
which a Pod runs: network-attached storage might not be accessible by
all nodes, or storage is local to a node to begin with.

{{< feature-state for_k8s_version="v1.19" state="alpha" >}}

This page describes how Kubernetes keeps track of storage capacity and
how the scheduler uses that information to schedule Pods onto nodes
that have access to enough storage capacity for the remaining missing
volumes. Without storage capacity tracking, the scheduler may choose a
node that doesn't have enough capacity to provision a volume, and
multiple scheduling retries will then be needed.

Tracking storage capacity is supported for {{< glossary_tooltip
text="Container Storage Interface" term_id="csi" >}} (CSI) drivers and
[needs to be enabled](#enabling-storage-capacity-tracking) when installing
a CSI driver.

<!-- body -->

## API

There are two API extensions for this feature:
- [CSIStorageCapacity](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#csistoragecapacity-v1alpha1-storage-k8s-io) objects:
  these get produced by a CSI driver in the namespace
  where the driver is installed. Each object contains capacity
  information for one storage class and defines which nodes have
  access to that storage.
- [The `CSIDriverSpec.StorageCapacity` field](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#csidriverspec-v1-storage-k8s-io):
  when set to `true`, the Kubernetes scheduler will consider storage
  capacity for volumes that use the CSI driver.

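The following manifests sketch what these objects might look like;
the driver, namespace, storage class, and label names are made-up
examples, and the real values come from the CSI driver deployment:

```yaml
apiVersion: storage.k8s.io/v1alpha1
kind: CSIStorageCapacity
metadata:
  # Published by the CSI driver in its own namespace; one object
  # per storage class and topology segment.
  name: example-capacity        # hypothetical
  namespace: csi-driver         # hypothetical
storageClassName: example-sc    # hypothetical
capacity: 100Gi
nodeTopology:
  matchLabels:
    topology.example.com/zone: zone-1   # hypothetical topology label
```

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.k8s.io      # hypothetical driver name
spec:
  # Opt this driver in to storage capacity tracking; this is the
  # CSIDriverSpec.StorageCapacity field described above.
  storageCapacity: true
```
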
## Scheduling

Storage capacity information is used by the Kubernetes scheduler if:
- the `CSIStorageCapacity` feature gate is true,
- a Pod uses a volume that has not been created yet,
- that volume uses a {{< glossary_tooltip text="StorageClass" term_id="storage-class" >}} which references a CSI driver and
  uses the `WaitForFirstConsumer` [volume binding
  mode](/docs/concepts/storage/storage-classes/#volume-binding-mode)
  (see the example below), and
- the `CSIDriver` object for the driver has `StorageCapacity` set to
  true.

In that case, the scheduler only considers nodes for the Pod which
have enough storage available to them. This check is very
simplistic and only compares the size of the volume against the
capacity listed in `CSIStorageCapacity` objects with a topology that
includes the node.

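As an illustration of the third condition, a matching StorageClass and
a claim that uses it might look like this (driver and class names are
hypothetical):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-sc
provisioner: example.csi.k8s.io   # hypothetical CSI driver
# Delay provisioning until a Pod using the claim is scheduled, so the
# scheduler can pick a node with enough capacity first.
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: example-sc
  resources:
    requests:
      storage: 10Gi
```
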
For volumes with `Immediate` volume binding mode, the storage driver
decides where to create the volume, independently of the Pods that
will use it. The scheduler then schedules Pods onto nodes where the
volume is available after it has been created.

For [CSI ephemeral volumes](/docs/concepts/storage/volumes/#csi),
scheduling always happens without considering storage capacity. This
is based on the assumption that this volume type is only used by
special CSI drivers which are local to a node and do not need
significant resources there.

## Rescheduling

When a node has been selected for a Pod with `WaitForFirstConsumer`
volumes, that decision is still tentative. The next step is that the
CSI storage driver gets asked to create the volume with a hint that
the volume is supposed to be available on the selected node.

Because Kubernetes might have chosen a node based on out-dated
capacity information, it is possible that the volume cannot actually
be created. The node selection is then reset and the Kubernetes
scheduler tries again to find a node for the Pod.

## Limitations

Storage capacity tracking increases the chance that scheduling works
on the first try, but cannot guarantee this because the scheduler has
to decide based on potentially out-dated information. Usually, the
same retry mechanism as for scheduling without any storage capacity
information handles scheduling failures.

One situation where scheduling can fail permanently is when a Pod uses
multiple volumes: one volume might have been created already in a
topology segment which then does not have enough capacity left for
another volume. Manual intervention is necessary to recover from this,
for example by increasing capacity or deleting the volume that was
already created. [Further
work](https://github.com/kubernetes/enhancements/pull/1703) is needed
to handle this automatically.

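As a hypothetical illustration of that situation, the following Pod
requests two volumes that both have to fit into the topology segment
of whichever node gets chosen (all names are made up):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
    - name: app
      image: k8s.gcr.io/pause:3.2   # placeholder image
      volumeMounts:
        - name: data-a
          mountPath: /data-a
        - name: data-b
          mountPath: /data-b
  volumes:
    # If data-a gets provisioned first and uses up most of the
    # capacity in the selected segment, provisioning data-b there can
    # fail, and the Pod stays pending until someone intervenes.
    - name: data-a
      persistentVolumeClaim:
        claimName: example-pvc-a
    - name: data-b
      persistentVolumeClaim:
        claimName: example-pvc-b
```
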
## Enabling storage capacity tracking

Storage capacity tracking is an *alpha feature* and only enabled when
the `CSIStorageCapacity` [feature
gate](/docs/reference/command-line-tools-reference/feature-gates/) and
the `storage.k8s.io/v1alpha1` {{< glossary_tooltip text="API group" term_id="api-group" >}}
are enabled. For details on that, see the `--feature-gates` and
`--runtime-config` [kube-apiserver
parameters](/docs/reference/command-line-tools-reference/kube-apiserver/).

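How these settings are configured depends on how the cluster was
deployed; as a sketch, a kube-apiserver invocation that enables both
might include flags like the following (all other flags omitted):

```shell
kube-apiserver \
  --feature-gates=CSIStorageCapacity=true \
  --runtime-config=storage.k8s.io/v1alpha1=true
```
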
A quick check whether a Kubernetes cluster supports the feature is to
list CSIStorageCapacity objects with:

```shell
kubectl get csistoragecapacities --all-namespaces
```

If your cluster supports CSIStorageCapacity, the response is either a
list of CSIStorageCapacity objects or:

```
No resources found
```

If not supported, this error is printed instead:

```
error: the server doesn't have a resource type "csistoragecapacities"
```

In addition to enabling the feature in the cluster, a CSI driver also
has to support it. Please refer to the driver's documentation for
details.

## {{% heading "whatsnext" %}}

- For more information on the design, see the
  [Storage Capacity Constraints for Pod Scheduling KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1472-storage-capacity-tracking/README.md).
- For more information on further development of this feature, see the
  [enhancement tracking issue #1472](https://github.com/kubernetes/enhancements/issues/1472).
- Learn about the [Kubernetes Scheduler](/docs/concepts/scheduling-eviction/kube-scheduler/).