| 1 | +--- |
| 2 | +layout: blog |
| 3 | +title: "Kubernetes v1.33:可变的 CSI 节点可分配数" |
| 4 | +date: 2025-05-02T10:30:00-08:00 |
| 5 | +slug: kubernetes-1-33-mutable-csi-node-allocatable-count |
| 6 | +author: Eddie Torres (Amazon Web Services) |
| 7 | +translator: Michael Yao (DaoCloud) |
| 8 | +--- |
| 9 | +<!-- |
| 10 | +layout: blog |
| 11 | +title: "Kubernetes v1.33: Mutable CSI Node Allocatable Count" |
| 12 | +date: 2025-05-02T10:30:00-08:00 |
| 13 | +slug: kubernetes-1-33-mutable-csi-node-allocatable-count |
| 14 | +author: Eddie Torres (Amazon Web Services) |
| 15 | +--> |
| 16 | + |
| 17 | +<!-- |
| 18 | +Scheduling stateful applications reliably depends heavily on accurate information about resource availability on nodes. |
| 19 | +Kubernetes v1.33 introduces an alpha feature called *mutable CSI node allocatable count*, allowing Container Storage Interface (CSI) drivers to dynamically update the reported maximum number of volumes that a node can handle. |
| 20 | +This capability significantly enhances the accuracy of pod scheduling decisions and reduces scheduling failures caused by outdated volume capacity information. |
| 21 | +--> |
| 22 | +可靠地调度有状态应用,在很大程度上依赖于节点上资源可用性的准确信息。 |
| 23 | +Kubernetes v1.33 引入一个名为**可变的 CSI 节点可分配数**的 Alpha 特性,允许 |
| 24 | +CSI(容器存储接口)驱动动态更新节点可以处理的最大卷数量。 |
| 25 | +这一能力显著提升 Pod 调度决策的准确性,并减少因卷容量信息过时而导致的调度失败。 |
| 26 | + |
| 27 | +<!-- |
| 28 | +## Background |
| 29 | + |
| 30 | +Traditionally, Kubernetes CSI drivers report a static maximum volume attachment limit when initializing. However, actual attachment capacities can change during a node's lifecycle for various reasons, such as: |
| 31 | + |
| 32 | +- Manual or external operations attaching/detaching volumes outside of Kubernetes control. |
| 33 | +- Dynamically attached network interfaces or specialized hardware (GPUs, NICs, etc.) consuming available slots. |
| 34 | +- Multi-driver scenarios, where one CSI driver’s operations affect available capacity reported by another. |
| 35 | +--> |
| 36 | + |
| 37 | +## 背景 {#background} |
| 38 | + |
| 39 | +传统上,Kubernetes 中的 CSI 驱动在初始化时会报告一个静态的最大卷挂接限制。 |
| 40 | +然而,在节点生命周期内,实际的挂接容量可能会由于多种原因发生变化,例如: |
| 41 | + |
| 42 | +- 在 Kubernetes 控制之外的手动或外部操作挂接/解除挂接卷。 |
| 43 | +- 动态挂接的网络接口或专用硬件(如 GPU、NIC 等)占用可用的插槽。 |
| 44 | +- 在多驱动场景中,一个 CSI 驱动的操作会影响另一个驱动所报告的可用容量。 |
| 45 | + |
| 46 | +<!-- |
| 47 | +Static reporting can cause Kubernetes to schedule pods onto nodes that appear to have capacity but don't, leading to pods stuck in a `ContainerCreating` state. |
| 48 | + |
| 49 | +## Dynamically adapting CSI volume limits |
| 50 | + |
| 51 | +With the new feature gate `MutableCSINodeAllocatableCount`, Kubernetes enables CSI drivers to dynamically adjust and report node attachment capacities at runtime. This ensures that the scheduler has the most accurate, up-to-date view of node capacity. |
| 52 | +--> |
| 53 | +静态报告可能导致 Kubernetes 将 Pod 调度到看似有容量但实际没有的节点上,进而造成 |
| 54 | +Pod 长时间卡在 `ContainerCreating` 状态。 |
| 55 | + |
| 56 | +## 动态适应 CSI 卷限制 {#dynamically-adapting-csi-volume-limits} |
| 57 | + |
| 58 | +借助新的特性门控 `MutableCSINodeAllocatableCount`,Kubernetes 允许 CSI |
| 59 | +驱动在运行时动态调整并报告节点的挂接容量。这样可以确保调度器始终掌握最准确、最新的节点容量信息。 |
| 60 | + |
| 61 | +<!-- |
| 62 | +### How it works |
| 63 | + |
| 64 | +When this feature is enabled, Kubernetes supports two mechanisms for updating the reported node volume limits: |
| 65 | + |
| 66 | +- **Periodic Updates:** CSI drivers specify an interval to periodically refresh the node's allocatable capacity. |
| 67 | +- **Reactive Updates:** An immediate update triggered when a volume attachment fails due to exhausted resources (`ResourceExhausted` error). |
| 68 | +--> |
| 69 | +### 工作原理 {#how-it-works} |
| 70 | + |
| 71 | +启用此特性后,Kubernetes 支持通过以下两种机制来更新节点卷限制的报告值: |
| 72 | + |
| 73 | +- **周期性更新:** CSI 驱动指定一个间隔时间,来定期刷新节点的可分配容量。 |
| 74 | +- **响应式更新:** 当因资源耗尽(`ResourceExhausted` 错误)导致卷挂接失败时,立即触发更新。 |
| 75 | + |
| 76 | +<!-- |
| 77 | +### Enabling the feature |
| 78 | + |
| 79 | +To use this alpha feature, you must enable the `MutableCSINodeAllocatableCount` feature gate in these components: |
| 80 | +--> |
| 81 | +### 启用此特性 {#enabling-the-feature} |
| 82 | + |
| 83 | +要使用此 Alpha 特性,你必须在以下组件中启用 `MutableCSINodeAllocatableCount` 特性门控: |
| 84 | + |
| 85 | +- `kube-apiserver` |
| 86 | +- `kubelet` |
| 87 | + |
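下面是一个最小示意,假设你通过配置文件来配置 kubelet;实际的启用方式取决于你所使用的集群部署工具。
对于 `kube-apiserver`,通常是在其命令行参数中加入 `--feature-gates=MutableCSINodeAllocatableCount=true`。

```yaml
# KubeletConfiguration 片段(示意):在 kubelet 上启用该特性门控
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MutableCSINodeAllocatableCount: true
```

修改配置后通常需要重启相应组件才能生效。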
| 88 | +<!-- |
| 89 | +### Example CSI driver configuration |
| 90 | + |
| 91 | +Below is an example of configuring a CSI driver to enable periodic updates every 60 seconds: |
| 92 | +--> |
| 93 | +### CSI 驱动配置示例 {#example-csi-driver-configuration} |
| 94 | + |
| 95 | +以下是配置 CSI 驱动以每 60 秒进行一次周期性更新的示例: |
| 96 | + |
| 97 | +```yaml |
| 98 | +apiVersion: storage.k8s.io/v1 |
| 99 | +kind: CSIDriver |
| 100 | +metadata: |
| 101 | + name: example.csi.k8s.io |
| 102 | +spec: |
| 103 | + nodeAllocatableUpdatePeriodSeconds: 60 |
| 104 | +``` |
| 105 | + |
| 106 | +<!-- |
| 107 | +This configuration directs Kubelet to periodically call the CSI driver's `NodeGetInfo` method every 60 seconds, updating the node’s allocatable volume count. Kubernetes enforces a minimum update interval of 10 seconds to balance accuracy and resource usage. |
| 108 | +--> |
| 109 | +此配置会指示 Kubelet 每 60 秒调用一次 CSI 驱动的 `NodeGetInfo` 方法,从而更新节点的可分配卷数量。 |
| 110 | +Kubernetes 强制要求最小更新间隔时间为 10 秒,以平衡准确性和资源使用量。 |
| 111 | + |
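作为参考,节点的可分配卷数量记录在对应 CSINode 对象的 `spec.drivers[].allocatable.count` 字段中。
下面是该对象的大致形态,仅作示意:节点名称、节点 ID、拓扑键和数值均为假设值。
CSINode 对象由 kubelet 维护,通常无需手动创建或修改。

```yaml
# 示意:类似 `kubectl get csinode node-1 -o yaml` 输出中的相关部分
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: node-1                # 假设的节点名称
spec:
  drivers:
    - name: example.csi.k8s.io
      nodeID: node-1          # 假设的节点 ID
      topologyKeys:
        - topology.example.csi.k8s.io/zone   # 假设的拓扑键
      allocatable:
        count: 25             # 假设的数值;启用此特性后可在运行时被更新
```

观察该字段随时间的变化,是确认周期性更新是否生效的一种简单方式。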
| 112 | +<!-- |
| 113 | +### Immediate updates on attachment failures |
| 114 | + |
| 115 | +In addition to periodic updates, Kubernetes now reacts to attachment failures. Specifically, if a volume attachment fails with a `ResourceExhausted` error (gRPC code `8`), an immediate update is triggered to correct the allocatable count promptly. |
| 116 | + |
| 117 | +This proactive correction prevents repeated scheduling errors and helps maintain cluster health. |
| 118 | +--> |
| 119 | +### 挂接失败时的即时更新 {#immediate-updates-on-attachment-failures} |
| 120 | + |
| 121 | +除了周期性更新外,Kubernetes 现在也能对挂接失败做出响应。 |
| 122 | +具体来说,如果卷挂接由于 `ResourceExhausted` 错误(gRPC 错误码 `8`)而失败,将立即触发更新,以快速纠正可分配数量。 |
| 123 | + |
| 124 | +这种主动纠正可以防止重复的调度错误,有助于保持集群的健康状态。 |
| 125 | + |
| 126 | +<!-- |
| 127 | +## Getting started |
| 128 | + |
| 129 | +To experiment with mutable CSI node allocatable count in your Kubernetes v1.33 cluster: |
| 130 | + |
| 131 | +1. Enable the feature gate `MutableCSINodeAllocatableCount` on the `kube-apiserver` and `kubelet` components. |
| 132 | +2. Update your CSI driver configuration by setting `nodeAllocatableUpdatePeriodSeconds`. |
| 133 | +3. Monitor and observe improvements in scheduling accuracy and pod placement reliability. |
| 134 | +--> |
| 135 | +## 快速开始 {#getting-started} |
| 136 | + |
| 137 | +要在 Kubernetes v1.33 集群中试用可变的 CSI 节点可分配数: |
| 138 | + |
| 139 | +1. 在 `kube-apiserver` 和 `kubelet` 组件上启用特性门控 `MutableCSINodeAllocatableCount`。 |
| 140 | +2. 在 CSI 驱动配置中设置 `nodeAllocatableUpdatePeriodSeconds`。 |
| 141 | +3. 监控并观察调度准确性和 Pod 放置可靠性的提升程度。 |
| 142 | + |
| 143 | +<!-- |
| 144 | +## Next steps |
| 145 | + |
| 146 | +This feature is currently in alpha and the Kubernetes community welcomes your feedback. Test it, share your experiences, and help guide its evolution toward beta and GA stability. |
| 147 | + |
| 148 | +Join discussions in the [Kubernetes Storage Special Interest Group (SIG-Storage)](https://github.com/kubernetes/community/tree/master/sig-storage) to shape the future of Kubernetes storage capabilities. |
| 149 | +--> |
| 150 | +## 后续计划 {#next-steps} |
| 151 | + |
| 152 | +此特性目前处于 Alpha 阶段,Kubernetes 社区欢迎你的反馈。 |
| 153 | +欢迎参与测试、分享你的使用经验,帮助推动此特性向 Beta 和 GA(正式发布)阶段迈进。 |
| 154 | + |
| 155 | +欢迎加入 [Kubernetes SIG-Storage](https://github.com/kubernetes/community/tree/master/sig-storage) |
| 156 | +的讨论,共同塑造 Kubernetes 存储能力的未来。 |