Commit 600ae91

Merge pull request #42558 from windsonsea/devplu

[zh] sync device-plugins.md and dynamic-resource-allocation.md

2 parents: 29a6364 + 34c0afe

File tree

2 files changed: +89, -39 lines

content/zh-cn/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md

Lines changed: 34 additions & 19 deletions
@@ -256,6 +256,38 @@ The general workflow of a device plugin includes the following steps:
 如果操作成功,则设备插件将返回 `AllocateResponse`,其中包含用于访问被分配的设备容器运行时的配置。
 kubelet 将此信息传递到容器运行时。
 
+<!--
+An `AllocateResponse` contains zero or more `ContainerAllocateResponse` objects. In these, the
+device plugin defines modifications that must be made to a container's definition to provide
+access to the device. These modifications include:
+-->
+`AllocateResponse` 包含零个或多个 `ContainerAllocateResponse` 对象。
+设备插件在这些对象中给出为了访问设备而必须对容器定义所进行的修改。
+这些修改包括:
+
+<!--
+* annotations
+* device nodes
+* environment variables
+* mounts
+* fully-qualified CDI device names
+-->
+* 注解
+* 设备节点
+* 环境变量
+* 挂载点
+* 完全限定的 CDI 设备名称
+
+{{< note >}}
+<!--
+The processing of the fully-qualified CDI device names by the Device Manager requires
+the `DevicePluginCDIDevices` feature gate to be enabled. This was added as an alpha feature in
+v1.28.
+-->
+设备管理器处理完全限定的 CDI 设备名称时需要启用 `DevicePluginCDIDevices` 特性门控。
+这是在 v1.28 版本中作为 Alpha 特性添加的。
+{{< /note >}}
+
 <!--
 ### Handling kubelet restarts
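
To make the added list concrete, here is a minimal sketch of a single `ContainerAllocateResponse`, rendered as YAML for illustration. The field names are assumed from the `v1beta1` device plugin gRPC API; every value (vendor paths, device and resource names) is hypothetical:

```yaml
# Sketch only: one ContainerAllocateResponse rendered as YAML.
annotations:                          # annotations
  devices.example.com/gpu: "gpu0"
devices:                              # device nodes to create in the container
- containerPath: /dev/example-gpu0
  hostPath: /dev/example-gpu0
  permissions: rw
envs:                                 # environment variables
  EXAMPLE_VISIBLE_DEVICES: "0"
mounts:                               # mounts
- containerPath: /usr/local/example
  hostPath: /var/lib/example-driver
  readOnly: true
cdiDevices:                           # fully-qualified CDI device names
- name: example.com/gpu=gpu0
```

The `cdiDevices` entry only takes effect while the `DevicePluginCDIDevices` gate is enabled. One way to switch on an alpha kubelet gate is through the kubelet configuration file; a sketch of the relevant fragment, assuming the standard `KubeletConfiguration` type:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DevicePluginCDIDevices: true   # alpha in v1.28, off by default
```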
@@ -352,7 +384,7 @@ of the device allocations during the upgrade.
 -->
 ## 监控设备插件资源 {#monitoring-device-plugin-resources}
 
-{{< feature-state for_k8s_version="v1.15" state="beta" >}}
+{{< feature-state for_k8s_version="v1.28" state="stable" >}}
 
 <!--
 In order to monitor resources provided by device plugins, monitoring agents need to be able to
@@ -584,7 +616,7 @@ below:
 -->
 ### `GetAllocatableResources` gRPC 端点 {#grpc-endpoint-getallocatableresources}
 
-{{< feature-state state="beta" for_k8s_version="v1.23" >}}
+{{< feature-state state="stable" for_k8s_version="v1.28" >}}
 
 <!--
 GetAllocatableResources provides information on resources initially available on the worker node.
@@ -623,23 +655,6 @@ message AllocatableResourcesResponse {
 }
 ```
 
-<!--
-Starting from Kubernetes v1.23, the `GetAllocatableResources` is enabled by default.
-You can disable it by turning off the `KubeletPodResourcesGetAllocatable`
-[feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
-
-Preceding Kubernetes v1.23, to enable this feature `kubelet` must be started with the following flag:
--->
-从 Kubernetes v1.23 开始,`GetAllocatableResources` 被默认启用。
-你可以通过关闭 `KubeletPodResourcesGetAllocatable`
-[特性门控](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/)来禁用。
-
-在 Kubernetes v1.23 之前,要启用这一功能,`kubelet` 必须用以下标志启动:
-
-```
---feature-gates=KubeletPodResourcesGetAllocatable=true
-```
-
 <!--
 `ContainerDevices` do expose the topology information declaring to which NUMA cells the device is
 affine. The NUMA cells are identified using a opaque integer ID, which value is consistent to
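
The unchanged context above describes how `ContainerDevices` expose NUMA affinity. For orientation, an `AllocatableResourcesResponse` might render roughly as below; the field names are assumed from the podresources `v1` API and all values are hypothetical:

```yaml
# Sketch only: an AllocatableResourcesResponse rendered as YAML.
devices:
- resourceName: example.com/gpu   # hypothetical device plugin resource
  deviceIds: ["gpu0", "gpu1"]
  topology:
    nodes:
    - id: 0                       # opaque NUMA cell ID
cpuIds: [0, 1, 2, 3]              # allocatable CPUs on the node
```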

content/zh-cn/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md

Lines changed: 55 additions & 20 deletions
@@ -17,14 +17,14 @@ weight: 65
 {{< feature-state for_k8s_version="v1.27" state="alpha" >}}
 
 <!--
-Dynamic resource allocation is a new API for requesting and sharing resources
+Dynamic resource allocation is an API for requesting and sharing resources
 between pods and containers inside a pod. It is a generalization of the
 persistent volumes API for generic resources. Third-party resource drivers are
 responsible for tracking and allocating resources. Different kinds of
 resources support arbitrary parameters for defining requirements and
 initialization.
 -->
-动态资源分配是一个用于在 Pod 之间和 Pod 内部容器之间请求和共享资源的新 API。
+动态资源分配是一个用于在 Pod 之间和 Pod 内部容器之间请求和共享资源的 API。
 它是对为通用资源所提供的持久卷 API 的泛化。第三方资源驱动程序负责跟踪和分配资源。
 不同类型的资源支持用任意参数进行定义和初始化。
 
@@ -49,10 +49,10 @@ Kubernetes v{{< skew currentVersion >}} 包含用于动态资源分配的集群
 ## API {#api}
 <!--
 The `resource.k8s.io/v1alpha2` {{< glossary_tooltip text="API group"
-term_id="api-group" >}} provides four new types:
+term_id="api-group" >}} provides four types:
 -->
 `resource.k8s.io/v1alpha2`
-{{< glossary_tooltip text="API 组" term_id="api-group" >}}提供四种新类型
+{{< glossary_tooltip text="API 组" term_id="api-group" >}}提供四种类型
 
 <!--
 ResourceClass
@@ -106,14 +106,14 @@ ResourceClass 和 ResourceClaim 的参数存储在单独的对象中,
 term_id="CustomResourceDefinition" text="CRD" >}} 所定义的类型。
 
 <!--
-The `core/v1` `PodSpec` defines ResourceClaims that are needed for a Pod in a new
+The `core/v1` `PodSpec` defines ResourceClaims that are needed for a Pod in a
 `resourceClaims` field. Entries in that list reference either a ResourceClaim
 or a ResourceClaimTemplate. When referencing a ResourceClaim, all Pods using
 this PodSpec (for example, inside a Deployment or StatefulSet) share the same
 ResourceClaim instance. When referencing a ResourceClaimTemplate, each Pod gets
 its own instance.
 -->
-`core/v1` 的 `PodSpec` 在新的 `resourceClaims` 字段中定义 Pod 所需的 ResourceClaim。
+`core/v1` 的 `PodSpec` 在 `resourceClaims` 字段中定义 Pod 所需的 ResourceClaim。
 该列表中的条目引用 ResourceClaim 或 ResourceClaimTemplate。
 当引用 ResourceClaim 时,使用此 PodSpec 的所有 Pod
 (例如 Deployment 或 StatefulSet 中的 Pod)共享相同的 ResourceClaim 实例。
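
For context, the two reference styles described in this passage might look as follows in manifests. This is a sketch against the `resource.k8s.io/v1alpha2` API; the ResourceClass name `example-gpu`, the object names, and the image are hypothetical:

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: gpu-template
spec:
  spec:
    resourceClassName: example-gpu   # installed by a third-party driver
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-gpu
spec:
  resourceClaims:
  - name: gpu                        # local name used by containers below
    source:
      resourceClaimTemplateName: gpu-template   # each Pod gets its own claim
      # or: resourceClaimName: some-claim       # all Pods share one claim
  containers:
  - name: app
    image: example.com/app:1.0
    resources:
      claims:
      - name: gpu                    # grants this container access
```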
@@ -265,23 +265,58 @@ running Pods. For more information on the gRPC endpoints, see the
 kubelet 提供了一个 gRPC 服务,以便发现正在运行的 Pod 的动态资源。
 有关 gRPC 端点的更多信息,请参阅[资源分配报告](/zh-cn/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources)。
 
-<!--
-## Limitations
+<!--
+## Pre-scheduled Pods
+
+When you - or another API client - create a Pod with `spec.nodeName` already set, the scheduler gets bypassed.
+If some ResourceClaim needed by that Pod does not exist yet, is not allocated
+or not reserved for the Pod, then the kubelet will fail to run the Pod and
+re-check periodically because those requirements might still get fulfilled
+later.
 -->
-## 限制 {#limitations}
+## 预调度的 Pod
 
-<!--
-The scheduler plugin must be involved in scheduling Pods which use
-ResourceClaims. Bypassing the scheduler by setting the `nodeName` field leads
-to Pods that the kubelet refuses to start because the ResourceClaims are not
-reserved or not even allocated. It may be possible to [remove this
-limitation](https://github.com/kubernetes/kubernetes/issues/114005) in the
-future.
+当你(或别的 API 客户端)创建设置了 `spec.nodeName` 的 Pod 时,调度器将被绕过。
+如果 Pod 所需的某个 ResourceClaim 尚不存在、未被分配或未为该 Pod 保留,那么 kubelet
+将无法运行该 Pod,并会定期重新检查,因为这些要求可能在以后得到满足。
+
+<!--
+Such a situation can also arise when support for dynamic resource allocation
+was not enabled in the scheduler at the time when the Pod got scheduled
+(version skew, configuration, feature gate, etc.). kube-controller-manager
+detects this and tries to make the Pod runnable by triggering allocation and/or
+reserving the required ResourceClaims.
+-->
+这种情况也可能发生在 Pod 被调度时调度器中未启用动态资源分配支持的时候(原因可能是版本偏差、配置、特性门控等)。
+kube-controller-manager 能够检测到这一点,并尝试通过触发分配和/或预留所需的 ResourceClaim 来使 Pod 可运行。
+
+<!--
+However, it is better to avoid this because a Pod that is assigned to a node
+blocks normal resources (RAM, CPU) that then cannot be used for other Pods
+while the Pod is stuck. To make a Pod run on a specific node while still going
+through the normal scheduling flow, create the Pod with a node selector that
+exactly matches the desired node:
+-->
+然而,最好避免这种情况,因为分配给节点的 Pod 会锁住一些正常的资源(RAM、CPU),
+而这些资源在 Pod 被卡住时无法用于其他 Pod。为了让一个 Pod 在特定节点上运行,
+同时仍然通过正常的调度流程进行,请在创建 Pod 时使用与期望的节点精确匹配的节点选择算符:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: pod-with-cats
+spec:
+  nodeSelector:
+    kubernetes.io/hostname: name-of-the-intended-node
+  ...
+```
+
+<!--
+You may also be able to mutate the incoming Pod, at admission time, to unset
+the `.spec.nodeName` field and to use a node selector instead.
 -->
-调度器插件必须参与调度那些使用 ResourceClaim 的 Pod。
-通过设置 `nodeName` 字段绕过调度器会导致 kubelet 拒绝启动 Pod,
-因为 ResourceClaim 没有被保留或甚至根本没有被分配。
-未来可能[去除该限制](https://github.com/kubernetes/kubernetes/issues/114005)。
+你还可以在准入时变更传入的 Pod,取消设置 `.spec.nodeName` 字段,并改为使用节点选择算符。
 
 <!--
 ## Enabling dynamic resource allocation
