@@ -140,111 +140,121 @@ spec:
The general workflow of a device plugin includes the following steps:
- * Initialization. During this phase, the device plugin performs vendor specific
+ 1. Initialization. During this phase, the device plugin performs vendor-specific
initialization and setup to make sure the devices are in a ready state.
- * The plugin starts a gRPC service, with a Unix socket under host path
+ 1. The plugin starts a gRPC service, with a Unix socket under the host path
`/var/lib/kubelet/device-plugins/`, that implements the following interfaces:
-->
## Device plugin implementation {#device-plugin-implementation}

The general workflow of a device plugin includes the following steps:

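- * Initialization. During this phase, the device plugin performs vendor-specific
-   initialization and setup to make sure the devices are in a ready state.
- * The plugin starts a gRPC service, with a Unix socket under the host path
-   `/var/lib/kubelet/device-plugins/`, that implements the following interfaces:
-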
-   <!--
-   ```gRPC
-   service DevicePlugin {
-     // GetDevicePluginOptions returns options to be communicated with Device Manager.
-     rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
-
-     // ListAndWatch returns a stream of List of Devices
-     // Whenever a Device state changes or a Device disappears, ListAndWatch
-     // returns the new list
-     rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}
-
-     // Allocate is called during container creation so that the Device
-     // Plugin can run device specific operations and instruct Kubelet
-     // of the steps to make the Device available in the container
-     rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
-
-     // GetPreferredAllocation returns a preferred set of devices to allocate
-     // from a list of available ones. The resulting preferred allocation is not
-     // guaranteed to be the allocation ultimately performed by the
-     // devicemanager. It is only designed to help the devicemanager make a more
-     // informed allocation decision when possible.
-     rpc GetPreferredAllocation(PreferredAllocationRequest) returns (PreferredAllocationResponse) {}
-
-     // PreStartContainer is called, if indicated by Device Plugin during registration phase,
-     // before each container start. Device plugin can run device specific operations
-     // such as resetting the device before making devices available to the container.
-     rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
-   }
-   ```
-   -->
-   ```gRPC
-   service DevicePlugin {
-     // GetDevicePluginOptions returns the options to be communicated with the device manager.
-     rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
-
-     // ListAndWatch returns a stream of Device lists.
-     // Whenever a Device's state changes or a Device disappears, ListAndWatch
-     // returns the new list.
-     rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}
-
-     // Allocate is called during container creation so that the device plugin can run
-     // device-specific operations and tell the kubelet the steps required to make the
-     // Device available in the container.
-     rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
-
-     // GetPreferredAllocation returns a preferred set of devices to allocate from a list of
-     // available ones. The preferred allocation it returns is not necessarily the allocation
-     // that the device manager ultimately performs. This interface is only designed to help
-     // the device manager make a more informed decision when possible.
-     rpc GetPreferredAllocation(PreferredAllocationRequest) returns (PreferredAllocationResponse) {}
-
-     // PreStartContainer is called before each container start, if the device plugin
-     // indicated so during the registration phase. Before making devices available to the
-     // container, the device plugin can run device-specific operations such as
-     // resetting the device.
-     rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
-   }
-   ```
-
-   {{< note >}}
-   <!--
-   Plugins are not required to provide useful implementations for
-   `GetPreferredAllocation()` or `PreStartContainer()`. Flags indicating which
-   (if any) of these calls are available should be set in the `DevicePluginOptions`
-   message sent back by a call to `GetDevicePluginOptions()`. The `kubelet` will
-   always call `GetDevicePluginOptions()` to see which optional functions are
-   available, before calling any of them directly.
-   -->
-   Plugins are not required to provide useful implementations for
-   `GetPreferredAllocation()` or `PreStartContainer()`. The `DevicePluginOptions`
-   message returned by a call to `GetDevicePluginOptions()` should indicate whether these
-   calls are available. Before actually calling any of these functions, the `kubelet`
-   always calls `GetDevicePluginOptions()` to check whether these optional functions exist.
-   {{< /note >}}
-
- <!--
- * The plugin registers itself with the kubelet through the Unix socket at host
-   path `/var/lib/kubelet/device-plugins/kubelet.sock`.
-
- * After successfully registering itself, the device plugin runs in serving mode, during which it keeps
-   monitoring device health and reports back to the kubelet upon any device state changes.
-   It is also responsible for serving `Allocate` gRPC requests. During `Allocate`, the device plugin may
-   do device-specific preparation; for example, GPU cleanup or QRNG initialization.
-   If the operations succeed, the device plugin returns an `AllocateResponse` that contains container
-   runtime configurations for accessing the allocated devices. The kubelet passes this information
-   to the container runtime.
- -->
- * The plugin registers itself with the kubelet through the Unix socket at host path
-   `/var/lib/kubelet/device-plugins/kubelet.sock`.
- * After successfully registering itself, the device plugin runs in serving mode, during which it keeps
-   monitoring device health and reports back to the kubelet upon any device state changes.
-   It is also responsible for serving `Allocate` gRPC requests. During `Allocate`, the device plugin may
-   also do some device-specific preparation; for example, GPU cleanup or QRNG initialization.
-   If the operations succeed, the device plugin returns an `AllocateResponse` that contains the container
-   runtime configuration for accessing the allocated devices. The kubelet passes this information
-   to the container runtime.
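+ 1. Initialization. During this phase, the device plugin performs initialization and setup specific to the vendor, to make sure the devices are in a ready state.
+
+ 2. The plugin starts a gRPC service, with a UNIX socket under the host path
+    `/var/lib/kubelet/device-plugins/`, that implements the following interfaces:
+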
+    <!--
+    ```gRPC
+    service DevicePlugin {
+      // GetDevicePluginOptions returns options to be communicated with Device Manager.
+      rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
+
+      // ListAndWatch returns a stream of List of Devices
+      // Whenever a Device state changes or a Device disappears, ListAndWatch
+      // returns the new list
+      rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}
+
+      // Allocate is called during container creation so that the Device
+      // Plugin can run device specific operations and instruct Kubelet
+      // of the steps to make the Device available in the container
+      rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
+
+      // GetPreferredAllocation returns a preferred set of devices to allocate
+      // from a list of available ones. The resulting preferred allocation is not
+      // guaranteed to be the allocation ultimately performed by the
+      // devicemanager. It is only designed to help the devicemanager make a more
+      // informed allocation decision when possible.
+      rpc GetPreferredAllocation(PreferredAllocationRequest) returns (PreferredAllocationResponse) {}
+
+      // PreStartContainer is called, if indicated by Device Plugin during registration phase,
+      // before each container start. Device plugin can run device specific operations
+      // such as resetting the device before making devices available to the container.
+      rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
+    }
+    ```
+    -->
+    ```gRPC
+    service DevicePlugin {
+      // GetDevicePluginOptions returns the options to be communicated with the device manager.
+      rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
+
+      // ListAndWatch returns a stream of Device lists.
+      // Whenever a Device's state changes or a Device disappears, ListAndWatch
+      // returns the new list.
+      rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}
+
+      // Allocate is called during container creation so that the device plugin can run
+      // device-specific operations and tell the kubelet the steps required to make the
+      // Device available in the container.
+      rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
+
+      // GetPreferredAllocation returns a preferred set of devices to allocate from a list of
+      // available ones. The preferred allocation it returns is not necessarily the allocation
+      // that the device manager ultimately performs. This interface is only designed to help
+      // the device manager make a more informed decision when possible.
+      rpc GetPreferredAllocation(PreferredAllocationRequest) returns (PreferredAllocationResponse) {}
+
+      // PreStartContainer is called before each container start, if the device plugin
+      // indicated so during the registration phase. Before making devices available to the
+      // container, the device plugin can run device-specific operations such as
+      // resetting the device.
+      rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
+    }
+    ```
+
+    {{< note >}}
+    <!--
+    Plugins are not required to provide useful implementations for
+    `GetPreferredAllocation()` or `PreStartContainer()`. Flags indicating
+    the availability of these calls, if any, should be set in the `DevicePluginOptions`
+    message sent back by a call to `GetDevicePluginOptions()`. The `kubelet` will
+    always call `GetDevicePluginOptions()` to see which optional functions are
+    available, before calling any of them directly.
+    -->
+    Plugins are not required to provide useful implementations for
+    `GetPreferredAllocation()` or `PreStartContainer()`. The `DevicePluginOptions`
+    message returned by a call to `GetDevicePluginOptions()` should set flags indicating
+    whether these calls, if any, are available. Before calling any of these functions directly,
+    the `kubelet` always calls `GetDevicePluginOptions()` to see which optional functions are available.
+    {{< /note >}}
+
+ <!--
+ 1. The plugin registers itself with the kubelet through the Unix socket at host
+    path `/var/lib/kubelet/device-plugins/kubelet.sock`.
+ -->
+ 3. The plugin registers itself with the kubelet through the UNIX socket located at host path
+    `/var/lib/kubelet/device-plugins/kubelet.sock`.
+
+    {{< note >}}
+    <!--
+    The ordering of the workflow is important. A plugin MUST start serving gRPC
+    service before registering itself with kubelet for successful registration.
+    -->
+    The ordering of the workflow is important. To ensure successful registration, a plugin must start serving its gRPC service before it registers itself with the kubelet.
+    {{< /note >}}
+
+ <!--
+ 1. After successfully registering itself, the device plugin runs in serving mode, during which it keeps
+    monitoring device health and reports back to the kubelet upon any device state changes.
+    It is also responsible for serving `Allocate` gRPC requests. During `Allocate`, the device plugin may
+    do device-specific preparation; for example, GPU cleanup or QRNG initialization.
+    If the operations succeed, the device plugin returns an `AllocateResponse` that contains container
+    runtime configurations for accessing the allocated devices. The kubelet passes this information
+    to the container runtime.
+ -->
+ 4. After successfully registering itself, the device plugin runs in a mode in which it serves requests, during which it keeps
+    monitoring device health and reports back to the kubelet upon any device state changes.
+    It is also responsible for serving `Allocate` gRPC requests. During `Allocate`, the device plugin may
+    also do some preparation specific to the device; for example, GPU cleanup or QRNG initialization.
+    If the operations succeed, the device plugin returns an `AllocateResponse` that contains the container
+    runtime configuration for accessing the allocated devices. The kubelet passes this information
+    to the container runtime.
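
The ordering constraint in the workflow (start serving, then register) can be illustrated with a minimal sketch. This is a simulation only, not the device plugin API: it uses plain Unix sockets from the Python standard library instead of gRPC, a temporary directory in place of `/var/lib/kubelet/device-plugins/`, and the hypothetical names `FakeKubelet`, `FakePlugin`, and `run_workflow`. It also assumes that the kubelet connects back to the plugin's socket while handling registration, which is one way to see why serving has to begin first.

```python
import os
import socket
import tempfile


class FakeKubelet:
    """Hypothetical stand-in for the kubelet's registration endpoint."""

    def __init__(self, plugin_dir):
        self.plugin_dir = plugin_dir
        self.registered = []

    def register(self, endpoint):
        # Assumption of this sketch: registration only succeeds if the
        # plugin's own socket already accepts connections.
        path = os.path.join(self.plugin_dir, endpoint)
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            client.connect(path)
        except OSError:
            return False
        finally:
            client.close()
        self.registered.append(endpoint)
        return True


class FakePlugin:
    """Hypothetical plugin that follows the workflow steps in order."""

    def __init__(self, plugin_dir, endpoint):
        self.endpoint = endpoint
        self.path = os.path.join(plugin_dir, endpoint)
        self.devices_ready = False
        self.server = None

    def initialize(self):
        # Step 1: vendor-specific setup would happen here.
        self.devices_ready = True

    def serve(self):
        # Step 2: listen on a Unix socket under the plugin directory.
        self.server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.server.bind(self.path)
        self.server.listen(1)

    def stop(self):
        if self.server is not None:
            self.server.close()
            self.server = None
        if os.path.exists(self.path):
            os.remove(self.path)


def run_workflow(kubelet, plugin):
    """Initialize, start serving, and only then register (step 3)."""
    plugin.initialize()
    plugin.serve()
    return kubelet.register(plugin.endpoint)


if __name__ == "__main__":
    plugin_dir = tempfile.mkdtemp()
    kubelet = FakeKubelet(plugin_dir)
    plugin = FakePlugin(plugin_dir, "example.sock")
    print("register before serving:", kubelet.register(plugin.endpoint))
    print("register after serving:", run_workflow(kubelet, plugin))
    plugin.stop()
```

In this sketch, calling `register` before `serve` fails because nothing is listening on the plugin's socket yet, while `run_workflow` succeeds because it performs the steps in the documented order.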
<!--
### Handling kubelet restarts