The Fake GPU Operator supports simulating [NVIDIA Compute Domains](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/dra-cds.html) for secure workload isolation without requiring actual NVIDIA hardware. The simulation includes the IMEX channels that Compute Domains provide to multi-node GPU workloads.

### Prerequisites

- Kubernetes 1.31+ with DynamicResourceAllocation feature gate enabled
- DRA plugin enabled in the Fake GPU Operator
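
A rough way to check the first prerequisite is to ask the API server whether it serves `resourceclaims` in the `resource.k8s.io` (DRA) API group. The sketch below wraps that in a helper; the name `check_dra_api` is hypothetical, while `kubectl api-resources --api-group=... -o name` is a standard kubectl invocation. It assumes `kubectl` already points at the target cluster.

```bash
# Hypothetical helper: a rough prerequisite check that succeeds only if the
# cluster serves ResourceClaims from the DRA API group (resource.k8s.io).
check_dra_api() {
  # "-o name" prints one "<resource>.<group>" entry per line,
  # e.g. "resourceclaims.resource.k8s.io".
  if kubectl api-resources --api-group=resource.k8s.io -o name \
    | grep -q '^resourceclaims\.resource\.k8s\.io$'; then
    echo "DRA API (resource.k8s.io) is available"
    return 0
  fi
  echo "resource.k8s.io/resourceclaims not served; is DynamicResourceAllocation enabled?" >&2
  return 1
}
```

Run `check_dra_api` before installing; a non-zero exit status means the DRA API group is not being served.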

### Enable Compute Domain in Helm chart

```yaml
# values.yaml
computeDomainController:
  enabled: true
computeDomainDraPlugin:
  enabled: true
draPlugin:
  enabled: true
devicePlugin:
  enabled: false # Disable legacy plugin when using DRA
```
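
The values file still has to be passed to Helm. A minimal sketch, assuming the operator is installed with `helm upgrade --install`: the helper name, the release name `fake-gpu-operator`, and the default namespace `gpu-operator` are illustrative placeholders, and the chart reference is left as a parameter because this section does not state it.

```bash
# Hypothetical helper: installs or upgrades the Fake GPU Operator with the
# Compute Domain values file. Release name and default namespace are
# placeholders, not values confirmed by this README.
install_fake_gpu_operator() {
  if [ -z "$1" ]; then
    echo "usage: install_fake_gpu_operator <chart-ref> [values-file] [namespace]" >&2
    return 2
  fi
  chart_ref="$1"
  values_file="${2:-values.yaml}"
  namespace="${3:-gpu-operator}"
  # "upgrade --install" creates the release on the first run and updates it
  # (for example, after toggling a plugin in values.yaml) on later runs.
  helm upgrade --install fake-gpu-operator "$chart_ref" \
    --namespace "$namespace" --create-namespace \
    -f "$values_file"
}
```

For example, `install_fake_gpu_operator <chart-ref> values.yaml`, where `<chart-ref>` is whatever chart reference you normally install the operator from. Using `upgrade --install` keeps the command idempotent, so the same line works for the first install and for later value changes.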

### Deploy with Compute Domain

First, create a ComputeDomain resource:

```yaml
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: my-compute-domain
  namespace: default
spec:
  numNodes: 1
  channel:
    allocationMode: Single # or "All" for all channels
    resourceClaimTemplate:
      name: my-compute-domain
```

The compute-domain-controller will automatically create a ResourceClaimTemplate for the ComputeDomain.
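
Because the template is created by the controller rather than by `kubectl apply`, a pod created immediately afterwards may stay Pending until the template appears. The sketch below polls for it with plain `kubectl get`; the helper name, the retry count, and the 2-second interval are hypothetical choices.

```bash
# Hypothetical helper: polls until the ResourceClaimTemplate created by the
# compute-domain-controller exists, or gives up after a number of attempts.
wait_for_claim_template() {
  name="$1"
  namespace="${2:-default}"
  attempts="${3:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # "kubectl get" exits non-zero while the template does not exist yet.
    if kubectl get resourceclaimtemplate "$name" -n "$namespace" >/dev/null 2>&1; then
      echo "ResourceClaimTemplate $namespace/$name exists"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "timed out waiting for ResourceClaimTemplate $namespace/$name" >&2
  return 1
}
```

For example, with the manifest saved as `compute-domain.yaml` (a hypothetical file name): `kubectl apply -f compute-domain.yaml && wait_for_claim_template my-compute-domain default`.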

Then, deploy a pod that uses the compute domain:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: compute-domain-pod
  namespace: default
spec:
  containers:
  - name: main
    image: ubuntu:22.04
    command: ["sleep", "infinity"]
    resources:
      claims:
      - name: compute-domain
  resourceClaims:
  - name: compute-domain
    resourceClaimTemplateName: my-compute-domain
```
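
Applying the pod and confirming that its claim was allocated can be scripted as below. The helper name and the manifest file name are hypothetical; `kubectl wait --for=condition=Ready` and `kubectl get resourceclaims` are standard kubectl commands. The ResourceClaim created for the pod from the `my-compute-domain` template receives a generated name, so the sketch lists every claim in the namespace instead of guessing one.

```bash
# Hypothetical helper: applies the pod manifest, waits for the pod to become
# Ready, then lists the ResourceClaims in the namespace (the claim generated
# from the template has a generated name, so all claims are listed).
deploy_compute_domain_pod() {
  manifest="$1"                 # e.g. compute-domain-pod.yaml (hypothetical)
  namespace="${2:-default}"
  pod="${3:-compute-domain-pod}"
  kubectl apply -f "$manifest" || return 1
  kubectl wait --for=condition=Ready "pod/$pod" \
    -n "$namespace" --timeout=120s || return 1
  kubectl get resourceclaims -n "$namespace"
}
```

Stopping at the first failing step keeps the error from `kubectl apply` or `kubectl wait` as the last thing printed, which makes a Pending pod easier to diagnose.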

### Verify Compute Domain Status

```bash
# Check ComputeDomain status
kubectl get computedomain my-compute-domain -o yaml

# Verify status shows Ready and allocated nodes
# status:
#   status: Ready
#   nodes:
#   - name: <node-name>
#     status: Ready
```
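
In scripts or CI, reading the YAML by eye does not scale; `kubectl wait` with a JSONPath condition (supported by recent kubectl releases) can block until the status field shown in the sample output reads `Ready`. The helper name and its defaults are hypothetical, and the sketch assumes `.status.status` is the field the simulated controller sets, as in that sample.

```bash
# Hypothetical helper: blocks until the ComputeDomain reports
# .status.status == Ready, or fails once the timeout expires.
wait_for_compute_domain() {
  name="${1:-my-compute-domain}"
  namespace="${2:-default}"
  timeout="${3:-120s}"
  kubectl wait --for=jsonpath='{.status.status}'=Ready \
    "computedomain/$name" -n "$namespace" --timeout="$timeout"
}
```

For example: `wait_for_compute_domain my-compute-domain default 120s && kubectl get computedomain my-compute-domain -o yaml`.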
## 🎭 KWOK Integration (Simulated Nodes)

[KWOK](https://kwok.sigs.k8s.io/) (Kubernetes WithOut Kubelet) is a toolkit that allows you to simulate thousands of Kubernetes nodes without running actual kubelet processes. Combined with the Fake GPU Operator, it lets you build large-scale GPU cluster simulations entirely without hardware, which is useful for testing schedulers, autoscalers, and resource management at scale.