Commit cf6deae

add RDMA mode configuration example in readme
Signed-off-by: Sebastian Sch <sebassch@gmail.com>
1 parent cf7573b commit cf6deae

1 file changed

+109
-26
lines changed

README.md

Lines changed: 109 additions & 26 deletions
@@ -301,32 +301,6 @@ spec:
301301

302302
> **NOTE**: Currently only `mellanox` plugin can be disabled.
303303

304-
### Parallel draining
305-
306-
It is possible to drain more than one node at a time using this operator.
307-
308-
The configuration is done via the SriovNetworkNodePool, selecting a number of nodes using the node selector and how many
309-
nodes in parallel from the pool the operator can drain in parallel. maxUnavailable can be a number or a percentage.
310-
311-
> **NOTE**: every node can only be part of one pool, if a node is selected by more than one pool, then it will not be drained
312-
313-
> **NOTE**: If a node is not part of any pool it will have a default configuration of maxUnavailable 1
314-
315-
**Example**:
316-
317-
```yaml
318-
apiVersion: sriovnetwork.openshift.io/v1
319-
kind: SriovNetworkPoolConfig
320-
metadata:
321-
  name: worker
322-
  namespace: sriov-network-operator
323-
spec:
324-
  maxUnavailable: 2
325-
  nodeSelector:
326-
    matchLabels:
327-
      node-role.kubernetes.io/worker: ""
328-
```
329-
330304
## Feature Gates
331305

332306
Feature gates are used to enable or disable specific features in the operator.
@@ -371,6 +345,115 @@ spec:
371345
...
372346
```
373347

348+
## SriovNetworkPoolConfig Configuration
349+
350+
The `SriovNetworkPoolConfig` CRD provides advanced configuration capabilities for managing groups of nodes in SR-IOV network environments. This custom resource allows cluster administrators to define node-level configuration policies that apply to specific sets of nodes selected by label selectors.
351+
352+
### Purpose and Benefits
353+
354+
The `SriovNetworkPoolConfig` CRD serves multiple purposes:
355+
356+
1. **Node Pool Management**: Groups nodes into logical pools for coordinated configuration updates
357+
2. **Parallel Operations**: Enables controlled parallel draining and configuration updates across multiple nodes
358+
3. **RDMA Configuration**: Provides centralized RDMA mode configuration for selected nodes
359+
360+
### Key Configuration Fields
361+
362+
#### Node Selection and Availability Control
363+
364+
- **nodeSelector**: Specifies which nodes belong to this pool using Kubernetes label selectors
365+
- **maxUnavailable**: Controls how many nodes can be unavailable simultaneously during updates (supports both integer and percentage values)
366+
367+
#### RDMA Mode Configuration
368+
369+
The `rdmaMode` field allows you to configure the RDMA (Remote Direct Memory Access) subsystem behavior for all nodes in the pool:
370+
371+
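- **shared**: RDMA devices are visible in every network namespace, so multiple processes and pods can share RDMA resources simultaneously
372+
- **exclusive**: Each RDMA device is assigned to a single network namespace, so its RDMA resources are used exclusively by that namespace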
373+
374+
### Parallel Draining
375+
376+
It is possible to drain more than one node at a time using this operator.
377+
378+
The configuration is done via the `SriovNetworkPoolConfig`: the node selector picks the nodes that belong to the pool, and `maxUnavailable` sets how many
379+
of those nodes the operator can drain in parallel. `maxUnavailable` can be a number or a percentage.
380+
381+
> **NOTE**: Every node can only be part of one pool. If a node is selected by more than one pool, it will not be drained.
382+
383+
> **NOTE**: If a node is not part of any pool, it defaults to a `maxUnavailable` of 1.
384+
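As a hedged illustration of the percentage form, the sketch below defines a pool in which up to half of the selected nodes may be drained at once. The pool name and the `example.com/sriov-pool` label are hypothetical, and writing the percentage as a quoted string is an assumption based on how Kubernetes int-or-string fields are usually expressed.

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkPoolConfig
metadata:
  name: batch-workers                 # hypothetical pool name
  namespace: sriov-network-operator
spec:
  maxUnavailable: "50%"               # assumed syntax: percentage as a quoted string
  nodeSelector:
    matchLabels:
      example.com/sriov-pool: batch   # hypothetical label; must not overlap with another pool
```

Using a dedicated label rather than the generic worker role keeps each node in exactly one pool, which the note above requires for draining to take place.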
385+
### Configuration Examples
386+
387+
#### Basic Parallel Draining Configuration
388+
389+
```yaml
390+
apiVersion: sriovnetwork.openshift.io/v1
391+
kind: SriovNetworkPoolConfig
392+
metadata:
393+
  name: worker
394+
  namespace: sriov-network-operator
395+
spec:
396+
  maxUnavailable: 2
397+
  nodeSelector:
398+
    matchLabels:
399+
      node-role.kubernetes.io/worker: ""
400+
```
401+
402+
### RDMA Mode
403+
404+
The RDMA mode setting affects how RDMA resources are managed across the nodes in the pool.
405+
The RDMA mode configuration is applied during node configuration updates and affects all Mellanox SR-IOV devices on the selected nodes.
406+
407+
> **NOTE**: Switching the RDMA mode triggers a reboot of all the nodes in the pool, paced according to the `maxUnavailable` configuration.
408+
409+
#### Exclusive RDMA Mode Configuration
410+
411+
```yaml
412+
apiVersion: sriovnetwork.openshift.io/v1
413+
kind: SriovNetworkPoolConfig
414+
metadata:
415+
  name: rdma-workers
416+
  namespace: sriov-network-operator
417+
spec:
418+
  maxUnavailable: 1
419+
  rdmaMode: exclusive
420+
  nodeSelector:
421+
    matchLabels:
422+
      node-role.kubernetes.io/worker: ""
423+
```
424+
425+
#### SriovNetwork with RDMA CNI Plugin
426+
427+
When RDMA mode is set to exclusive, you can create an SriovNetwork that injects the RDMA CNI plugin to allow pods to access hardware counters. Here's an example configuration:
428+
429+
```yaml
430+
apiVersion: sriovnetwork.openshift.io/v1
431+
kind: SriovNetwork
432+
metadata:
433+
  name: rdma-network
434+
  namespace: sriov-network-operator
435+
spec:
436+
  resourceName: rdma_shared_device_a
437+
  networkNamespace: default
438+
  ipam: |
439+
    {
440+
      "type": "host-local",
441+
      "subnet": "10.10.10.0/24",
442+
      "rangeStart": "10.10.10.171",
443+
      "rangeEnd": "10.10.10.181"
444+
    }
445+
  metaPluginsConfig: |
446+
    {
447+
      "type": "rdma"
448+
    }
449+
```
450+
451+
This configuration:
452+
- Uses `metaPluginsConfig` to inject the RDMA CNI plugin
453+
- Allows pods using this network to access hardware counters
454+
- Requires the nodes to be configured with RDMA mode set to exclusive
455+
- Works with SR-IOV network policies that have `isRdma: true` specified
456+
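To make the last requirement more concrete, here is a minimal sketch of a node policy with `isRdma: true` and a pod that attaches to `rdma-network`. It is an illustration under stated assumptions, not configuration taken from this repository: the policy name, the PF name `ens1f0`, the VF count, the container image, and the `openshift.io/` resource prefix are all placeholders that may differ in your cluster.

```yaml
# Hypothetical policy exposing RDMA-capable VFs under the resource name
# referenced by the SriovNetwork example.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: rdma-policy                   # hypothetical name
  namespace: sriov-network-operator
spec:
  resourceName: rdma_shared_device_a
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numVfs: 4                           # assumed VF count
  nicSelector:
    vendor: "15b3"                    # Mellanox PCI vendor ID
    pfNames: ["ens1f0"]               # hypothetical PF name
  deviceType: netdevice
  isRdma: true
---
# Hypothetical pod attaching to rdma-network through the Multus annotation.
apiVersion: v1
kind: Pod
metadata:
  name: rdma-test-pod                 # hypothetical name
  namespace: default                  # matches networkNamespace in the SriovNetwork
  annotations:
    k8s.v1.cni.cncf.io/networks: rdma-network
spec:
  containers:
    - name: app
      image: registry.example.com/rdma-tools:latest   # placeholder image
      command: ["sleep", "infinity"]
      resources:
        requests:
          openshift.io/rdma_shared_device_a: "1"      # assumed resource prefix
        limits:
          openshift.io/rdma_shared_device_a: "1"
```

The pod lives in the `default` namespace because the `SriovNetwork` example sets `networkNamespace: default`, which is where the resulting network attachment is created.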
374457
## Components and design
375458

376459
This operator is split into 2 components:
