Commit 1bdfbea

feat: Migrate DRA integration from v1alpha2 to v1 API
Migrate Dynamic Resource Allocation (DRA) integration from the alpha v1alpha2 API to the stable v1 API introduced in Kubernetes 1.34.

Major changes:
- Update RBAC permissions to access resource.k8s.io API resources (resourceclaims, resourceslices) instead of using kubelet API
- Replace kubelet-based DRA resource discovery with direct API queries using new draclient package
- Update documentation from ResourceClass to DeviceClass terminology
- Change resourceName annotation format to <claim-name>/<request-name>
- Update examples from NVIDIA-specific to generic SR-IOV usage
- Add comprehensive test coverage for DRA integration
- Remove CDI-based device handling in favor of k8s.cni.cncf.io/deviceID attributes

Technical details:
- Add draclient.GetPodResourceMap() call in k8sclient
- Remove getDRAResources() from kubeletclient (now queries API directly)
- Update to use ResourceClaimTemplate instead of ResourceClaim
- Fix protobuf field naming (CDIDevices -> CdiDevices)
- Add 6 new test cases for DRA scenarios in k8sclient_test.go

This migration enables Multus to work with the stable DRA API and removes dependency on kubelet's PodResources API for DRA resources.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
1 parent 2a7f1c6 commit 1bdfbea

10 files changed: 698 additions & 119 deletions

deployments/multus-daemonset-thick.yml

Lines changed: 9 additions & 0 deletions
@@ -72,6 +72,15 @@ rules:
       - list
       - update
       - watch
+  - apiGroups:
+      - "resource.k8s.io"
+    resources:
+      - resourceclaims
+      - resourceclaims/status
+      - resourceslices
+    verbs:
+      - get
+      - list
   - apiGroups:
       - ""
       - events.k8s.io

deployments/multus-daemonset.yml

Lines changed: 38 additions & 0 deletions
@@ -79,6 +79,44 @@ rules:
       - create
       - patch
       - update
+kind: ClusterRole
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  name: multus
+rules:
+  - apiGroups: ["k8s.cni.cncf.io"]
+    resources:
+      - '*'
+    verbs:
+      - '*'
+  - apiGroups:
+      - ""
+    resources:
+      - pods
+      - pods/status
+    verbs:
+      - get
+      - list
+      - update
+      - watch
+  - apiGroups:
+      - "resource.k8s.io"
+    resources:
+      - resourceclaims
+      - resourceclaims/status
+      - resourceslices
+    verbs:
+      - get
+      - list
+  - apiGroups:
+      - ""
+      - events.k8s.io
+    resources:
+      - events
+    verbs:
+      - create
+      - patch
+      - update
 ---
 kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1

docs/how-to-use.md

Lines changed: 87 additions & 61 deletions
@@ -645,112 +645,132 @@ If you wish to have auto configuration use the `readinessindicatorfile` in the c

 ### Run pod with network annotation and Dynamic Resource Allocation driver

-> :warning: Dynamic Resource Allocation (DRA) is [currently an alpha](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/),
-> and is subject to change. Please consider this functionality as a preview. The architecture and usage of DRA in
-> Multus CNI may change in the future as this technology matures.
->
-> The current DRA integration is based on the DRA API for Kubernetes 1.26 to 1.30. With Kubernetes 1.31, the DRA API
-> will change and multus doesn't integrate with the new API yet.

-Dynamic Resource Allocation is alternative mechanism to device plugin which allows to requests pod and container
-resources.
+Dynamic Resource Allocation is an alternative mechanism to device plugin which allows pods to request pod and container
+resources dynamically.

-The following sections describe how to use DRA with multus and NVIDIA DRA driver. Other DRA networking driver vendors
-should follow similar concepts to make use of multus DRA support.
+The following sections describe how to use DRA with Multus. DRA networking driver vendors should follow similar
+concepts to make use of Multus DRA support.

 #### Prerequisite

-1. Kubernetes 1.27
-2. Container Runtime with CDI support enabled
-3. Kubernetes runtime-config=resource.k8s.io/v1alpha2
-4. Kubernetes feature-gates=DynamicResourceAllocation=True,KubeletPodResourcesDynamicResources=true
+1. Kubernetes 1.34+

 #### Install DRA driver

-The current example uses NVIDIA DRA driver for networking. This DRA driver is not publicly available. An alternative to
-this DRA driver is available at [dra-example-driver](https://github.com/kubernetes-sigs/dra-example-driver).
+You need to install a DRA driver that provides network devices. For example, you can use the SR-IOV DRA driver or
+other DRA networking drivers. Refer to your DRA driver documentation for installation instructions.

-#### Create dynamic resource class with NVIDIA network DRA driver
+The DRA driver MUST expose the following attribute `k8s.cni.cncf.io/deviceID` containing the device ID
+that multus will pass to the CNI.

-The `ResourceClass` defines the resource pool of `sf-pool-1`.
+#### Create network attachment definition with resource name
+
+The `k8s.v1.cni.cncf.io/resourceName` annotation is used to associate a NetworkAttachmentDefinition with DRA resources.
+The format is: `<pod-resource-name>/<result-name>` where:
+- `pod-resource-name`: The name of the resource claim in the pod's `spec.resourceClaims`
+- `result-name`: The name of the device request in the ResourceClaimTemplate's `spec.devices.requests`
+
+Multus queries the ResourceClaim and ResourceSlices APIs to fetch information about allocated DRA devices. When a
+NetworkAttachmentDefinition has a `resourceName` annotation that matches a pod's resource claim and result name,
+Multus will pass the `k8s.cni.cncf.io/deviceID` to the CNI plugin in the DeviceID field.
+
+##### NetworkAttachmentDefinition for SR-IOV example:
+
+Following command creates a NetworkAttachmentDefinition for SR-IOV. The `resourceName` annotation `sriov/vf` indicates:
+- `sriov`: matches the pod's resourceClaim name
+- `vf`: matches the device request name in the ResourceClaimTemplate

 ```
 # Execute following command at Kubernetes master
 cat <<EOF | kubectl create -f -
-apiVersion: resource.k8s.io/v1alpha2
-kind: ResourceClass
+apiVersion: k8s.cni.cncf.io/v1
+kind: NetworkAttachmentDefinition
 metadata:
-  name: sf-pool-1
-driverName: net.resource.nvidia.com
+  name: sriov-net
+  namespace: default
+  annotations:
+    k8s.v1.cni.cncf.io/resourceName: sriov/vf
+spec:
+  config: |-
+    {
+      "cniVersion": "1.0.0",
+      "name": "sriov-net",
+      "type": "sriov",
+      "vlan": 0,
+      "spoofchk": "on",
+      "trust": "on",
+      "vlanQoS": 0,
+      "logLevel": "info",
+      "ipam": {
+        "type": "host-local",
+        "ranges": [
+          [
+            {
+              "subnet": "10.0.2.0/24"
+            }
+          ]
+        ]
+      }
+    }
 EOF
 ```

-#### Create network attachment definition with resource name
+#### Create Device Class

-The `k8s.v1.cni.cncf.io/resourceName` should match the `ResourceClass` name defined in the section above.
-In this example it is `sf-pool-1`. Multus query the K8s PodResource API to fetch the `resourceClass` name and also
-query the NetworkAttachmentDefinition `k8s.v1.cni.cncf.io/resourceName`. If both has the same name multus send the
-CDI device name in the DeviceID argument.
-
-##### NetworkAttachmentDefinition for ovn-kubernetes example:
-
-Following command creates NetworkAttachmentDefinition. CNI config is in `config:` field.
+Following command creates a `DeviceClass` for the `ResourceClaimTemplate` to request devices from.

 ```
 # Execute following command at Kubernetes master
 cat <<EOF | kubectl create -f -
-apiVersion: "k8s.cni.cncf.io/v1"
-kind: NetworkAttachmentDefinition
+apiVersion: resource.k8s.io/v1
+kind: DeviceClass
 metadata:
-  name: default
-  annotations:
-    k8s.v1.cni.cncf.io/resourceName: sf-pool-1
+  name: sriovnetwork.openshift.io
 spec:
-  config: '{
-    "cniVersion": "0.4.0",
-    "dns": {},
-    "ipam": {},
-    "logFile": "/var/log/ovn-kubernetes/ovn-k8s-cni-overlay.log",
-    "logLevel": "4",
-    "logfile-maxage": 5,
-    "logfile-maxbackups": 5,
-    "logfile-maxsize": 100,
-    "name": "ovn-kubernetes",
-    "type": "ovn-k8s-cni-overlay"
-  }'
+  selectors:
+  - cel:
+      expression: device.driver == "sriovnetwork.openshift.io"
 EOF
 ```

-#### Create DRA Resource Claim
+#### Create DRA Resource Claim Template

-Following command creates `ResourceClaim` `sf` which request resource from `ResourceClass` `sf-pool-1`.
+Following command creates a `ResourceClaimTemplate` that requests a VF device from the SR-IOV device class.
+Note the `name: vf` in the requests section, which corresponds to the second part of the resourceName annotation.

 ```
 # Execute following command at Kubernetes master
 cat <<EOF | kubectl create -f -
-apiVersion: resource.k8s.io/v1alpha2
-kind: ResourceClaim
+apiVersion: resource.k8s.io/v1
+kind: ResourceClaimTemplate
 metadata:
   namespace: default
-  name: sf
+  name: sriov-template
 spec:
   spec:
-    resourceClassName: sf-pool-1
+    devices:
+      requests:
+      - name: vf
+        deviceClassName: sriovnetwork.openshift.io
 EOF
 ```

 #### Launch pod with DRA Resource Claim

-Following command Launch a Pod with primiry network `default` and `ResourceClaim` `sf`.
+Following command launches a Pod with the secondary network `sriov-net` and a DRA resource claim named `sriov`.
+The resourceClaim name `sriov` matches the first part of the NetworkAttachmentDefinition's resourceName annotation.

 ```
+# Execute following command at Kubernetes master
+cat <<EOF | kubectl create -f -
 apiVersion: v1
 kind: Pod
 metadata:
   namespace: default
-  name: test-sf-claim
+  name: sriov-pod
   annotations:
-    v1.multus-cni.io/default-network: default
+    k8s.v1.cni.cncf.io/networks: sriov-net
 spec:
   restartPolicy: Always
   containers:
@@ -759,9 +779,15 @@ spec:
     command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
     resources:
       claims:
-      - name: resource
+      - name: sriov
   resourceClaims:
-  - name: resource
-    source:
-      resourceClaimName: sf
+  - name: sriov
+    resourceClaimTemplateName: sriov-template
+EOF
 ```
+
+In this example:
+- The pod has a resourceClaim named `sriov` that uses the `sriov-template`
+- The ResourceClaimTemplate has a device request named `vf`
+- The NetworkAttachmentDefinition has `resourceName: sriov/vf` which combines both names
+- Multus will match these and provide the allocated deviceID to the SR-IOV CNI plugin
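
The matching described in the documentation above can be pictured as a map lookup keyed by `<claim-name>/<request-name>`. The following is a minimal sketch of that idea only, not the code from this commit: the `allocatedDevice` type, the `buildResourceMap` helper, and the PCI-style device ID are hypothetical names and values chosen for illustration.

```go
package main

import "fmt"

// allocatedDevice is a hypothetical, simplified view of one allocated DRA device.
type allocatedDevice struct {
	Claim    string // entry name in the pod's spec.resourceClaims, e.g. "sriov"
	Request  string // device request name in the ResourceClaimTemplate, e.g. "vf"
	DeviceID string // value of the k8s.cni.cncf.io/deviceID attribute (format assumed)
}

// buildResourceMap groups device IDs under the key "<claim>/<request>",
// the same shape as the resourceName annotation value.
func buildResourceMap(devs []allocatedDevice) map[string][]string {
	m := make(map[string][]string)
	for _, d := range devs {
		key := d.Claim + "/" + d.Request
		m[key] = append(m[key], d.DeviceID)
	}
	return m
}

func main() {
	m := buildResourceMap([]allocatedDevice{
		{Claim: "sriov", Request: "vf", DeviceID: "0000:03:00.2"},
	})
	// The annotation value from the NetworkAttachmentDefinition is the lookup key.
	fmt.Println(m["sriov/vf"]) // prints [0000:03:00.2]
}
```

Keying by the full annotation string keeps the lookup identical for device-plugin resource names, which also contain a `/`.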

pkg/draclient/draclient.go

Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,7 @@ import (
 	"gopkg.in/k8snetworkplumbingwg/multus-cni.v4/pkg/types"
 )

-type ClientInterace interface {
+type ClientInterface interface {
 	GetPodResourceMap(pod *v1.Pod, resourceMap map[string]*types.ResourceInfo) error
 }

@@ -24,7 +24,7 @@ type draClient struct {
 	resourceClaimCache map[string]*resourcev1api.ResourceClaim
 }

-func NewClient(client resourcev1.ResourceV1Interface) ClientInterace {
+func NewClient(client resourcev1.ResourceV1Interface) ClientInterface {
 	logging.Debugf("NewClient: creating new DRA client")
 	return &draClient{
 		client: client,
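
Because `NewClient` returns the interface rather than the concrete struct, callers and tests can substitute their own implementation. The sketch below illustrates that pattern with a compile-time assertion. It is an assumption-laden simplification: the Kubernetes types are replaced by plain Go types, and `fakeClient` is a hypothetical test double that does not exist in the repository.

```go
package main

import "fmt"

// ClientInterface mirrors the shape of the interface in the diff, with the
// Kubernetes pod and ResourceInfo types swapped for plain Go types (assumption).
type ClientInterface interface {
	GetPodResourceMap(podName string, resourceMap map[string][]string) error
}

// fakeClient is a hypothetical in-memory implementation for tests.
type fakeClient struct {
	devices map[string][]string
}

// Compile-time check: the build fails if fakeClient stops satisfying ClientInterface.
var _ ClientInterface = (*fakeClient)(nil)

// GetPodResourceMap copies the canned devices into the caller's map.
func (f *fakeClient) GetPodResourceMap(podName string, resourceMap map[string][]string) error {
	for k, v := range f.devices {
		resourceMap[k] = v
	}
	return nil
}

func main() {
	var c ClientInterface = &fakeClient{devices: map[string][]string{"sriov/vf": {"dev-0"}}}
	m := map[string][]string{}
	if err := c.GetPodResourceMap("sriov-pod", m); err != nil {
		panic(err)
	}
	fmt.Println(len(m)) // prints 1
}
```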

pkg/draclient/draclient_test.go

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ var _ = Describe("DRA Client operations", func() {
 	Describe("GetPodResourceMap", func() {
 		var (
 			fakeClient *fake.Clientset
-			draClient ClientInterace
+			draClient ClientInterface
 		)

 		BeforeEach(func() {

pkg/k8sclient/k8sclient.go

Lines changed: 8 additions & 0 deletions
@@ -42,6 +42,7 @@ import (
 	netclient "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/client/clientset/versioned"
 	netlister "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/client/listers/k8s.cni.cncf.io/v1"
 	netutils "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/utils"
+	"gopkg.in/k8snetworkplumbingwg/multus-cni.v4/pkg/draclient"
 	"gopkg.in/k8snetworkplumbingwg/multus-cni.v4/pkg/kubeletclient"
 	"gopkg.in/k8snetworkplumbingwg/multus-cni.v4/pkg/logging"
 	"gopkg.in/k8snetworkplumbingwg/multus-cni.v4/pkg/types"
@@ -317,6 +318,13 @@ func getKubernetesDelegate(client *ClientInfo, net *types.NetworkSelectionElemen
 		if err != nil {
 			return nil, resourceMap, logging.Errorf("getKubernetesDelegate: failed to get resourceMap from ResourceClient: %v", err)
 		}
+
+		dc := draclient.NewClient(client.Client.ResourceV1())
+		err = dc.GetPodResourceMap(pod, resourceMap)
+		if err != nil {
+			return nil, resourceMap, logging.Errorf("getKubernetesDelegate: failed to get resourceMap from DRA client: %v", err)
+		}
+
 		logging.Debugf("getKubernetesDelegate: resourceMap instance: %+v", resourceMap)
 	}
