-Instead of implementing the solution within the scheduler, we propose using the Cluster Autoscaler to manage the attachment and detachment of fabric devices.
-
+Instead of implementing the solution within the scheduler, we can use a "device autoscaler", which is a device-level counterpart of the Cluster Autoscaler (CA).
 The key points and main process flow of this alternative proposal are as follows:

 The scheduler references only node-local ResourceSlices.
 If there are no available resources in the node-local ResourceSlices, the scheduler marks the Pod as unschedulable without waiting in the PreBind phase of the ResourceClaim.
+The device autoscaler then tries to attach new devices.
+It also tries to detach devices that have not been used for a period of time.
+This is similar in concept to CA.
+
+However, if CA and the device autoscaler run independently, CA may add a node with a device at the same time as the device autoscaler attaches a device.
+This adds resources unnecessarily.
+Therefore, the idea is to put the device-scaling functionality into CA itself.

-To handle fabric resources, we implement the Processor for composable system within CA.
+To handle fabric resources in CA, we implement a Processor for the composable system within it.
 This Processor identifies unschedulable Pods and determines if attaching a fabric ResourceSlice device to an existing node would make scheduling possible.
 If so, the Processor instructs the attachment of the resource, using the composable Operator for the actual attachment process.
 If attaching the fabric ResourceSlice does not make scheduling possible, the Processor determines whether to add a new node as usual.

 After the device is attached, the vendor DRA updates the node-local ResourceSlices.
-The vendor DRA needs a rescan function to update the Pool/ResourceSlice. The scheduler can then assign the node-local ResourceSlice devices to the unschedulable Pod, operating the same as the usual DRA from this point.
-
+The vendor DRA needs a rescan function to update the Pool/ResourceSlice.
+The scheduler can then assign the node-local ResourceSlice devices to the unschedulable Pod, operating the same as the usual DRA from this point.

 ### Test Plan

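The Processor flow described in the hunk above (attach a fabric device to an existing node if that makes the Pod schedulable, otherwise fall back to adding a node) can be sketched roughly as follows. This is only an illustrative model, not Cluster Autoscaler code: `Node`, `FabricPool`, `Action`, `attachable_slots`, and `plan_for_unschedulable_pod` are hypothetical names invented for this sketch, and a real Processor would run the scheduler's own checks rather than compare device counts.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ATTACH_DEVICE = "attach_device"  # delegate the attachment to the composable Operator
    ADD_NODE = "add_node"            # fall back to the usual CA scale-up path


@dataclass
class Node:
    name: str
    free_local_devices: int = 0  # free devices already listed in node-local ResourceSlices
    attachable_slots: int = 0    # hypothetical: extra fabric devices this node could accept


@dataclass
class FabricPool:
    free_devices: int  # hypothetical stand-in for unattached fabric devices


def plan_for_unschedulable_pod(requested_devices, nodes, pool):
    """Return (Action, node name or None) for one unschedulable Pod."""
    for node in nodes:
        missing = requested_devices - node.free_local_devices
        if missing <= 0:
            # Devices are not what blocks this node, so attaching more would not help.
            continue
        if missing <= node.attachable_slots and missing <= pool.free_devices:
            # Attaching fabric devices to this existing node would cover the request.
            return Action.ATTACH_DEVICE, node.name
    # No existing node can be fixed by attaching devices: decide on a new node as usual.
    return Action.ADD_NODE, None
```

Keeping both decisions in one function mirrors the motivation in the added text: a single component chooses between attaching a device and adding a node, so the two actions are never taken for the same Pod at once.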
@@ -665,6 +671,8 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
 - Gather feedback from developers and surveys
 - Resolve the following issues
   - The scheduler is not guaranteed to pick the same node for the Pod after a restart
+  - If the scheduler picks a different node for the Pod after the restart, devices are left attached to the original node unnecessarily
+    (The Composable DRA controller needs a function that automatically detaches a device if no Pod has used it for a certain period of time)
   - Pods which are not bound yet (in api-server) and not unschedulable (in api-server) are not visible to the cluster autoscaler, so there is a risk that the node will be turned down
   - The in-flight events cache may grow too large when waiting in PreBind
 - Additional tests are in Testgrid and linked in KEP
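The automatic detach function mentioned in the added parenthetical (detach a device that no Pod has used for a certain period) could look roughly like the sketch below. Everything here is an assumption made for illustration: `AttachedDevice`, its fields, `devices_to_detach`, and the timeout values are hypothetical and do not come from the Composable DRA controller.

```python
from dataclasses import dataclass


@dataclass
class AttachedDevice:
    name: str
    node: str
    in_use: bool      # True while some Pod holds a claim on the device
    last_used: float  # hypothetical timestamp (seconds) of last use, or of attachment


def devices_to_detach(devices, now, idle_timeout):
    """Return names of attached devices that have been idle for at least idle_timeout seconds."""
    return [
        d.name
        for d in devices
        if not d.in_use and now - d.last_used >= idle_timeout
    ]


# Example: with a 300 s timeout, only the device idle since t=0 is selected at t=600.
devices = [
    AttachedDevice("gpu-0", "node-a", in_use=False, last_used=0.0),
    AttachedDevice("gpu-1", "node-a", in_use=True, last_used=0.0),
    AttachedDevice("gpu-2", "node-b", in_use=False, last_used=500.0),
]
idle = devices_to_detach(devices, now=600.0, idle_timeout=300.0)
```

A timeout of this kind would also clean up the case listed above, where the scheduler picks a different node after a restart and the device stays attached to the original node.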