Commit 4331c89

DRA: review feedback
1 parent b9c55d8 commit 4331c89

File tree

1 file changed

+93
-4
lines changed
  • keps/sig-node/3063-dynamic-resource-allocation

keps/sig-node/3063-dynamic-resource-allocation/README.md

Lines changed: 93 additions & 4 deletions
@@ -105,6 +105,8 @@ SIG Architecture for cross-cutting KEPs).
105105
- [core](#core)
106106
- [kube-controller-manager](#kube-controller-manager)
107107
- [kube-scheduler](#kube-scheduler)
108+
- [EventsToRegister](#eventstoregister)
109+
- [PreEnqueue](#preenqueue)
108110
- [Pre-filter](#pre-filter)
109111
- [Filter](#filter)
110112
- [Post-filter](#post-filter)
@@ -149,6 +151,11 @@ SIG Architecture for cross-cutting KEPs).
149151
- [Extend Device Plugins](#extend-device-plugins)
150152
- [Webhooks instead of ResourceClaim updates](#webhooks-instead-of-resourceclaim-updates)
151153
- [ResourceDriver](#resourcedriver)
154+
- [Complex sharing of ResourceClaim](#complex-sharing-of-resourceclaim)
155+
- [Improving scheduling performance](#improving-scheduling-performance)
156+
- [Optimize for network-attached resources](#optimize-for-network-attached-resources)
157+
- [Moving blocking API calls into goroutines](#moving-blocking-api-calls-into-goroutines)
158+
- [RPC calls instead of <code>PodSchedulingContext</code>](#rpc-calls-instead-of-)
152159
- [Infrastructure Needed](#infrastructure-needed)
153160
<!-- /toc -->
154161

@@ -735,7 +742,7 @@ For a resource driver the following components are needed:
735742
- *Resource kubelet plugin*: a component which cooperates with kubelet to prepare
736743
the usage of the resource on a node.
737744

738-
An [utility library](https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/dynamic-resource-allocation) for resource drivers was developed.
745+
A [utility library](https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/dynamic-resource-allocation) for resource drivers was developed.
739746
It does not have to be used by drivers, therefore it is not described further
740747
in this KEP.
741748

@@ -1825,7 +1832,35 @@ notices this, the current scheduling attempt for the pod must stop and the pod
18251832
needs to be put back into the work queue. It then gets retried whenever a
18261833
ResourceClaim gets added or modified.
18271834

1828-
The following extension points are implemented in the new claim plugin:
1835+
The following extension points are implemented in the new claim plugin. Some of
1836+
them invoke API calls to create or update objects. This is done to simplify
1837+
error handling: a failure during such a call puts the pod into the backoff
1838+
queue where it will be retried after a timeout. The downside is that the
1839+
latency caused by those blocking calls affects not only pods using claims, but
1840+
also all other pending pods because the scheduler only schedules one pod at a
1841+
time.
1842+
1843+
#### EventsToRegister
1844+
1845+
This registers all cluster events that might make an unschedulable pod
1846+
schedulable, like creating a claim that the pod needs or finishing the
1847+
allocation of a claim.
1848+
1849+
[Queuing hints](https://github.com/kubernetes/enhancements/issues/4247) are
1850+
supported. These are callbacks that can limit the effect of a cluster event to
1851+
specific pods. For example, allocating a claim only makes those pods
1852+
schedulable which reference the claim. There is no need to try scheduling a pod
1853+
which waits for some other claim. Hints are also used to trigger the next
1854+
scheduling cycle for a pod immediately when some expected and required event
1855+
like "drivers have provided information" occurs, instead of forcing the pod to
1856+
go through the backoff queue and the delay, usually 5 seconds, associated
1857+
with that.
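
The filtering idea behind queuing hints can be illustrated with a minimal sketch. It does not use the real kube-scheduler framework API: `QueueingHint`, `Pod`, `Claim`, and `hintForClaimChange` are simplified, hypothetical stand-ins, and the only event modeled is "a claim got allocated".

```go
package main

import "fmt"

// QueueingHint is a simplified, hypothetical stand-in for the result that a
// queuing hint callback returns. It is not the scheduler framework type.
type QueueingHint int

const (
	// QueueSkip means the event cannot make the pod schedulable.
	QueueSkip QueueingHint = iota
	// Queue means the event may make the pod schedulable, so retry it.
	Queue
)

// Claim is a reduced model of a ResourceClaim (assumption for illustration).
type Claim struct {
	Name      string
	Allocated bool
}

// Pod is a reduced model of a pod that references claims by name.
type Pod struct {
	Name       string
	ClaimNames []string
}

// hintForClaimChange reports whether a changed claim is relevant for the pod.
// Only an allocated claim that the pod references triggers a retry.
func hintForClaimChange(pod Pod, changed Claim) QueueingHint {
	if !changed.Allocated {
		return QueueSkip
	}
	for _, name := range pod.ClaimNames {
		if name == changed.Name {
			return Queue
		}
	}
	return QueueSkip
}

func main() {
	waiting := []Pod{
		{Name: "pod-a", ClaimNames: []string{"gpu-claim"}},
		{Name: "pod-b", ClaimNames: []string{"fpga-claim"}},
	}
	event := Claim{Name: "gpu-claim", Allocated: true}
	for _, pod := range waiting {
		if hintForClaimChange(pod, event) == Queue {
			fmt.Printf("%s: retry scheduling\n", pod.Name)
		} else {
			fmt.Printf("%s: keep waiting\n", pod.Name)
		}
	}
}
```

In this sketch only `pod-a` is retried, because `pod-b` waits for a different claim.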
1858+
1859+
#### PreEnqueue
1860+
1861+
This checks whether all claims referenced by a pod exist. If they don't,
1862+
scheduling the pod has to wait until the kube-controller-manager or a user creates
1863+
the claims.
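
As a rough illustration of that existence check, the sketch below models the claim lookup as a set of known claim names. `Pod`, `missingClaims`, and `preEnqueue` are hypothetical names chosen for this example; the actual plugin works against the scheduler's claim cache and returns a framework status rather than an `error`.

```go
package main

import "fmt"

// Pod is a reduced model of a pod that references claims by name
// (assumption for illustration).
type Pod struct {
	Name       string
	ClaimNames []string
}

// missingClaims returns the names of all claims that the pod references
// but that are not present in the set of existing claims.
func missingClaims(pod Pod, existing map[string]bool) []string {
	var missing []string
	for _, name := range pod.ClaimNames {
		if !existing[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

// preEnqueue returns nil if the pod may enter the active scheduling queue
// and an error naming the missing claims otherwise.
func preEnqueue(pod Pod, existing map[string]bool) error {
	if missing := missingClaims(pod, existing); len(missing) > 0 {
		return fmt.Errorf("pod %s waits for ResourceClaims %v to be created", pod.Name, missing)
	}
	return nil
}

func main() {
	existing := map[string]bool{"gpu-claim": true}
	pod := Pod{Name: "pod-a", ClaimNames: []string{"gpu-claim", "fpga-claim"}}

	if err := preEnqueue(pod, existing); err != nil {
		fmt.Println("not enqueued:", err)
	}

	// Once the missing claim has been created, the pod passes the check.
	existing["fpga-claim"] = true
	if err := preEnqueue(pod, existing); err == nil {
		fmt.Println("enqueued")
	}
}
```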
18291864

18301865
#### Pre-filter
18311866

@@ -2770,8 +2805,16 @@ controller [were added](https://github.com/kubernetes/kubernetes/blob/163553bbe0
27702805
- Metric name: `resource_controller_create_failures_total`
27712806
- Metric name: `workqueue` with `name="resource_claim"`
27722807

2773-
For kube-scheduler and kubelet, the existing metrics for handling Pods will be
2774-
used.
2808+
For kube-scheduler and kubelet, existing metrics for handling Pods already
2809+
cover most aspects. For example, in the scheduler the
2810+
["unschedulable_pods"](https://github.com/kubernetes/kubernetes/blob/6f5fa2eb2f4dc731243b00f7e781e95589b5621f/pkg/scheduler/metrics/metrics.go#L200-L206)
2811+
metric will call out pods that are currently unschedulable because of the
2812+
`DynamicResources` plugin.
2813+
2814+
For the communication between scheduler and controller, the apiserver metrics
2815+
about API calls (e.g. `request_total`, `request_duration_seconds`) for the
2816+
`podschedulingcontexts` and `resourceclaims` resources provide insights into
2817+
the number of requests and how long they are taking.
27752818

27762819
###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs?
27772820

@@ -3113,6 +3156,52 @@ type ResourceDriverFeature struct {
31133156
}
31143157
```
31153158

3159+
### Complex sharing of ResourceClaim
3160+
3161+
At the moment, the allocation result marks a claim as either "shareable" by
3162+
an unlimited number of consumers or "not shareable". More complex scenarios
3163+
might be useful like "may be shared by a certain number of consumers", but so
3164+
far such use cases have not come up. If they do, the `AllocationResult` can
3165+
be extended with new fields as defined by a follow-up KEP.
3166+
3167+
### Improving scheduling performance
3168+
3169+
Some enhancements are possible which haven't been implemented yet because it is
3170+
unclear how important they would be in practice. All of the following ideas
3171+
could still be added later as they don't conflict with the underlying design,
3172+
either as part of this KEP or in follow-up KEPs.
3173+
3174+
#### Optimize for network-attached resources
3175+
3176+
When a network-attached resource is available on all nodes in a cluster, the
3177+
driver will never mark any nodes as unsuitable. If all claims for a pod fall
3178+
into that category, the scheduler a) does not need to wait for information and
3179+
b) does not need to publish "potential nodes".
3180+
3181+
The `ResourceClass` could be extended with an `AvailableForNodes
3182+
*core.NodeSelector`. This can be a selector that matches all nodes or a
3183+
subset. Either way, if a potential node matches this selector, the scheduler
3184+
knows that claims using this class can be allocated and can do the optimization
3185+
outlined above.
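
How the scheduler might use such a field can be sketched as follows. This is only an illustration under stated assumptions: `NodeSelector` here is a simplified label-equality selector, not `core.NodeSelector`, an empty selector is assumed to match every node, and `canSkipDriverNegotiation` is a hypothetical helper name.

```go
package main

import "fmt"

// NodeSelector is a simplified stand-in for the proposed *core.NodeSelector.
// Assumption: an empty MatchLabels map matches every node.
type NodeSelector struct {
	MatchLabels map[string]string
}

// Matches reports whether all required labels are present on the node.
func (s *NodeSelector) Matches(nodeLabels map[string]string) bool {
	for key, want := range s.MatchLabels {
		if got, ok := nodeLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// ResourceClass is a reduced model of the class with the proposed field.
// A nil AvailableForNodes means the class makes no availability promise.
type ResourceClass struct {
	Name              string
	AvailableForNodes *NodeSelector
}

// canSkipDriverNegotiation reports whether every class used by the pod's
// claims promises availability on the given node. Only then could the
// scheduler skip waiting for driver information and skip publishing
// potential nodes.
func canSkipDriverNegotiation(classes []ResourceClass, nodeLabels map[string]string) bool {
	for _, class := range classes {
		if class.AvailableForNodes == nil || !class.AvailableForNodes.Matches(nodeLabels) {
			return false
		}
	}
	return true
}

func main() {
	everywhere := ResourceClass{Name: "network-storage", AvailableForNodes: &NodeSelector{}}
	zonal := ResourceClass{
		Name:              "zonal-accelerator",
		AvailableForNodes: &NodeSelector{MatchLabels: map[string]string{"zone": "a"}},
	}
	node := map[string]string{"zone": "b"}

	fmt.Println("everywhere only:", canSkipDriverNegotiation([]ResourceClass{everywhere}, node))
	fmt.Println("with zonal class:", canSkipDriverNegotiation([]ResourceClass{everywhere, zonal}, node))
}
```

The check is deliberately conservative: a single class without a matching selector means the scheduler falls back to the regular exchange with the driver.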
3186+
3187+
#### Moving blocking API calls into goroutines
3188+
3189+
This [is being
3190+
discussed](https://github.com/kubernetes/kubernetes/issues/120502) and has been
3191+
[partially
3192+
implemented](https://github.com/kubernetes/kubernetes/pull/120963). That
3193+
implementation made the scheduler framework more complex, so [the
3194+
conclusion](https://kubernetes.slack.com/archives/C09TP78DV/p1696307377064469?thread_ts=1696246271.825109&cid=C09TP78DV)
3195+
was that using blocking calls is the lesser evil until user feedback indicates
3196+
that improvements are really needed.
3197+
3198+
#### RPC calls instead of `PodSchedulingContext`
3199+
3200+
The current design does not make it a hard requirement that admins change the
3201+
scheduler configuration to enable communication between scheduler and DRA
3202+
drivers. For scenarios where admins and vendors are willing to invest more
3203+
effort and doing so would provide performance benefits, a communication path
3204+
similar to scheduler extenders could be added.
31163205

31173206
## Infrastructure Needed
31183207
