Unify internal and external informer factories #1
Open
sttts wants to merge 7 commits into p0lyn0mial:admission_options_spits_out_admission_control from …
Conversation
…the server configuration.
@p0lyn0mial I spoke with @sttts in IRC. We've agreed that we can proceed with kubernetes#45355 without this pull in your branch. One of the concerns is the silent ordering requirement that we're introducing. Some of it may be ameliorated by leveraging …
Owner
thanks for letting me know.
4818f16 to 8cea69a (Compare)
p0lyn0mial pushed a commit that referenced this pull request on May 25, 2017
…mance

Automatic merge from submit-queue (batch tested with PRs 38505, 41785, 46315)

Only retrieve relevant volumes

**What this PR does / why we need it**: Improves performance of Cinder volume attach/detach calls. Currently, when Cinder volumes are attached or detached, functions try to retrieve details about the volume from the Nova API. Because some callers only have the volume name, not its UUID, they use the list function in gophercloud to iterate over all volumes to find a match. This incurs severe performance problems on OpenStack projects with lots of volumes (sometimes thousands), since a new request must be sent whenever the current page does not contain a match. A better way of doing this is to use the `?name=XXX` query parameter to refine the results.

**Which issue this PR fixes**: kubernetes#26404

**Special notes for your reviewer**: There were two ways of addressing this problem:

1. Use the `name` query parameter.
2. Instead of using the list function, switch to volume UUIDs and use the GET function. You'd need to change the signature of a few functions though, such as [`DeleteVolume`](https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/cinder/cinder.go#L49), so I'm not sure how backwards compatible that is.

Since option 1 does effectively the same as option 2, I went with it because it ensures backwards compatibility. One assumption being made is that the `volumeName` being retrieved matches exactly the name of the volume in Cinder. I'm not sure how accurate that is, but I see no reason why cloud providers would want to append/prefix things arbitrarily.

**Release note**:

```release-note
Improves performance of Cinder volume attach/detach operations
```
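As a minimal sketch of the name-filtered listing described above (assuming a gophercloud v1-style block storage v2 client; the helper name `findVolumeByName` and error handling are illustrative, not the actual Kubernetes Cinder code):

```go
// Sketch only: list Cinder volumes filtered by name instead of paging
// through every volume in the project.
package cinder

import (
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/blockstorage/v2/volumes"
	"github.com/gophercloud/gophercloud/pagination"
)

// findVolumeByName returns the ID of the volume whose name matches exactly.
// ListOpts.Name translates to the ?name=<volumeName> query parameter, so the
// server only returns candidate volumes instead of the full list.
func findVolumeByName(client *gophercloud.ServiceClient, volumeName string) (string, error) {
	var id string
	opts := volumes.ListOpts{Name: volumeName}
	err := volumes.List(client, opts).EachPage(func(page pagination.Page) (bool, error) {
		vols, err := volumes.ExtractVolumes(page)
		if err != nil {
			return false, err
		}
		for _, v := range vols {
			if v.Name == volumeName {
				id = v.ID
				return false, nil // stop paging once a match is found
			}
		}
		return true, nil
	})
	if err != nil {
		return "", err
	}
	if id == "" {
		return "", fmt.Errorf("volume %q not found", volumeName)
	}
	return id, nil
}
```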
p0lyn0mial pushed a commit that referenced this pull request on Jul 3, 2017
Automatic merge from submit-queue (batch tested with PRs 47523, 47438, 47550, 47450, 47612)

Move slow PV test to slow suite. See [testgrid](https://k8s-testgrid.appspot.com/google-gce#gce&width=5&graph-metrics=test-duration-minutes). #1
p0lyn0mial pushed a commit that referenced this pull request on Oct 1, 2018
p0lyn0mial pushed a commit that referenced this pull request on Jan 22, 2019
update from kubernetes master
p0lyn0mial pushed a commit that referenced this pull request on Dec 11, 2020
p0lyn0mial pushed a commit that referenced this pull request on Mar 4, 2021
Sharing the same connection for multiple streams should have worked,
but ran into unexpected timeouts:
I0227 08:07:49.754263 80029 portproxy.go:109] container "mock" in pod csi-mock-volumes-4037-2061/csi-mockplugin-0 is running
E0227 08:07:49.779359 80029 portproxy.go:178] prepare forwarding csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: dialer failed: unable to upgrade connection: pod not found ("csi-mockplugin-0_csi-mock-volumes-4037-2061")
I0227 08:07:50.782705 80029 portproxy.go:109] container "mock" in pod csi-mock-volumes-4037-2061/csi-mockplugin-0 is running
I0227 08:07:50.809326 80029 portproxy.go:125] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: starting connection polling
I0227 08:07:50.909544 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #0, 0 open
I0227 08:07:50.912436 80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #0
I0227 08:07:50.912503 80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #0
I0227 08:07:50.913161 80029 portproxy.go:322] forward connection #0 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: remote side closed the stream
E0227 08:07:50.913324 80029 portproxy.go:242] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: an error occurred connecting to the remote port: error forwarding port 9000 to pod 66662ea1ab30b4193dac0102c49be840971d337c802cc0c8bbc074214522bd13, uid : failed to execute portforward in network namespace "/var/run/netns/cni-c15e4e36-dad9-8316-c301-33af9dad5717": failed to dial 9000: dial tcp4 127.0.0.1:9000: connect: connection refused
I0227 08:07:50.913371 80029 portproxy.go:340] forward connection #0 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: closing our side
W0227 08:07:50.913487 80029 server.go:669] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
I0227 08:07:51.009519 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #1, 0 open
I0227 08:07:51.011912 80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #1
I0227 08:07:51.011973 80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #1
I0227 08:07:51.013677 80029 portproxy.go:322] forward connection #1 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: remote side closed the stream
I0227 08:07:51.013720 80029 portproxy.go:340] forward connection #1 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: closing our side
W0227 08:07:51.013794 80029 server.go:669] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
E0227 08:07:51.017026 80029 portproxy.go:242] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: an error occurred connecting to the remote port: error forwarding port 9000 to pod 66662ea1ab30b4193dac0102c49be840971d337c802cc0c8bbc074214522bd13, uid : failed to execute portforward in network namespace "/var/run/netns/cni-c15e4e36-dad9-8316-c301-33af9dad5717": failed to dial 9000: dial tcp4 127.0.0.1:9000: connect: connection refused
I0227 08:07:51.109515 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #2, 0 open
I0227 08:07:51.111479 80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #2
I0227 08:07:51.111519 80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #2
I0227 08:07:51.209519 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #3, 1 open
I0227 08:07:51.766305 80029 csi.go:377] gRPC call: {"Method":"/csi.v1.Identity/Probe","Request":{},"Response":{"ready":{"value":true}},"Error":"","FullError":null}
I0227 08:07:51.768304 80029 csi.go:377] gRPC call: {"Method":"/csi.v1.Identity/GetPluginInfo","Request":{},"Response":{"name":"csi-mock-csi-mock-volumes-4037","vendor_version":"0.3.0","manifest":{"url":"https://k8s.io/kubernetes/test/e2e/storage/drivers/csi-test/mock"}},"Error":"","FullError":null}
I0227 08:07:51.770494 80029 csi.go:377] gRPC call: {"Method":"/csi.v1.Identity/GetPluginCapabilities","Request":{},"Response":{"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"VolumeExpansion":{"type":1}}},{"Type":{"Service":{"type":2}}}]},"Error":"","FullError":null}
I0227 08:07:51.772899 80029 csi.go:377] gRPC call: {"Method":"/csi.v1.Controller/ControllerGetCapabilities","Request":{},"Response":{"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}},{"Type":{"Rpc":{"type":4}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":8}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":12}}},{"Type":{"Rpc":{"type":11}}},{"Type":{"Rpc":{"type":9}}}]},"Error":"","FullError":null}
I0227 08:08:21.209901 80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating error stream: Timeout occurred
I0227 08:08:21.209980 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #3, 1 open
I0227 08:08:51.211522 80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating data stream: Timeout occurred
I0227 08:08:51.211566 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #3, 1 open
I0227 08:08:51.213451 80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #3
I0227 08:08:51.213498 80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #3
I0227 08:08:51.309540 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #4, 2 open
I0227 08:08:52.215358 80029 portproxy.go:322] forward connection #3 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: remote side closed the stream
I0227 08:08:52.215475 80029 portproxy.go:340] forward connection #3 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: closing our side
I0227 08:09:21.310003 80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating error stream: Timeout occurred
I0227 08:09:21.310086 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #4, 1 open
I0227 08:09:51.311854 80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating data stream: Timeout occurred
I0227 08:09:51.311908 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #4, 1 open
I0227 08:09:51.314415 80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #4
I0227 08:09:51.314497 80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #4
I0227 08:09:51.409527 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #5, 2 open
I0227 08:09:52.326203 80029 portproxy.go:322] forward connection #4 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: remote side closed the stream
I0227 08:09:52.326277 80029 portproxy.go:340] forward connection #4 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: closing our side
I0227 08:10:21.409892 80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating error stream: Timeout occurred
I0227 08:10:21.409954 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #5, 1 open
I0227 08:10:51.411455 80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating data stream: Timeout occurred
I0227 08:10:51.411557 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #5, 1 open
I0227 08:10:51.413229 80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #5
I0227 08:10:51.413274 80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #5
I0227 08:10:51.509508 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #6, 2 open
I0227 08:10:52.414862 80029 portproxy.go:322] forward connection #5 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: remote side closed the stream
I0227 08:10:52.414931 80029 portproxy.go:340] forward connection #5 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: closing our side
I0227 08:11:21.509879 80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating error stream: Timeout occurred
I0227 08:11:21.509934 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #6, 1 open
I0227 08:11:51.511519 80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating data stream: Timeout occurred
I0227 08:11:51.511568 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #6, 1 open
I0227 08:11:51.513519 80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #6
I0227 08:11:51.513571 80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #6
I0227 08:11:51.609504 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #7, 2 open
I0227 08:11:52.517799 80029 portproxy.go:322] forward connection #6 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: remote side closed the stream
I0227 08:11:52.517918 80029 portproxy.go:340] forward connection #6 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: closing our side
I0227 08:12:21.609856 80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating error stream: Timeout occurred
I0227 08:12:21.609909 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #7, 1 open
I0227 08:12:51.611494 80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating data stream: Timeout occurred
I0227 08:12:51.611555 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #7, 1 open
I0227 08:12:51.613289 80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #7
I0227 08:12:51.613343 80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #7
I0227 08:12:51.709535 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #8, 2 open
I0227 08:12:52.615858 80029 portproxy.go:322] forward connection #7 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: remote side closed the stream
I0227 08:12:52.615989 80029 portproxy.go:340] forward connection #7 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: closing our side
W0227 08:12:52.616116 80029 server.go:669] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
I0227 08:13:21.709934 80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating error stream: Timeout occurred
I0227 08:13:21.709997 80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #8, 1 open
Feb 27 08:13:30.916: FAIL: Failed to register CSIDriver csi-mock-csi-mock-volumes-4037
Unexpected error:
<*errors.errorString | 0xc002666220>: {
s: "error waiting for CSI driver csi-mock-csi-mock-volumes-4037 registration on node kind-worker2: timed out waiting for the condition",
}
error waiting for CSI driver csi-mock-csi-mock-volumes-4037 registration on node kind-worker2: timed out waiting for the condition
occurred
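For context only, a generic sketch of the mechanism this test helper exercises: routing a gRPC client through a custom dialer, so each gRPC transport goes through its own forwarded connection rather than sharing one. This is not the actual portproxy.go code; `dialViaPortForward` is a hypothetical stand-in for whatever opens a new port-forward connection to the pod.

```go
// Generic sketch: per-connection dialing for a gRPC client.
package portforwardsketch

import (
	"context"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// dialViaPortForward stands in for a helper that opens a fresh forwarded
// connection to the pod (hypothetical; here it just dials TCP).
func dialViaPortForward(ctx context.Context, addr string) (net.Conn, error) {
	d := net.Dialer{Timeout: 10 * time.Second}
	return d.DialContext(ctx, "tcp", addr)
}

// connect builds a gRPC client whose transport is created through our dialer,
// so every (re)connection gets its own forwarded connection.
func connect(ctx context.Context, addr string) (*grpc.ClientConn, error) {
	return grpc.DialContext(ctx, addr,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithContextDialer(dialViaPortForward),
	)
}
```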
p0lyn0mial pushed a commit that referenced this pull request on Oct 23, 2023
These were found with a modified klog that enables "go vet" to check klog call
parameters:
cmd/kubeadm/app/features/features.go:149:4: printf: k8s.io/klog/v2.Warningf format %t has arg v of wrong type string (govet)
klog.Warningf("Setting deprecated feature gate %s=%t. It will be removed in a future release.", k, v)
test/images/sample-device-plugin/sampledeviceplugin.go:147:5: printf: k8s.io/klog/v2.Errorf does not support error-wrapping directive %w (govet)
klog.Errorf("error: %w", err)
test/images/sample-device-plugin/sampledeviceplugin.go:155:3: printf: k8s.io/klog/v2.Errorf does not support error-wrapping directive %w (govet)
klog.Errorf("Failed to add watch to %q: %w", triggerPath, err)
staging/src/k8s.io/code-generator/cmd/prerelease-lifecycle-gen/prerelease-lifecycle-generators/status.go:207:5: printf: k8s.io/klog/v2.Fatalf does not support error-wrapping directive %w (govet)
klog.Fatalf("Package %v: unsupported %s value: %q :%w", i, tagEnabledName, ptag.value, err)
staging/src/k8s.io/legacy-cloud-providers/vsphere/nodemanager.go:286:3: printf: (k8s.io/klog/v2.Verbose).Infof format %s reads arg #1, but call has 0 args (govet)
klog.V(4).Infof("Node %s missing in vSphere cloud provider cache, trying node informer")
staging/src/k8s.io/legacy-cloud-providers/vsphere/nodemanager.go:302:3: printf: (k8s.io/klog/v2.Verbose).Infof format %s reads arg #1, but call has 0 args (govet)
klog.V(4).Infof("Node %s missing in vSphere cloud provider caches, trying the API server")
p0lyn0mial pushed a commit that referenced this pull request on Oct 3, 2025
Instead of creating a new test case, the permutation is passed down. This enables adding the event numbers to the log output, which makes it easier to see which output belongs to which input:
=== RUN TestListPatchedResourceSlices/update-patch/2_3_0_1
tracker.go:396: I0929 14:28:40.032318] event #1: ResourceSlice add slice="s1"
tracker.go:581: I0929 14:28:40.032404] event #1: syncing ResourceSlice resourceslice="s1"
tracker.go:659: I0929 14:28:40.032446] event #1: ResourceSlice synced resourceslice="s1" change="add"
tracker.go:396: I0929 14:28:40.032502] event #2: ResourceSlice add slice="s2"
tracker.go:581: I0929 14:28:40.032536] event #2: syncing ResourceSlice resourceslice="s2"
tracker.go:659: I0929 14:28:40.032568] event #2: ResourceSlice synced resourceslice="s2" change="add"
tracker.go:463: I0929 14:28:40.032609] event #0/#0: DeviceTaintRule add patch="rule"
tracker.go:581: I0929 14:28:40.032639] event #0/#0: syncing ResourceSlice resourceslice="s1"
tracker.go:703: I0929 14:28:40.032675] event #0/#0: processing DeviceTaintRule resourceslice="s1" deviceTaintRule="rule"
tracker.go:807: I0929 14:28:40.032712] event #0/#0: applying matching DeviceTaintRule resourceslice="s1" deviceTaintRule="rule" device="driver1.example.com/pool-1/device-1"
tracker.go:868: I0929 14:28:40.032780] event #0/#0: Assigned new taint ID, no matching taint resourceslice="s1" deviceTaintRule="rule" device="driver1.example.com/pool-1/device-1" taintID=0 taint="example.com/taint=tainted:NoExecute"
tracker.go:654: I0929 14:28:40.033023] event #0/#0: ResourceSlice synced resourceslice="s1" change="update" diff=<
@@ -23,7 +23,32 @@
"BindingConditions": null,
"BindingFailureConditions": null,
"AllowMultipleAllocations": null,
- "Taints": null
+ "Taints": [
+ {
+ "Rule": {
+ "metadata": {
+ "name": "rule"
+ },
+ "spec": {
+ "deviceSelector": {
+ "pool": "pool-1"
+ },
+ "taint": {
+ "key": "example.com/taint",
+ "value": "tainted",
+ "effect": "NoExecute",
+ "timeAdded": "2006-01-02T15:04:05Z"
+ }
+ },
+ "status": {}
+ },
+ "ID": 1,
+ "key": "example.com/taint",
+ "value": "tainted",
+ "effect": "NoExecute",
+ "timeAdded": "2006-01-02T15:04:05Z"
+ }
+ ]
}
],
"Taints": null,
>
tracker.go:482: I0929 14:28:40.033224] event #0/#1: DeviceTaintRule update patch="rule" diff=<
@@ -4,7 +4,7 @@
},
"spec": {
"deviceSelector": {
- "pool": "pool-1"
+ "pool": "pool-2"
},
"taint": {
"key": "example.com/taint",
>
tracker.go:581: I0929 14:28:40.033285] event #0/#1: syncing ResourceSlice resourceslice="s1"
tracker.go:703: I0929 14:28:40.033319] event #0/#1: processing DeviceTaintRule resourceslice="s1" deviceTaintRule="rule"
tracker.go:654: I0929 14:28:40.033478] event #0/#1: ResourceSlice synced resourceslice="s1" change="update" diff=<
@@ -23,32 +23,7 @@
"BindingConditions": null,
"BindingFailureConditions": null,
"AllowMultipleAllocations": null,
- "Taints": [
- {
- "Rule": {
- "metadata": {
- "name": "rule"
- },
- "spec": {
- "deviceSelector": {
- "pool": "pool-1"
- },
- "taint": {
- "key": "example.com/taint",
- "value": "tainted",
- "effect": "NoExecute",
- "timeAdded": "2006-01-02T15:04:05Z"
- }
- },
- "status": {}
- },
- "ID": 1,
- "key": "example.com/taint",
- "value": "tainted",
- "effect": "NoExecute",
- "timeAdded": "2006-01-02T15:04:05Z"
- }
- ]
+ "Taints": null
}
],
"Taints": null,
>
tracker.go:581: I0929 14:28:40.033601] event #0/#1: syncing ResourceSlice resourceslice="s2"
tracker.go:703: I0929 14:28:40.033633] event #0/#1: processing DeviceTaintRule resourceslice="s2" deviceTaintRule="rule"
...
Disabling event checking only worked when actually running all sub-tests. When
selectively running only one permutation with -run, the boolean variable was
wrong:
$ go test -run='.*/^update-patch$' ./staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/
ok k8s.io/dynamic-resource-allocation/resourceslice/tracker
$ go test -run='.*/^update-patch$/3_2_0_1' ./staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/
--- FAIL: TestListPatchedResourceSlices (0.01s)
--- FAIL: TestListPatchedResourceSlices/update-patch (0.00s)
--- FAIL: TestListPatchedResourceSlices/update-patch/3_2_0_1 (0.00s)
tracker_test.go:762:
Error Trace: /nvme/gopath/src/k8s.io/kubernetes/staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/tracker_test.go:762
/nvme/gopath/src/k8s.io/kubernetes/staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/tracker_test.go:856
Error: Not equal:
expected: []tracker.handlerEvent{tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc000301d40)}, tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc000346000)}}
actual : []tracker.handlerEvent{tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc0001f9ba0)}, tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc000301d40)}, tracker.handlerEvent{event:"update", oldObj:(*api.ResourceSlice)(0xc000301d40), newObj:(*api.ResourceSlice)(0xc0003dba00)}, tracker.handlerEvent{event:"update", oldObj:(*api.ResourceSlice)(0xc0003dba00), newObj:(*api.ResourceSlice)(0xc000301d40)}, tracker.handlerEvent{event:"update", oldObj:(*api.ResourceSlice)(0xc0001f9ba0), newObj:(*api.ResourceSlice)(0xc0003dbba0)}}
Now permutations are detected automatically based on the indices.
While at it, documentation gets moved around a bit to make reading test cases
easier without going to the implementation.
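As a loose illustration of "detected automatically based on the indices": derive the "check exact event order" decision from the permutation itself rather than from a shared boolean, so it stays correct even when -run selects a single sub-test. The package, helper, and test names below are illustrative, not the actual tracker test.

```go
// Sketch only: permutation-driven sub-tests whose behaviour is derived from
// the indices in the sub-test name, not from external state.
package tracker

import (
	"fmt"
	"sort"
	"strings"
	"testing"
)

// isIdentity reports whether the permutation keeps events in their original
// order; only then can the test assert on the exact handler event sequence.
func isIdentity(perm []int) bool {
	return sort.IntsAreSorted(perm)
}

// subtestName renders a permutation like the sub-test names in the log above,
// e.g. "2_3_0_1".
func subtestName(perm []int) string {
	parts := make([]string, len(perm))
	for i, p := range perm {
		parts[i] = fmt.Sprint(p)
	}
	return strings.Join(parts, "_")
}

func TestPermutationsSketch(t *testing.T) {
	for _, perm := range [][]int{{0, 1, 2, 3}, {2, 3, 0, 1}, {3, 2, 0, 1}} {
		perm := perm
		t.Run(subtestName(perm), func(t *testing.T) {
			checkEvents := isIdentity(perm)
			_ = checkEvents // replay events in perm order; assert exact order only if checkEvents
		})
	}
}
```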
p0lyn0mial pushed a commit that referenced this pull request on Jan 30, 2026
DRA depends on the assume cache having invoked all event handlers before
Assume() returns, because DRA maintains state that is relevant for scheduling
through those event handlers.
This log snippet shows how this went wrong during PreBind:
dynamicresources.go:1150: I0115 10:35:29.264437] scheduler: Claim stored in assume cache pod="testdra-all-usesallresources-kqjpj/my-pod-0091" claim="testdra-all-usesallresources-kqjpj/claim-0091" uid=<types.UID>: 516f274f-e1a9-4a4b-b7d2-bb86138e4240 resourceVersion="5636"
dra_manager.go:198: I0115 10:35:29.264448] scheduler: Removed in-flight claim claim="testdra-all-usesallresources-kqjpj/claim-0091" uid=<types.UID>: 516f274f-e1a9-4a4b-b7d2-bb86138e4240 version="287"
dynamicresources.go:1157: I0115 10:35:29.264463] scheduler: Removed claim from in-flight claims pod="testdra-all-usesallresources-kqjpj/my-pod-0091" claim="testdra-all-usesallresources-kqjpj/claim-0091" uid=<types.UID>: 516f274f-e1a9-4a4b-b7d2-bb86138e4240 resourceVersion="5636" allocation=<
...
allocateddevices.go:189: I0115 10:35:29.267315] scheduler: Observed device allocation device="testdra-all-usesallresources-kqjpj.driver/worker-1/worker-1-device-096" claim="testdra-all-usesallresources-kqjpj/claim-0091"
- goroutine #1: UpdateStatus result delivered via informer.
AssumeCache updates cache, pushes event A, emitEvents pulls event A from queue.
*Not* done with delivering it yet!
- goroutine #2: AssumeCache.Assume called. Updates cache, pushes event B, emits it.
Old and new claim have allocation, so no "Observed device allocation".
- goroutine #3: Schedules next pod, without considering device as allocated (not in the log snippet).
- goroutine #1: Finally delivers event A: "Observed device allocation", but too late.
Also, events are delivered out-of-order.
The fix is to make emitEvents, when called by Assume, wait for a potentially running emitEvents in some other goroutine. This ensures that an event pulled out of the queue by that other goroutine has been delivered before Assume itself checks the queue one more time and returns.
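A rough sketch of that synchronization, using a mutex plus condition variable around event emission; the types and method names are illustrative, not the real assume cache code:

```go
// Sketch: Assume only returns once its event (and any earlier ones) has been
// delivered, even if another goroutine is in the middle of emitting.
package assumecache

import "sync"

type event struct{ obj interface{} }

type cache struct {
	mu       sync.Mutex
	cond     *sync.Cond
	queue    []event
	emitting bool // true while some goroutine is inside emitEvents
	handler  func(event)
}

func newCache(handler func(event)) *cache {
	c := &cache{handler: handler}
	c.cond = sync.NewCond(&c.mu)
	return c
}

// emitEvents drains the queue. It first waits for an emitEvents running in
// another goroutine, so an event that goroutine already pulled off the queue
// is fully delivered before this call inspects the queue and returns.
func (c *cache) emitEvents() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for c.emitting {
		c.cond.Wait()
	}
	c.emitting = true
	for len(c.queue) > 0 {
		ev := c.queue[0]
		c.queue = c.queue[1:]
		c.mu.Unlock()
		c.handler(ev) // deliver outside the lock
		c.mu.Lock()
	}
	c.emitting = false
	c.cond.Broadcast()
}

// Assume updates the cache (elided), queues the corresponding event, and only
// returns after that event has been delivered to the handler.
func (c *cache) Assume(obj interface{}) {
	c.mu.Lock()
	c.queue = append(c.queue, event{obj: obj})
	c.mu.Unlock()
	c.emitEvents()
}
```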
The time window where things go wrong is small. An E2E test covering this only
flaked rarely, and only in the CI. An integration test (separate commit) with
higher number of pods finally made it possible to reproduce locally. It also
uncovered a second race (fix in separate commit).
The unit test fails without the fix:
=== RUN TestAssumeConcurrency
assume_cache_test.go:311: FATAL ERROR:
Assume should have blocked and didn't.
--- FAIL: TestAssumeConcurrency (0.00s)
p0lyn0mial pushed a commit that referenced this pull request on Jan 30, 2026
GatherAllocatedState and ListAllAllocatedDevices need to collect information
from different sources (allocated devices, in-flight claims), potentially even
multiple times (GatherAllocatedState first gets allocated devices, then the
capacities).
The underlying assumption that nothing bad happens in parallel is not always
true. The following log snippet shows how an update of the assume
cache (feeding the allocated devices tracker) and in-flight claims lands such
that GatherAllocatedState doesn't see the device in that claim as allocated:
dra_manager.go:263: I0115 15:11:04.407714 18778] scheduler: Starting GatherAllocatedState
...
allocateddevices.go:189: I0115 15:11:04.407945 18066] scheduler: Observed device allocation device="testdra-all-usesallresources-hvs5d.driver/worker-5/worker-5-device-094" claim="testdra-all-usesallresources-hvs5d/claim-0553"
dynamicresources.go:1150: I0115 15:11:04.407981 89109] scheduler: Claim stored in assume cache pod="testdra-all-usesallresources-hvs5d/my-pod-0553" claim="testdra-all-usesallresources-hvs5d/claim-0553" uid=<types.UID>: a84d3c4d-f752-4cfd-8993-f4ce58643685 resourceVersion="5680"
dra_manager.go:201: I0115 15:11:04.408008 89109] scheduler: Removed in-flight claim claim="testdra-all-usesallresources-hvs5d/claim-0553" uid=<types.UID>: a84d3c4d-f752-4cfd-8993-f4ce58643685 version="1211"
dynamicresources.go:1157: I0115 15:11:04.408044 89109] scheduler: Removed claim from in-flight claims pod="testdra-all-usesallresources-hvs5d/my-pod-0553" claim="testdra-all-usesallresources-hvs5d/claim-0553" uid=<types.UID>: a84d3c4d-f752-4cfd-8993-f4ce58643685 resourceVersion="5680" allocation=<
{
"devices": {
"results": [
{
"request": "req-1",
"driver": "testdra-all-usesallresources-hvs5d.driver",
"pool": "worker-5",
"device": "worker-5-device-094"
}
]
},
"nodeSelector": {
"nodeSelectorTerms": [
{
"matchFields": [
{
"key": "metadata.name",
"operator": "In",
"values": [
"worker-5"
]
}
]
}
]
},
"allocationTimestamp": "2026-01-15T14:11:04Z"
}
>
dra_manager.go:280: I0115 15:11:04.408085 18778] scheduler: Device is in flight for allocation device="testdra-all-usesallresources-hvs5d.driver/worker-5/worker-5-device-095" claim="testdra-all-usesallresources-hvs5d/claim-0086"
dra_manager.go:280: I0115 15:11:04.408137 18778] scheduler: Device is in flight for allocation device="testdra-all-usesallresources-hvs5d.driver/worker-5/worker-5-device-096" claim="testdra-all-usesallresources-hvs5d/claim-0165"
default_binder.go:69: I0115 15:11:04.408175 89109] scheduler: Attempting to bind pod to node pod="testdra-all-usesallresources-hvs5d/my-pod-0553" node="worker-5"
dra_manager.go:265: I0115 15:11:04.408264 18778] scheduler: Finished GatherAllocatedState allocatedDevices=<map[string]interface {} | len:2>: {
Initial state: "worker-5-device-094" is in-flight, not in cache
- goroutine #1: starts GatherAllocatedState, copies cache
- goroutine #2: adds to assume cache, removes from in-flight
- goroutine #1: checks in-flight
=> device never seen as allocated
This is the second reason for double allocation of the same device in two
different claims. The other was timing in the assume cache. Both were
tracked down with an integration test (separate commit). It did not fail
all the time, but enough that regressions should show up as flakes.
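Hedged sketch of the general pattern this race points at: read the cache-fed allocated devices and the in-flight claims under one critical section, so a claim moving from in-flight into the cache cannot fall between the two reads. This is illustrative only and not necessarily how the actual dra_manager fix looks; all names below are hypothetical.

```go
// Sketch: take both "allocated" views under the same lock used when a claim
// moves from in-flight to the cache.
package drasketch

import "sync"

type deviceID string

type allocatedState struct {
	mu        sync.Mutex
	fromCache map[deviceID]bool // fed by assume cache event handlers
	inFlight  map[deviceID]bool // claims currently being allocated
}

func newAllocatedState() *allocatedState {
	return &allocatedState{
		fromCache: map[deviceID]bool{},
		inFlight:  map[deviceID]bool{},
	}
}

// markStored moves a device from in-flight to cache-backed under the lock...
func (s *allocatedState) markStored(dev deviceID) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.inFlight, dev)
	s.fromCache[dev] = true
}

// ...so gather always sees every allocated device in exactly one of the sets.
func (s *allocatedState) gather() map[deviceID]bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make(map[deviceID]bool, len(s.fromCache)+len(s.inFlight))
	for d := range s.fromCache {
		out[d] = true
	}
	for d := range s.inFlight {
		out[d] = true
	}
	return out
}
```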