
Unify internal and external informer factories #1

Open
sttts wants to merge 7 commits into p0lyn0mial:admission_options_spits_out_admission_control from sttts:sttts-unify-informers

Conversation


@sttts commented May 8, 2017

No description provided.


deads2k commented May 8, 2017

@p0lyn0mial I spoke with @sttts in IRC. We've agreed that we can proceed with kubernetes#45355 without this pull in your branch.

One of the concerns is the silent ordering requirement that we're introducing. Some of it may be ameliorated by leveraging RecommendedOptions to always apply in the correct order, but the concern is still open even though we're proceeding without resolving it.

@p0lyn0mial (Owner) commented

thanks for letting me know.

p0lyn0mial force-pushed the admission_options_spits_out_admission_control branch 2 times, most recently from 4818f16 to 8cea69a on May 14, 2017 at 09:58
p0lyn0mial pushed a commit that referenced this pull request May 25, 2017
…mance

Automatic merge from submit-queue (batch tested with PRs 38505, 41785, 46315)

Only retrieve relevant volumes

**What this PR does / why we need it**:

Improves performance for Cinder volume attach/detach calls. 

Currently, when Cinder volumes are attached or detached, functions try to retrieve details about the volume from the Nova API. Because some callers only have the volume name, not its UUID, they use the list function in gophercloud to iterate over all volumes until they find a match. This causes severe performance problems on OpenStack projects with many volumes (sometimes thousands), since a new request has to be sent whenever the current page does not contain a match. A better way of doing this is to use the `?name=XXX` query parameter to refine the results.

**Which issue this PR fixes**:

kubernetes#26404

**Special notes for your reviewer**:

There were 2 ways of addressing this problem:

1. Use the `name` query parameter
2. Instead of using the list function, switch to using volume UUIDs and use the GET function instead. You'd need to change the signature of a few functions though, such as [`DeleteVolume`](https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/cinder/cinder.go#L49), so I'm not sure how backwards compatible that is.

Since option 1 effectively achieves the same as option 2, I went with it because it preserves backwards compatibility.
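
For illustration, a minimal sketch of option 1 using gophercloud's block storage listing with a name filter (package path and helper name are assumptions for illustration, not the actual Kubernetes change):

```go
package cinder

import (
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/blockstorage/v2/volumes"
)

// findVolumeByName asks Cinder only for volumes matching the exact name,
// instead of paging through every volume in the project.
func findVolumeByName(client *gophercloud.ServiceClient, name string) (*volumes.Volume, error) {
	// The Name field is sent as the ?name=XXX query parameter.
	pages, err := volumes.List(client, volumes.ListOpts{Name: name}).AllPages()
	if err != nil {
		return nil, err
	}
	vols, err := volumes.ExtractVolumes(pages)
	if err != nil {
		return nil, err
	}
	if len(vols) != 1 {
		return nil, fmt.Errorf("expected exactly one volume named %q, got %d", name, len(vols))
	}
	return &vols[0], nil
}
```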

One assumption that is made is that the `volumeName` being retrieved matches exactly the name of the volume in Cinder. I'm not sure how accurate that is, but I see no reason why cloud providers would want to append/prefix things arbitrarily. 

**Release note**:
```release-note
Improves performance of Cinder volume attach/detach operations
```
p0lyn0mial pushed a commit that referenced this pull request Jul 3, 2017
Automatic merge from submit-queue (batch tested with PRs 47523, 47438, 47550, 47450, 47612)

Move slow PV test to slow suite.

See [testgrid](https://k8s-testgrid.appspot.com/google-gce#gce&width=5&graph-metrics=test-duration-minutes).

#1
p0lyn0mial pushed a commit that referenced this pull request Mar 8, 2018
p0lyn0mial pushed a commit that referenced this pull request Mar 8, 2018
p0lyn0mial pushed a commit that referenced this pull request Oct 1, 2018
p0lyn0mial pushed a commit that referenced this pull request Jan 22, 2019
update from kubernetes master
p0lyn0mial pushed a commit that referenced this pull request Apr 24, 2020
p0lyn0mial pushed a commit that referenced this pull request Dec 11, 2020
p0lyn0mial pushed a commit that referenced this pull request Mar 4, 2021
Sharing the same connection for multiple streams should have worked,
but ran into unexpected timeouts:

I0227 08:07:49.754263   80029 portproxy.go:109] container "mock" in pod csi-mock-volumes-4037-2061/csi-mockplugin-0 is running
E0227 08:07:49.779359   80029 portproxy.go:178] prepare forwarding csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: dialer failed: unable to upgrade connection: pod not found ("csi-mockplugin-0_csi-mock-volumes-4037-2061")
I0227 08:07:50.782705   80029 portproxy.go:109] container "mock" in pod csi-mock-volumes-4037-2061/csi-mockplugin-0 is running
I0227 08:07:50.809326   80029 portproxy.go:125] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: starting connection polling
I0227 08:07:50.909544   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #0, 0 open
I0227 08:07:50.912436   80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #0
I0227 08:07:50.912503   80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #0
I0227 08:07:50.913161   80029 portproxy.go:322] forward connection #0 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: remote side closed the stream
E0227 08:07:50.913324   80029 portproxy.go:242] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: an error occurred connecting to the remote port: error forwarding port 9000 to pod 66662ea1ab30b4193dac0102c49be840971d337c802cc0c8bbc074214522bd13, uid : failed to execute portforward in network namespace "/var/run/netns/cni-c15e4e36-dad9-8316-c301-33af9dad5717": failed to dial 9000: dial tcp4 127.0.0.1:9000: connect: connection refused
I0227 08:07:50.913371   80029 portproxy.go:340] forward connection #0 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: closing our side
W0227 08:07:50.913487   80029 server.go:669] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
I0227 08:07:51.009519   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #1, 0 open
I0227 08:07:51.011912   80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #1
I0227 08:07:51.011973   80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #1
I0227 08:07:51.013677   80029 portproxy.go:322] forward connection #1 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: remote side closed the stream
I0227 08:07:51.013720   80029 portproxy.go:340] forward connection #1 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: closing our side
W0227 08:07:51.013794   80029 server.go:669] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
E0227 08:07:51.017026   80029 portproxy.go:242] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: an error occurred connecting to the remote port: error forwarding port 9000 to pod 66662ea1ab30b4193dac0102c49be840971d337c802cc0c8bbc074214522bd13, uid : failed to execute portforward in network namespace "/var/run/netns/cni-c15e4e36-dad9-8316-c301-33af9dad5717": failed to dial 9000: dial tcp4 127.0.0.1:9000: connect: connection refused
I0227 08:07:51.109515   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #2, 0 open
I0227 08:07:51.111479   80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #2
I0227 08:07:51.111519   80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #2
I0227 08:07:51.209519   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #3, 1 open
I0227 08:07:51.766305   80029 csi.go:377] gRPC call: {"Method":"/csi.v1.Identity/Probe","Request":{},"Response":{"ready":{"value":true}},"Error":"","FullError":null}
I0227 08:07:51.768304   80029 csi.go:377] gRPC call: {"Method":"/csi.v1.Identity/GetPluginInfo","Request":{},"Response":{"name":"csi-mock-csi-mock-volumes-4037","vendor_version":"0.3.0","manifest":{"url":"https://k8s.io/kubernetes/test/e2e/storage/drivers/csi-test/mock"}},"Error":"","FullError":null}
I0227 08:07:51.770494   80029 csi.go:377] gRPC call: {"Method":"/csi.v1.Identity/GetPluginCapabilities","Request":{},"Response":{"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"VolumeExpansion":{"type":1}}},{"Type":{"Service":{"type":2}}}]},"Error":"","FullError":null}
I0227 08:07:51.772899   80029 csi.go:377] gRPC call: {"Method":"/csi.v1.Controller/ControllerGetCapabilities","Request":{},"Response":{"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}},{"Type":{"Rpc":{"type":4}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":8}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":12}}},{"Type":{"Rpc":{"type":11}}},{"Type":{"Rpc":{"type":9}}}]},"Error":"","FullError":null}
I0227 08:08:21.209901   80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating error stream: Timeout occurred
I0227 08:08:21.209980   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #3, 1 open
I0227 08:08:51.211522   80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating data stream: Timeout occurred
I0227 08:08:51.211566   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #3, 1 open
I0227 08:08:51.213451   80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #3
I0227 08:08:51.213498   80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #3
I0227 08:08:51.309540   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #4, 2 open
I0227 08:08:52.215358   80029 portproxy.go:322] forward connection #3 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: remote side closed the stream
I0227 08:08:52.215475   80029 portproxy.go:340] forward connection #3 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: closing our side
I0227 08:09:21.310003   80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating error stream: Timeout occurred
I0227 08:09:21.310086   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #4, 1 open
I0227 08:09:51.311854   80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating data stream: Timeout occurred
I0227 08:09:51.311908   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #4, 1 open
I0227 08:09:51.314415   80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #4
I0227 08:09:51.314497   80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #4
I0227 08:09:51.409527   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #5, 2 open
I0227 08:09:52.326203   80029 portproxy.go:322] forward connection #4 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: remote side closed the stream
I0227 08:09:52.326277   80029 portproxy.go:340] forward connection #4 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: closing our side
I0227 08:10:21.409892   80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating error stream: Timeout occurred
I0227 08:10:21.409954   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #5, 1 open
I0227 08:10:51.411455   80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating data stream: Timeout occurred
I0227 08:10:51.411557   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #5, 1 open
I0227 08:10:51.413229   80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #5
I0227 08:10:51.413274   80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #5
I0227 08:10:51.509508   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #6, 2 open
I0227 08:10:52.414862   80029 portproxy.go:322] forward connection #5 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: remote side closed the stream
I0227 08:10:52.414931   80029 portproxy.go:340] forward connection #5 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: closing our side
I0227 08:11:21.509879   80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating error stream: Timeout occurred
I0227 08:11:21.509934   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #6, 1 open
I0227 08:11:51.511519   80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating data stream: Timeout occurred
I0227 08:11:51.511568   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #6, 1 open
I0227 08:11:51.513519   80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #6
I0227 08:11:51.513571   80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #6
I0227 08:11:51.609504   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #7, 2 open
I0227 08:11:52.517799   80029 portproxy.go:322] forward connection #6 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: remote side closed the stream
I0227 08:11:52.517918   80029 portproxy.go:340] forward connection #6 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: closing our side
I0227 08:12:21.609856   80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating error stream: Timeout occurred
I0227 08:12:21.609909   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #7, 1 open
I0227 08:12:51.611494   80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating data stream: Timeout occurred
I0227 08:12:51.611555   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #7, 1 open
I0227 08:12:51.613289   80029 portproxy.go:155] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: created a new connection #7
I0227 08:12:51.613343   80029 portproxy.go:286] forward listener for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: got a new connection #7
I0227 08:12:51.709535   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #8, 2 open
I0227 08:12:52.615858   80029 portproxy.go:322] forward connection #7 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: remote side closed the stream
I0227 08:12:52.615989   80029 portproxy.go:340] forward connection #7 for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: closing our side
W0227 08:12:52.616116   80029 server.go:669] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
I0227 08:13:21.709934   80029 portproxy.go:151] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: no connection: error creating error stream: Timeout occurred
I0227 08:13:21.709997   80029 portproxy.go:148] port forwarding for csi-mock-volumes-4037-2061/csi-mockplugin-0:9000: trying to create a new connection #8, 1 open
Feb 27 08:13:30.916: FAIL: Failed to register CSIDriver csi-mock-csi-mock-volumes-4037
Unexpected error:
    <*errors.errorString | 0xc002666220>: {
        s: "error waiting for CSI driver csi-mock-csi-mock-volumes-4037 registration on node kind-worker2: timed out waiting for the condition",
    }
    error waiting for CSI driver csi-mock-csi-mock-volumes-4037 registration on node kind-worker2: timed out waiting for the condition
occurred
p0lyn0mial pushed a commit that referenced this pull request Oct 23, 2023
These were found with a modified klog that enables "go vet" to check klog call
parameters:

    cmd/kubeadm/app/features/features.go:149:4: printf: k8s.io/klog/v2.Warningf format %t has arg v of wrong type string (govet)
    			klog.Warningf("Setting deprecated feature gate %s=%t. It will be removed in a future release.", k, v)
    test/images/sample-device-plugin/sampledeviceplugin.go:147:5: printf: k8s.io/klog/v2.Errorf does not support error-wrapping directive %w (govet)
    				klog.Errorf("error: %w", err)
    test/images/sample-device-plugin/sampledeviceplugin.go:155:3: printf: k8s.io/klog/v2.Errorf does not support error-wrapping directive %w (govet)
    		klog.Errorf("Failed to add watch to %q: %w", triggerPath, err)
    staging/src/k8s.io/code-generator/cmd/prerelease-lifecycle-gen/prerelease-lifecycle-generators/status.go:207:5: printf: k8s.io/klog/v2.Fatalf does not support error-wrapping directive %w (govet)
    				klog.Fatalf("Package %v: unsupported %s value: %q :%w", i, tagEnabledName, ptag.value, err)
    staging/src/k8s.io/legacy-cloud-providers/vsphere/nodemanager.go:286:3: printf: (k8s.io/klog/v2.Verbose).Infof format %s reads arg #1, but call has 0 args (govet)
    		klog.V(4).Infof("Node %s missing in vSphere cloud provider cache, trying node informer")
    staging/src/k8s.io/legacy-cloud-providers/vsphere/nodemanager.go:302:3: printf: (k8s.io/klog/v2.Verbose).Infof format %s reads arg #1, but call has 0 args (govet)
    		klog.V(4).Infof("Node %s missing in vSphere cloud provider caches, trying the API server")
p0lyn0mial pushed a commit that referenced this pull request Oct 3, 2025
Instead of creating a new test case, the permutation is passed down. This makes it possible to add the event numbers to the log output, which helps to see which output belongs to which input:

    === RUN   TestListPatchedResourceSlices/update-patch/2_3_0_1
    tracker.go:396: I0929 14:28:40.032318] event #1: ResourceSlice add slice="s1"
    tracker.go:581: I0929 14:28:40.032404] event #1: syncing ResourceSlice resourceslice="s1"
    tracker.go:659: I0929 14:28:40.032446] event #1: ResourceSlice synced resourceslice="s1" change="add"
    tracker.go:396: I0929 14:28:40.032502] event #2: ResourceSlice add slice="s2"
    tracker.go:581: I0929 14:28:40.032536] event #2: syncing ResourceSlice resourceslice="s2"
    tracker.go:659: I0929 14:28:40.032568] event #2: ResourceSlice synced resourceslice="s2" change="add"
    tracker.go:463: I0929 14:28:40.032609] event #0/#0: DeviceTaintRule add patch="rule"
    tracker.go:581: I0929 14:28:40.032639] event #0/#0: syncing ResourceSlice resourceslice="s1"
    tracker.go:703: I0929 14:28:40.032675] event #0/#0: processing DeviceTaintRule resourceslice="s1" deviceTaintRule="rule"
    tracker.go:807: I0929 14:28:40.032712] event #0/#0: applying matching DeviceTaintRule resourceslice="s1" deviceTaintRule="rule" device="driver1.example.com/pool-1/device-1"
    tracker.go:868: I0929 14:28:40.032780] event #0/#0: Assigned new taint ID, no matching taint resourceslice="s1" deviceTaintRule="rule" device="driver1.example.com/pool-1/device-1" taintID=0 taint="example.com/taint=tainted:NoExecute"
    tracker.go:654: I0929 14:28:40.033023] event #0/#0: ResourceSlice synced resourceslice="s1" change="update" diff=<
        	@@ -23,7 +23,32 @@
        	     "BindingConditions": null,
        	     "BindingFailureConditions": null,
        	     "AllowMultipleAllocations": null,
        	-    "Taints": null
        	+    "Taints": [
        	+     {
        	+      "Rule": {
        	+       "metadata": {
        	+        "name": "rule"
        	+       },
        	+       "spec": {
        	+        "deviceSelector": {
        	+         "pool": "pool-1"
        	+        },
        	+        "taint": {
        	+         "key": "example.com/taint",
        	+         "value": "tainted",
        	+         "effect": "NoExecute",
        	+         "timeAdded": "2006-01-02T15:04:05Z"
        	+        }
        	+       },
        	+       "status": {}
        	+      },
        	+      "ID": 1,
        	+      "key": "example.com/taint",
        	+      "value": "tainted",
        	+      "effect": "NoExecute",
        	+      "timeAdded": "2006-01-02T15:04:05Z"
        	+     }
        	+    ]
        	    }
        	   ],
        	   "Taints": null,
         >
    tracker.go:482: I0929 14:28:40.033224] event #0/#1: DeviceTaintRule update patch="rule" diff=<
        	@@ -4,7 +4,7 @@
        	  },
        	  "spec": {
        	   "deviceSelector": {
        	-   "pool": "pool-1"
        	+   "pool": "pool-2"
        	   },
        	   "taint": {
        	    "key": "example.com/taint",
         >
    tracker.go:581: I0929 14:28:40.033285] event #0/#1: syncing ResourceSlice resourceslice="s1"
    tracker.go:703: I0929 14:28:40.033319] event #0/#1: processing DeviceTaintRule resourceslice="s1" deviceTaintRule="rule"
    tracker.go:654: I0929 14:28:40.033478] event #0/#1: ResourceSlice synced resourceslice="s1" change="update" diff=<
        	@@ -23,32 +23,7 @@
        	     "BindingConditions": null,
        	     "BindingFailureConditions": null,
        	     "AllowMultipleAllocations": null,
        	-    "Taints": [
        	-     {
        	-      "Rule": {
        	-       "metadata": {
        	-        "name": "rule"
        	-       },
        	-       "spec": {
        	-        "deviceSelector": {
        	-         "pool": "pool-1"
        	-        },
        	-        "taint": {
        	-         "key": "example.com/taint",
        	-         "value": "tainted",
        	-         "effect": "NoExecute",
        	-         "timeAdded": "2006-01-02T15:04:05Z"
        	-        }
        	-       },
        	-       "status": {}
        	-      },
        	-      "ID": 1,
        	-      "key": "example.com/taint",
        	-      "value": "tainted",
        	-      "effect": "NoExecute",
        	-      "timeAdded": "2006-01-02T15:04:05Z"
        	-     }
        	-    ]
        	+    "Taints": null
        	    }
        	   ],
        	   "Taints": null,
         >
    tracker.go:581: I0929 14:28:40.033601] event #0/#1: syncing ResourceSlice resourceslice="s2"
    tracker.go:703: I0929 14:28:40.033633] event #0/#1: processing DeviceTaintRule resourceslice="s2" deviceTaintRule="rule"
    ...

Disabling event checking only worked when actually running all sub-tests. When
selectively running only one permutation with -run, the boolean variable was
wrong:

    $ go test -run='.*/^update-patch$' ./staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/
    ok  	k8s.io/dynamic-resource-allocation/resourceslice/tracker

    $ go test -run='.*/^update-patch$/3_2_0_1' ./staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/
    --- FAIL: TestListPatchedResourceSlices (0.01s)
        --- FAIL: TestListPatchedResourceSlices/update-patch (0.00s)
            --- FAIL: TestListPatchedResourceSlices/update-patch/3_2_0_1 (0.00s)

                tracker_test.go:762:
                     	Error Trace:	/nvme/gopath/src/k8s.io/kubernetes/staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/tracker_test.go:762
                     	            				/nvme/gopath/src/k8s.io/kubernetes/staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/tracker_test.go:856
                    	Error:      	Not equal:
                    	            	expected: []tracker.handlerEvent{tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc000301d40)}, tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc000346000)}}
                    	            	actual  : []tracker.handlerEvent{tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc0001f9ba0)}, tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc000301d40)}, tracker.handlerEvent{event:"update", oldObj:(*api.ResourceSlice)(0xc000301d40), newObj:(*api.ResourceSlice)(0xc0003dba00)}, tracker.handlerEvent{event:"update", oldObj:(*api.ResourceSlice)(0xc0003dba00), newObj:(*api.ResourceSlice)(0xc000301d40)}, tracker.handlerEvent{event:"update", oldObj:(*api.ResourceSlice)(0xc0001f9ba0), newObj:(*api.ResourceSlice)(0xc0003dbba0)}}

Now permutations are detected automatically based on the indices.

While at it, the documentation gets moved around a bit so that test cases can be read without having to consult the implementation.
p0lyn0mial pushed a commit that referenced this pull request Jan 30, 2026
DRA depends on the assume cache having invoked all event handlers before
Assume() returns, because DRA maintains state that is relevant for scheduling
through those event handlers.

This log snippet shows how this went wrong during PreBind:

    dynamicresources.go:1150: I0115 10:35:29.264437] scheduler: Claim stored in assume cache pod="testdra-all-usesallresources-kqjpj/my-pod-0091" claim="testdra-all-usesallresources-kqjpj/claim-0091" uid=<types.UID>: 516f274f-e1a9-4a4b-b7d2-bb86138e4240 resourceVersion="5636"
    dra_manager.go:198: I0115 10:35:29.264448] scheduler: Removed in-flight claim claim="testdra-all-usesallresources-kqjpj/claim-0091" uid=<types.UID>: 516f274f-e1a9-4a4b-b7d2-bb86138e4240 version="287"
    dynamicresources.go:1157: I0115 10:35:29.264463] scheduler: Removed claim from in-flight claims pod="testdra-all-usesallresources-kqjpj/my-pod-0091" claim="testdra-all-usesallresources-kqjpj/claim-0091" uid=<types.UID>: 516f274f-e1a9-4a4b-b7d2-bb86138e4240 resourceVersion="5636" allocation=<
    ...
    allocateddevices.go:189: I0115 10:35:29.267315] scheduler: Observed device allocation device="testdra-all-usesallresources-kqjpj.driver/worker-1/worker-1-device-096" claim="testdra-all-usesallresources-kqjpj/claim-0091"

- goroutine #1: UpdateStatus result delivered via informer.
  AssumeCache updates cache, pushes event A, emitEvents pulls event A from queue.
  *Not* done with delivering it yet!
- goroutine #2: AssumeCache.Assume called. Updates cache, pushes event B, emits it.
  Old and new claim have allocation, so no "Observed device allocation".
- goroutine #3: Schedules next pod, without considering device as allocated (not in the log snippet).
- goroutine #1: Finally delivers event A: "Observed device allocation", but too late.

Also, events are delivered out-of-order.

The fix is to make emitEvents, when called from Assume, wait for any emitEvents that is already running in another goroutine. This ensures that an event pulled out of the queue by that other goroutine has been delivered before Assume itself checks the queue one more time and returns.
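
A minimal sketch of that synchronization idea (hypothetical types and names, not the actual assume cache code): emitters take turns, so a goroutine that emits after pushing its own event cannot return before an earlier, still-in-flight delivery has finished.

```go
package example

import "sync"

type eventQueue struct {
	mu       sync.Mutex
	cond     *sync.Cond
	emitting bool     // true while some goroutine is inside emit
	queue    []string // pending events (placeholder payload type)
}

func newEventQueue() *eventQueue {
	q := &eventQueue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// emit drains the queue. Only one goroutine emits at a time; a second caller
// blocks until the first one has delivered everything it pulled.
func (q *eventQueue) emit(deliver func(string)) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for q.emitting {
		q.cond.Wait() // wait for the other goroutine to finish delivering
	}
	q.emitting = true
	for len(q.queue) > 0 {
		ev := q.queue[0]
		q.queue = q.queue[1:]
		q.mu.Unlock()
		deliver(ev) // deliver outside the lock
		q.mu.Lock()
	}
	q.emitting = false
	q.cond.Broadcast()
}

// assume mimics AssumeCache.Assume: push the event for the assumed object,
// then emit; by the time emit returns, all earlier events have been delivered.
func (q *eventQueue) assume(ev string, deliver func(string)) {
	q.mu.Lock()
	q.queue = append(q.queue, ev)
	q.mu.Unlock()
	q.emit(deliver)
}
```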

The time window where things go wrong is small. An E2E test covering this only flaked rarely, and only in the CI. An integration test (separate commit) with a higher number of pods finally made it possible to reproduce the problem locally. It also uncovered a second race (fix in separate commit).

The unit test fails without the fix:

    === RUN   TestAssumeConcurrency
        assume_cache_test.go:311: FATAL ERROR:
            	Assume should have blocked and didn't.
    --- FAIL: TestAssumeConcurrency (0.00s)
p0lyn0mial pushed a commit that referenced this pull request Jan 30, 2026
GatherAllocatedState and ListAllAllocatedDevices need to collect information
from different sources (allocated devices, in-flight claims), potentially even
multiple times (GatherAllocatedState first gets allocated devices, then the
capacities).

The underlying assumption that nothing relevant changes in parallel is not always true. The following log snippet shows how an update of the assume cache (which feeds the allocated-devices tracker) and of the in-flight claims can land in such a way that GatherAllocatedState doesn't see the device in that claim as allocated:

    dra_manager.go:263: I0115 15:11:04.407714      18778] scheduler: Starting GatherAllocatedState
    ...
    allocateddevices.go:189: I0115 15:11:04.407945      18066] scheduler: Observed device allocation device="testdra-all-usesallresources-hvs5d.driver/worker-5/worker-5-device-094" claim="testdra-all-usesallresources-hvs5d/claim-0553"
    dynamicresources.go:1150: I0115 15:11:04.407981      89109] scheduler: Claim stored in assume cache pod="testdra-all-usesallresources-hvs5d/my-pod-0553" claim="testdra-all-usesallresources-hvs5d/claim-0553" uid=<types.UID>: a84d3c4d-f752-4cfd-8993-f4ce58643685 resourceVersion="5680"
    dra_manager.go:201: I0115 15:11:04.408008      89109] scheduler: Removed in-flight claim claim="testdra-all-usesallresources-hvs5d/claim-0553" uid=<types.UID>: a84d3c4d-f752-4cfd-8993-f4ce58643685 version="1211"
    dynamicresources.go:1157: I0115 15:11:04.408044      89109] scheduler: Removed claim from in-flight claims pod="testdra-all-usesallresources-hvs5d/my-pod-0553" claim="testdra-all-usesallresources-hvs5d/claim-0553" uid=<types.UID>: a84d3c4d-f752-4cfd-8993-f4ce58643685 resourceVersion="5680" allocation=<
        	{
        	  "devices": {
        	    "results": [
        	      {
        	        "request": "req-1",
        	        "driver": "testdra-all-usesallresources-hvs5d.driver",
        	        "pool": "worker-5",
        	        "device": "worker-5-device-094"
        	      }
        	    ]
        	  },
        	  "nodeSelector": {
        	    "nodeSelectorTerms": [
        	      {
        	        "matchFields": [
        	          {
        	            "key": "metadata.name",
        	            "operator": "In",
        	            "values": [
        	              "worker-5"
        	            ]
        	          }
        	        ]
        	      }
        	    ]
        	  },
        	  "allocationTimestamp": "2026-01-15T14:11:04Z"
        	}
         >
    dra_manager.go:280: I0115 15:11:04.408085      18778] scheduler: Device is in flight for allocation device="testdra-all-usesallresources-hvs5d.driver/worker-5/worker-5-device-095" claim="testdra-all-usesallresources-hvs5d/claim-0086"
    dra_manager.go:280: I0115 15:11:04.408137      18778] scheduler: Device is in flight for allocation device="testdra-all-usesallresources-hvs5d.driver/worker-5/worker-5-device-096" claim="testdra-all-usesallresources-hvs5d/claim-0165"
    default_binder.go:69: I0115 15:11:04.408175      89109] scheduler: Attempting to bind pod to node pod="testdra-all-usesallresources-hvs5d/my-pod-0553" node="worker-5"
    dra_manager.go:265: I0115 15:11:04.408264      18778] scheduler: Finished GatherAllocatedState allocatedDevices=<map[string]interface {} | len:2>: {

Initial state: "worker-5-device-094" is in-flight, not in cache
- goroutine #1: starts GatherAllocatedState, copies cache
- goroutine #2: adds to assume cache, removes from in-flight
- goroutine #1: checks in-flight

=> device never seen as allocated
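
A minimal sketch of that interleaving (hypothetical types, not the scheduler's actual data structures): the gatherer reads the two sources in separate critical sections, so a device can move from in-flight to the cache in between and be missed by both reads.

```go
package example

import "sync"

type state struct {
	mu       sync.Mutex
	cache    map[string]bool // devices recorded as allocated (assume cache stand-in)
	inFlight map[string]bool // devices of claims that are still in flight
}

// gather mimics GatherAllocatedState: read the cache, then the in-flight set.
// Because the two reads are separate critical sections, moveToCache can run
// in between and the device ends up in neither snapshot.
func (s *state) gather() map[string]bool {
	allocated := map[string]bool{}
	s.mu.Lock()
	for d := range s.cache {
		allocated[d] = true
	}
	s.mu.Unlock()
	// <-- another goroutine can run moveToCache here
	s.mu.Lock()
	for d := range s.inFlight {
		allocated[d] = true
	}
	s.mu.Unlock()
	return allocated
}

// moveToCache mimics the informer update after binding: store the device in
// the cache and drop the in-flight entry in one step.
func (s *state) moveToCache(device string) {
	s.mu.Lock()
	s.cache[device] = true
	delete(s.inFlight, device)
	s.mu.Unlock()
}
```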

This is the second cause of double allocation of the same device by two different claims; the other was the event timing in the assume cache. Both were tracked down with an integration test (separate commit). It did not fail every time, but often enough that regressions should show up as flakes.