nfd-master: tweak list options for NodeFeature informer
Fix cache syncing problems on big clusters with thousands of NodeFeature
objects.
On the initial list (sync), the client-go cache reflector sets
ResourceVersion to "0" (instead of leaving it empty). This causes
problems in the API server, which logs errors like:
    E writers.go:122] apiserver was unable to write a JSON response: http:
        Handler timeout
    E status.go:71] apiserver received an error that is not an
        metav1.Status: &errors.errorString{s:"http: Handler timeout"}:
        http: Handler timeout
On the nfd-master side we see corresponding log snippets like:
    W reflector.go:547] failed to list *v1alpha1.NodeFeature: stream error
        when reading response body, may be caused by closed
        connection. Please retry. Original error: stream
        error: stream ID 1521; INTERNAL_ERROR; received from
        peer
    I trace.go:236] "Reflector ListAndWatch" name:*** (***) (total time:
        61126ms): ---"Objects listed" error:stream error when
        reading response body, may be caused by closed
        connection. Please retry. Original error: stream
        error: stream ID 1521; INTERNAL_ERROR; received from
        peer 61126ms (***)
Decreasing the page size (opts.Limit) has no effect on the timeouts.
However, setting ResourceVersion to an empty value seems to get paging
back on track, eliminating the timeouts.
TODO: investigate in Kubernetes upstream the root cause of the timeouts
with ResourceVersion="0".
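The tweak described above can be sketched roughly as follows. This is an
illustration only, not the code from this commit: ListOptions is a local
stand-in for the two metav1.ListOptions fields mentioned here, so the
example builds without client-go, and tweakNodeFeatureListOptions and the
page size of 500 are hypothetical. In a real informer setup a function of
this shape would presumably be passed through the informer factory's
WithTweakListOptions option.

```go
package main

import "fmt"

// ListOptions is a local stand-in (an assumption for this sketch) for the
// two metav1.ListOptions fields discussed in this message.
type ListOptions struct {
	ResourceVersion string
	Limit           int64
}

// tweakNodeFeatureListOptions is a hypothetical tweak function. It clears
// ResourceVersion when the reflector has set it to "0" for the initial
// list, and leaves every other value untouched.
func tweakNodeFeatureListOptions(opts *ListOptions) {
	if opts.ResourceVersion == "0" {
		opts.ResourceVersion = ""
	}
}

func main() {
	// Initial list (sync): the reflector asks for ResourceVersion "0".
	initial := ListOptions{ResourceVersion: "0", Limit: 500}
	tweakNodeFeatureListOptions(&initial)
	fmt.Printf("initial list: ResourceVersion=%q Limit=%d\n",
		initial.ResourceVersion, initial.Limit)

	// A later request with a concrete resource version is not modified.
	later := ListOptions{ResourceVersion: "12345"}
	tweakNodeFeatureListOptions(&later)
	fmt.Printf("later request: ResourceVersion=%q\n", later.ResourceVersion)
}
```

Only the exact value "0" is rewritten, so requests that carry a concrete
resource version keep it, and the page size is left alone since changing
it made no difference to the timeouts.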