The scheduler may start serving before the resource state synchronization is completed #1495

@DSFans2014

Description

The scheduler's start function registers event handlers, starts the informers, and returns (so scheduling begins) before the handlers' callbacks for the initial objects have finished running. As a result, some resources and state may be only partially initialized when scheduling starts, which can lead to scheduling problems. For example, when there are a large number of Pods, the quotaManager's state may not yet be synchronized.

informerFactory.Core().V1().Pods().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
	AddFunc:    s.onAddPod,
	UpdateFunc: s.onUpdatePod,
	DeleteFunc: s.onDelPod,
})
informerFactory.Core().V1().Nodes().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
	AddFunc:    func(_ any) { s.doNodeNotify() },
	DeleteFunc: s.onDelNode,
})
informerFactory.Core().V1().ResourceQuotas().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
	AddFunc:    s.onAddQuota,
	UpdateFunc: s.onUpdateQuota,
	DeleteFunc: s.onDelQuota,
})
informerFactory.Start(s.stopCh)
informerFactory.WaitForCacheSync(s.stopCh)

In the WaitForCacheSync function, for each informer, cache.WaitForCacheSync(stopCh, informer.HasSynced) is called. This means it waits until the HasSynced method of each informer returns true before returning.

for informType, informer := range informers {
	res[informType] = cache.WaitForCacheSync(stopCh, informer.HasSynced)
}

The declaration of HasSynced in the SharedInformer interface carries the following comment:

// HasSynced returns true if the shared informer's store has been
// informed by at least one full LIST of the authoritative state
// of the informer's object collection.  This is unrelated to "resync".
//
// Note that this doesn't tell you if an individual handler is synced!!
// For that, please call HasSynced on the handle returned by
// AddEventHandler.
HasSynced() bool

https://github.com/kubernetes/client-go/blob/1bb1ad283de66456c2557dea53d05dcf44b39f50/tools/cache/shared_informer.go#L188-L195

This means that the completion of the WaitForCacheSync function does not guarantee that all event callbacks have finished.
