pkg/cvo/availableupdates: Preserve update advice on update-service failures
RemoteFailed, ResponseFailed, and ResponseInvalid reasons are all
server-side issues. This commit makes clusters more resilient to
OpenShift Update Service (OSUS) issues by preserving the cache of
previously-retrieved advice for up to 24 hours, while we wait for OSUS
to recover (or proxies or other network configuration between the
cluster and its OSUS to be fixed). OSUS advice does not change often,
and the only risk of acting on stale advice is that you might not hear
about recently-declared Conditional Update risks [1]. The
RetrievedUpdates condition should be displayed in the update-selection
user interfaces, and the CannotRetrieveUpdates alert will be firing,
so cluster administrators will be aware of the risk of stale data, and
can decide whether to wait for OSUS to recover, or to initiate an
update based on the stale information (which they can supplement with
additional checks like "any new risks declared in graph-data
recently?" [2]).
At the moment, restarting the cluster-version operator will also clear
the cache. We could reload it from ClusterVersion status, but I'm
deferring that for future work.
[1]: https://docs.openshift.com/container-platform/4.17/updating/understanding_updates/understanding-update-channels-release.html#conditional-updates-overview_understanding-update-channels-releases
[2]: https://github.com/openshift/cincinnati-graph-data/commits/master/blocked-edges
klog.V(2).Infof("Retrieving available updates again, because more than %s has elapsed since %s", optr.minimumUpdateCheckInterval, optrAvailableUpdates.LastAttempt.Format(time.RFC3339))
} else if channel != optrAvailableUpdates.Channel {
klog.V(2).Infof("Retrieving available updates again, because the channel has changed from %q to %q", optrAvailableUpdates.Channel, channel)
klog.V(2).Infof("Retrieving available updates again, because the architecture has changed from %q to %q", optrAvailableUpdates.Architecture, desiredArch)
klog.V(2).Infof("Retrieving available updates again, because more than %s has elapsed since last change at %s. Will clear the cache if this fails.", maximumCacheInterval, optrAvailableUpdates.LastAttempt.Format(time.RFC3339))
klog.V(2).Infof("Retrieving available updates again, because more than %s has elapsed since last attempt at %s", optr.minimumUpdateCheckInterval, optrAvailableUpdates.LastAttempt.Format(time.RFC3339))