lib/resourcebuilder: Replace wait-for with single-shot "is it alive now?"
We've had 'if updated' guards around waitFor*Completion since the
library landed in 2d334c2 (lib: add resource builder that allows Do
on any lib.Manifest, 2018-08-20, #10). But only waiting when
'updated' is true is a weak block, because if/when we fail to
complete, Task.Run will back-off and call builder.Apply again. That
new Apply will see the already-updated object, set 'updated' false,
and not wait. So whether we block or not is orthogonal to 'updated';
nobody cares about whether the most recent update happened in this
builder.Apply, this sync cycle, or a previous cycle.
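A toy model shows why the guard is weak. This is a hypothetical sketch, not the resourcebuilder code. fakeCluster and apply are made-up names, and the failed wait is simulated rather than coming from a real waitFor*Completion call.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeCluster is a hypothetical stand-in for the API server. It only tracks
// whether the object already matches the manifest.
type fakeCluster struct {
	upToDate bool
}

// apply mimics a builder.Apply that waits only when it changed the object.
// It reports whether it waited and returns an error if that wait failed.
func apply(c *fakeCluster, waitFails bool) (waited bool, err error) {
	updated := !c.upToDate
	c.upToDate = true // the update sticks even if the wait below fails
	if !updated {
		// Already up to date: the 'if updated' guard skips the wait entirely.
		return false, nil
	}
	// Simulated waitFor*Completion result.
	if waitFails {
		return true, errors.New("timed out waiting for completion")
	}
	return true, nil
}

func main() {
	c := &fakeCluster{}

	// First sync: the object changes, the wait runs and fails.
	waited, err := apply(c, true)
	fmt.Println(waited, err) // true timed out waiting for completion

	// Task.Run-style retry: updated is now false, so nothing blocks,
	// even though the workload is just as unhealthy as before.
	waited, err = apply(c, true)
	fmt.Println(waited, err) // false <nil>
}
```

The retry succeeds without ever looking at the workload again. That is the gap described above.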
We don't even care all that much about whether the Deployment,
DaemonSet, CustomResourceDefinition, or Job succeeded. Most feedback
is going to come from the ClusterOperator, so with this commit we
continue past the resource wait-for unless the resource is really
hurting, in which case we fail immediately (inside builder.Apply,
Task.Run will still hit us a few times) to bubble that up. In
situations where we don't see anything too terrible going on, we'll
continue on past and later block on ClusterOperator not being ready.
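Here is a minimal sketch of the single-shot idea. It assumes a trimmed-down condition type in place of appsv1.DeploymentCondition. The updateError type only borrows the Nested/Reason/Message/Name field names that appear in this commit's diff; it is not payload.UpdateError. checkHealthOnce is a hypothetical name.

```go
package main

import "fmt"

// condition is a simplified, hypothetical stand-in for appsv1.DeploymentCondition.
type condition struct {
	Type, Status, Reason, Message string
}

// updateError borrows the Nested/Reason/Message/Name field names from the
// diff; it is illustrative only, not payload.UpdateError.
type updateError struct {
	Nested  error
	Reason  string
	Message string
	Name    string
}

func (e *updateError) Error() string { return e.Message }

// checkHealthOnce looks at the current conditions exactly once. It fails
// only when a condition says the workload is clearly hurting; otherwise it
// returns nil and the caller moves on to wait on the ClusterOperator.
func checkHealthOnce(name string, conds []condition) error {
	for _, c := range conds {
		switch {
		case c.Type == "Available" && c.Status == "False":
			return &updateError{
				Nested:  fmt.Errorf("deployment %s is not available", name),
				Reason:  "WorkloadNotAvailable",
				Message: fmt.Sprintf("deployment %s is not available %s: %s", name, c.Reason, c.Message),
				Name:    name,
			}
		case c.Type == "Progressing" && c.Status == "False":
			return &updateError{
				Nested:  fmt.Errorf("deployment %s is not progressing", name),
				Reason:  "WorkloadNotAvailable",
				Message: fmt.Sprintf("deployment %s is not progressing %s: %s", name, c.Reason, c.Message),
				Name:    name,
			}
		}
	}
	return nil // nothing obviously bad right now; no waiting
}

func main() {
	healthy := []condition{{Type: "Available", Status: "True"}, {Type: "Progressing", Status: "True"}}
	fmt.Println(checkHealthOnce("ns/app", healthy)) // <nil>

	stuck := []condition{{Type: "Progressing", Status: "False", Reason: "ProgressDeadlineExceeded", Message: "rollout stalled"}}
	fmt.Println(checkHealthOnce("ns/app", stuck))
	// deployment ns/app is not progressing ProgressDeadlineExceeded: rollout stalled
}
```

The check has no poll loop. A failure is returned right away for Task.Run to retry, and anything short of a clear failure returns nil.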
The "unknown state" Deployment logging has changed a bit. I'd
initially dropped it, but Jack suggested keeping it to make
identifying broken-Deployment-controller and similar situations easier
[1]. Previously it was logged when we weren't happy with
updatedReplicas and unavailableReplicas, nothing obviously bad was
happening, and we were not Progressing=True. We no longer check
updatedReplicas or unavailableReplicas, so now it's just "nothing
obviously bad is happening, but that may just be because the
Deployment controller isn't giving us any of the conditions we look
at to judge badness". It's possible that we should also check for
"when we do have those conditions, the values are either True or
False, not some unexpected key". But I'm leaving that alone for now.
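The narrowed "unknown state" check can be sketched as a predicate over condition types. The set of judged types used here is an assumption about what "the conditions we look at" covers. Available, Progressing, and ReplicaFailure are real Deployment condition types. inUnknownState is a hypothetical helper, and it deliberately skips the True/False value check that is left alone above.

```go
package main

import "log"

// condition is a simplified, hypothetical stand-in for appsv1.DeploymentCondition.
type condition struct {
	Type, Status string
}

// judgedTypes is an assumed set of condition types the health check uses
// to decide whether a Deployment is hurting.
var judgedTypes = map[string]bool{
	"Available":      true,
	"Progressing":    true,
	"ReplicaFailure": true,
}

// inUnknownState reports whether the Deployment carries none of the judged
// condition types. It is meant to run after the "really hurting" checks have
// passed, so a true result means "nothing looks bad, but we have nothing to
// judge by". It does not check that the values are True or False.
func inUnknownState(conds []condition) bool {
	for _, c := range conds {
		if judgedTypes[c.Type] {
			return false
		}
	}
	return true
}

func main() {
	name := "ns/app"
	// A Deployment whose controller has not set any conditions yet.
	if inUnknownState(nil) {
		log.Printf("deployment %s is in unknown state", name)
	}
}
```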
There's no object status for CRDs or DaemonSets that marks "we are
really hurting". The v1.18.0 Kubernetes CRD and DaemonSet controllers
do not set any conditions in their operand status (although the API
for those conditions exists [2,3]). With this commit, we have very
minimal wait logic for either. Sufficiently unhealthy DaemonSets
should be reported via their associated ClusterOperator, and
sufficiently unhealthy CRDs should surface when we fail to push
any custom resources consuming them (Task.Run retries will give the
API server time to ready itself after accepting a CRD update before
the CVO fails its sync cycle).
We still need the public WaitForJobCompletion, because
fetchUpdatePayloadToDir uses it to wait on the release download.
Also expand "iff" -> "if and only if" while I'm touching that line, at
Jack's suggestion [4].
[1]: #400 (comment)
[2]: https://github.com/kubernetes/api/blob/v0.18.0/apps/v1/types.go#L586-L590
[3]: https://github.com/kubernetes/apiextensions-apiserver/blob/v0.18.0/pkg/apis/apiextensions/types.go#L319-L320
[4]: #400 (comment)
 Nested: fmt.Errorf("deployment %s is not available; updated replicas=%d of %d, available replicas=%d of %d", iden, d.Status.UpdatedReplicas, d.Status.Replicas, d.Status.AvailableReplicas, d.Status.Replicas),
-Reason: "WorkloadNotAvailable",
-Message: fmt.Sprintf("deployment %s is not available %s: %s", iden, availableCondition.Reason, availableCondition.Message),
 Nested: fmt.Errorf("deployment %s is not progressing; updated replicas=%d of %d, available replicas=%d of %d", iden, d.Status.UpdatedReplicas, d.Status.Replicas, d.Status.AvailableReplicas, d.Status.Replicas),
-Reason: "WorkloadNotAvailable",
-Message: fmt.Sprintf("deployment %s is not progressing %s: %s", iden, progressingCondition.Reason, progressingCondition.Message),
 Nested: fmt.Errorf("deployment %s is not available; updated replicas=%d of %d, available replicas=%d of %d", iden, d.Status.UpdatedReplicas, d.Status.Replicas, d.Status.AvailableReplicas, d.Status.Replicas),
+Reason: "WorkloadNotAvailable",
+Message: fmt.Sprintf("deployment %s is not available %s: %s", iden, availableCondition.Reason, availableCondition.Message),
 Nested: fmt.Errorf("deployment %s is not progressing; updated replicas=%d of %d, available replicas=%d of %d", iden, d.Status.UpdatedReplicas, d.Status.Replicas, d.Status.AvailableReplicas, d.Status.Replicas),
+Reason: "WorkloadNotAvailable",
+Message: fmt.Sprintf("deployment %s is not progressing %s: %s", iden, progressingCondition.Reason, progressingCondition.Message),
+Name: iden,
 }
+}

-klog.Errorf("deployment %s is in unknown state", iden)