-
Notifications
You must be signed in to change notification settings - Fork 141
Description
This issue is filed against crossplane/crossplane-runtime because the root constraint exists here, not in upjet or any provider.
The managed reconciler in pkg/reconciler/managed unconditionally calls status.MarkConditions(xpv1.ReconcileSuccess()) whenever Observe() returns ResourceUpToDate: true with no error. There is currently no way for an ExternalClient to signal "I am deliberately returning ResourceUpToDate: true to suppress Update(), but reconciliation has not actually succeeded — an async operation is in progress." The Synced condition is therefore always overwritten to True in this path, regardless of what the provider set before returning.
Upjet exploits ResourceUpToDate: true as the mechanism to suppress Update() during async operations, but pays the cost of a permanently false Synced=True signal. This is not a bug in upjet's logic — it is an absence of a contract in crossplane-runtime that would allow the two concerns (suppress Update, signal pending) to be expressed independently.
A corresponding issue has been filed at crossplane/upjet: [link — fill in once filed].
What happened?
When an upjet-based provider (e.g. upbound/provider-aws-kafka) is performing a long-running async operation — such as an MSK cluster broker instance type change taking 60+ minutes — the managed resource's Synced condition remains True for the entire duration.
The AsyncOperation=False (Reason: Ongoing) condition is correctly set by upjet throughout. But Synced continues to read True with Reason: ReconcileSuccess, actively misleading operators, alerting systems, and automation that polls Synced to determine whether a resource has converged.
Expected: Synced=False while an async operation is in progress, transitioning back to True when it completes.
Actual: Synced=True/ReconcileSuccess is actively written on every reconcile pass for the full duration of the operation.
Root cause
In pkg/reconciler/managed/reconciler.go, the reconcile loop unconditionally sets ReconcileSuccess when Observe() returns {ResourceExists: true, ResourceUpToDate: true} with no error:
// Pseudocode — see pkg/reconciler/managed/reconciler.go for the full implementation
observation, err := external.Observe(ctx, managed)
if err != nil {
// ...error path sets ReconcileError...
}
if observation.ResourceUpToDate {
status.MarkConditions(xpv1.ReconcileSuccess())
return reconcile.Result{RequeueAfter: r.pollInterval}, nil
}
// ... else calls Update()There is no branch for "ResourceUpToDate=true but not actually reconciled successfully." The signal is binary: either the resource is up to date (success) or it needs updating (call Update()).
Upjet's ASyncInProgress branch in pkg/controller/external.go is forced to return ResourceUpToDate: true to prevent the reconciler from calling Update() — which could trigger a redundant or harmful second async operation. The consequence is that crossplane-runtime then overwrites any pending/in-progress condition upjet attempted to set, with ReconcileSuccess.
Why this cannot be fixed in upjet alone
The two naive fixes are both broken:
Returning ResourceUpToDate: false: Causes the reconciler to call Update(), potentially queuing a second async operation against an already in-flight AWS change.
Setting ReconcileError before returning: Fails for two independent reasons. First, crossplane-runtime overwrites it with ReconcileSuccess unconditionally on the same pass (since ResourceUpToDate: true is returned). Second, ReconcileError triggers exponential backoff, which is the opposite of what is needed — frequent requeues are necessary so that async completion is detected promptly.
There is also no ReconcilePending constructor in crossplane-runtime v1.x. The only Synced-related constructors available are ReconcileSuccess, ReconcileError, and ReconcilePaused — none model "deferred pending an in-flight async operation."
Proposed solutions
Option 1: Add ReconcilePending condition constructor
Add a new constructor to crossplane-runtime:
// ReconcilePending returns a condition indicating reconciliation is deferred
// pending an in-flight async operation. Unlike ReconcileError, this does not
// trigger exponential backoff.
func ReconcilePending(msg string) xpv1.Condition {
return xpv1.Condition{
Type: xpv1.TypeSynced,
Status: corev1.ConditionFalse,
Reason: "ReconcilePending",
Message: msg,
}
}The managed reconciler would need a corresponding hook — for example, checking whether the provider set a ReconcilePending condition after Observe() returns but before the ResourceUpToDate branch overwrites it, and if so, honouring it without calling status.MarkConditions(ReconcileSuccess()).
This correctly models the semantic distinction: "reconciliation failed" vs "reconciliation is deferred pending an in-flight operation." However, it requires condition inspection inside the reconciler, which is slightly awkward — see Option 2 for a cleaner alternative.
Option 2 (preferred): Extend ExternalObservation with AsyncOperationInProgress bool
ExternalObservation is defined in crossplane-runtime. A new field would allow providers to express both signals simultaneously — suppress Update() and suppress ReconcileSuccess — without the reconciler needing to inspect conditions written as side effects during Observe():
type ExternalObservation struct {
ResourceExists bool
ResourceUpToDate bool
AsyncOperationInProgress bool // suppress Update() AND suppress ReconcileSuccess
// ...
}The managed reconciler would handle this in the ResourceUpToDate branch:
if observation.ResourceUpToDate {
if observation.AsyncOperationInProgress {
status.MarkConditions(xpv1.ReconcilePending("Async operation in progress"))
// r.pollInterval is illustrative; a shorter interval would be preferable
// to detect async completion promptly
return reconcile.Result{RequeueAfter: r.pollInterval}, nil
}
status.MarkConditions(xpv1.ReconcileSuccess())
return reconcile.Result{RequeueAfter: r.pollInterval}, nil
}Upjet would then set AsyncOperationInProgress: true in its ASyncInProgress branch instead of relying on ResourceUpToDate: true alone. This keeps semantics within the observation contract and avoids condition inspection logic in the reconciler.
Option 3 (upjet-only, partial): Dedicated async condition type
Accept that Synced semantics are constrained by crossplane-runtime and document the AsyncOperation=False (Reason: Ongoing) condition as the canonical signal for async-in-progress state. This is the least invasive option but leaves Synced incorrect and does not address the root issue. Not recommended as a resolution.
Operational impact
For operators running alerting on Synced=False persisting beyond a threshold, the current behaviour silently swallows the alert: Synced=True is written on every reconcile pass during a genuine in-progress mutation, so the alert condition never triggers. Any tooling or automation that polls Synced to determine whether a resource has converged will receive a false positive for the full duration of the async update.
Related
- Companion upjet issue (direct fix site once crossplane-runtime unblocks it): [Bug]
Syncedcondition remainsTrueduring ongoing async operations upjet#609 - Original async failure fix in upjet v1.3.0 (async failures correctly set
Synced=False, in-progress case was not addressed): crossplane-contrib/provider-upjet-aws#1164, crossplane/upjet v1.3.0 - Prior discussion on deprecating
Syncedentirely: crossplane/crossplane-runtime#198