Commit 0323a51

jnar17 authored and jlahtine-intel committed
drm/i915/guc: Handle race condition where wakeref count drops below 0
There is a rare race condition when preparing for a reset, where guc_lrc_desc_unpin() could be in the process of deregistering a context while a different thread is scrubbing outstanding contexts, altering the context state and doing a wakeref put. Then, if deregister_context() fails, a second wakeref put could occur. As a result, the wakeref count could drop below 0 and fail an INTEL_WAKEREF_BUG_ON() check.

Therefore, if deregister_context() fails, undo the context state changes and do a wakeref put only if the context was set to be destroyed earlier.

v2: Expand comment to better explain change. (Daniele)
v3: Removed addition to the original comment. (Daniele)

Fixes: 2f2cc53 ("drm/i915/guc: Close deregister-context race against CT-loss")
Signed-off-by: Jesus Narvaez <[email protected]>
Cc: Daniele Ceraolo Spurio <[email protected]>
Cc: Alan Previn <[email protected]>
Cc: Anshuman Gupta <[email protected]>
Cc: Mousumi Jana <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Matt Roper <[email protected]>
Reviewed-by: Daniele Ceraolo Spurio <[email protected]>
Signed-off-by: John Harrison <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit f36a75a)
Signed-off-by: Joonas Lahtinen <[email protected]>
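The interleaving described in the message can be replayed in a small, single-threaded C model. This is a sketch under stated assumptions, not i915 code: struct model_ctx, struct model_gt, unpin_prepare(), reset_scrub(), unpin_deregister_failed_old() and replay_race_old() are hypothetical names; the reset path is assumed to clear the destroyed state and drop the reference, as the message describes; and the model counts only the reference taken by the unpin path, so an extra put shows up directly as a negative count.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the race; none of these names are i915
 * APIs. "destroyed" stands in for the context's destroyed state and
 * wakeref_count counts only the reference taken by the unpin path.
 */
struct model_ctx {
	bool registered;
	bool destroyed;
};

struct model_gt {
	int wakeref_count;
	bool underflow; /* set where INTEL_WAKEREF_BUG_ON() would fail */
};

static void model_wakeref_put(struct model_gt *gt)
{
	gt->wakeref_count--;
	if (gt->wakeref_count < 0)
		gt->underflow = true;
}

/* Assumed first half of guc_lrc_desc_unpin(): take a reference and mark
 * the context destroyed before deregistering it. */
static void unpin_prepare(struct model_gt *gt, struct model_ctx *ce)
{
	assert(!ce->destroyed); /* not already pending destruction */
	gt->wakeref_count++;
	ce->registered = false;
	ce->destroyed = true;
}

/* Assumed reset scrubbing on another thread: alter the context state and
 * drop the reference taken above. */
static void reset_scrub(struct model_gt *gt, struct model_ctx *ce)
{
	if (ce->destroyed) {
		ce->destroyed = false;
		model_wakeref_put(gt);
	}
}

/* Pre-fix error path for a failed deregister_context(): undo the state
 * change and put the wakeref unconditionally. */
static void unpin_deregister_failed_old(struct model_gt *gt,
					struct model_ctx *ce)
{
	ce->registered = true;
	ce->destroyed = false;
	model_wakeref_put(gt);
}

/* Replays the interleaving serially: unpin prepares, reset scrubs, then
 * deregister_context() fails. */
struct model_gt replay_race_old(void)
{
	struct model_gt gt = { .wakeref_count = 0, .underflow = false };
	struct model_ctx ce = { .registered = true, .destroyed = false };

	unpin_prepare(&gt, &ce);
	reset_scrub(&gt, &ce);
	unpin_deregister_failed_old(&gt, &ce);
	return gt;
}
```

Calling replay_race_old() yields wakeref_count == -1 with underflow set: the reset path and the old error path each drop the single reference, which is the double put the commit message describes.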
1 parent 57d63c6 commit 0323a51

1 file changed (+14 -3 lines changed)

drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

Lines changed: 14 additions & 3 deletions
@@ -3443,18 +3443,29 @@ static inline int guc_lrc_desc_unpin(struct intel_context *ce)
 	 * GuC is active, lets destroy this context, but at this point we can still be racing
 	 * with suspend, so we undo everything if the H2G fails in deregister_context so
 	 * that GuC reset will find this context during clean up.
+	 *
+	 * There is a race condition where the reset code could have altered
+	 * this context's state and done a wakeref put before we try to
+	 * deregister it here. So check if the context is still set to be
+	 * destroyed before undoing earlier changes, to avoid two wakeref puts
+	 * on the same context.
 	 */
 	ret = deregister_context(ce, ce->guc_id.id);
 	if (ret) {
+		bool pending_destroyed;
 		spin_lock_irqsave(&ce->guc_state.lock, flags);
-		set_context_registered(ce);
-		clr_context_destroyed(ce);
+		pending_destroyed = context_destroyed(ce);
+		if (pending_destroyed) {
+			set_context_registered(ce);
+			clr_context_destroyed(ce);
+		}
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		/*
 		 * As gt-pm is awake at function entry, intel_wakeref_put_async merely decrements
 		 * the wakeref immediately but per function spec usage call this after unlock.
 		 */
-		intel_wakeref_put_async(&gt->wakeref);
+		if (pending_destroyed)
+			intel_wakeref_put_async(&gt->wakeref);
 	}
 
 	return ret;
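The patched error path follows a common locking pattern: sample the shared state inside the critical section, record the decision in a local (pending_destroyed), and act on it after unlocking, so only a path that still sees the context as destroyed performs the put. Below is a hedged user-space mirror of that branch. The atomic_flag spinlock stands in for ce->guc_state.lock, assert() stands in for INTEL_WAKEREF_BUG_ON(), and reset_scrub(), replay_fixed() and the earlier "prepare" step are hypothetical reconstructions rather than driver code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space mirror of the patched error path; not i915 code.
 * The atomic_flag spinlock stands in for ce->guc_state.lock and assert()
 * stands in for INTEL_WAKEREF_BUG_ON().
 */
struct model_ctx {
	atomic_flag lock;
	bool registered;
	bool destroyed;
};

static int wakeref_count; /* references taken by the unpin path only */

static void model_lock(struct model_ctx *ce)
{
	while (atomic_flag_test_and_set_explicit(&ce->lock, memory_order_acquire))
		; /* spin until the holder releases the flag */
}

static void model_unlock(struct model_ctx *ce)
{
	atomic_flag_clear_explicit(&ce->lock, memory_order_release);
}

static void model_wakeref_put(void)
{
	assert(wakeref_count > 0); /* a second put would trip this */
	wakeref_count--;
}

/* Assumed reset scrubbing: clear the destroyed state and drop the ref. */
static void reset_scrub(struct model_ctx *ce)
{
	bool was_destroyed;

	model_lock(ce);
	was_destroyed = ce->destroyed;
	ce->destroyed = false;
	model_unlock(ce);

	if (was_destroyed)
		model_wakeref_put();
}

/* Mirrors the patched "if (ret)" branch: decide under the lock, put after
 * unlocking, and only if this path still saw the context as destroyed. */
static void unpin_deregister_failed(struct model_ctx *ce)
{
	bool pending_destroyed;

	model_lock(ce);
	pending_destroyed = ce->destroyed;
	if (pending_destroyed) {
		ce->registered = true;
		ce->destroyed = false;
	}
	model_unlock(ce);

	if (pending_destroyed)
		model_wakeref_put();
}

/* Serialized replay; reset_races runs reset_scrub() between the unpin
 * preparation and the failed deregister. Returns the final count. */
int replay_fixed(bool reset_races)
{
	struct model_ctx ce = { .registered = true, .destroyed = false };

	atomic_flag_clear(&ce.lock); /* start with the lock released */
	wakeref_count = 0;

	/* Assumed earlier part of unpin: take a ref, mark destroyed. */
	model_lock(&ce);
	wakeref_count++;
	ce.registered = false;
	ce.destroyed = true;
	model_unlock(&ce);

	if (reset_races)
		reset_scrub(&ce);
	unpin_deregister_failed(&ce);
	return wakeref_count;
}
```

Both replay_fixed(true) and replay_fixed(false) return 0 without tripping the assertion. With the reset race, reset_scrub() performs the single put and the error path skips it; without the race, the error path restores the registered state and puts exactly once.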
