Skip to content

Commit 3563d85

Browse files
drm/i915/guc: Fix the fix for reset lock confusion
The previous fix for the circlular lock splat about the busyness worker wasn't quite complete. Even though the reset-in-progress flag is cleared at the start of intel_uc_reset_finish, the entire function is still inside the reset mutex lock. Not sure why the patch appeared to fix the issue both locally and in CI. However, it is now back again. There is a further complication that the wedge code path within intel_gt_reset() jumps around so much that it results in nested reset_prepare/_finish calls. That is, the call sequence is: intel_gt_reset | reset_prepare | __intel_gt_set_wedged | | reset_prepare | | reset_finish | reset_finish The nested finish means that even if the clear of the in-progress flag was moved to the end of _finish, it would still be clear for the entire second call. Surprisingly, this does not seem to be causing any other problems at present. As an aside, a wedge on fini does not call the finish functions at all. The reset_in_progress flag is left set (twice). So instead of trying to cancel the worker anywhere at all in the reset path, just add a cancel to intel_guc_submission_fini instead. Note that it is not a problem if the worker is still active during a reset. Either it will run before the reset path starts locking things and will simply block the reset code for a tiny amount of time. Or it will run after the locks have been acquired and will early exit due to the try-lock. Also, do not use the reset-in-progress flag to decide whether a synchronous cancel is safe (from a lockdep perspective) or not. Instead, use the actual reset mutex state (both the genuine one and the custom rolled BACKOFF one). Fixes: 0e00a88 ("drm/i915/guc: Avoid circular locking issue on busyness flush") Signed-off-by: John Harrison <[email protected]> Cc: Zhanjun Dong <[email protected]> Cc: John Harrison <[email protected]> Cc: Andi Shyti <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Nirmoy Das <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Umesh Nerlige Ramappa <[email protected]> Cc: Andrzej Hajda <[email protected]> Cc: Matt Roper <[email protected]> Cc: Jonathan Cavitt <[email protected]> Cc: Prathap Kumar Valsan <[email protected]> Cc: Alan Previn <[email protected]> Cc: Madhumitha Tolakanahalli Pradeep <[email protected]> Cc: Daniele Ceraolo Spurio <[email protected]> Cc: Ashutosh Dixit <[email protected]> Cc: Dnyaneshwar Bhadane <[email protected]> Reviewed-by: Nirmoy Das <[email protected]> Reviewed-by: Andi Shyti <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
1 parent 7406538 commit 3563d85

File tree

2 files changed

+13
-14
lines changed

2 files changed

+13
-14
lines changed

drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

Lines changed: 9 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1403,14 +1403,17 @@ static void guc_cancel_busyness_worker(struct intel_guc *guc)
14031403
* Trying to pass a 'need_sync' or 'in_reset' flag all the way down through
14041404
* every possible call stack is unfeasible. It would be too intrusive to many
14051405
* areas that really don't care about the GuC backend. However, there is the
1406-
* 'reset_in_progress' flag available, so just use that.
1406+
* I915_RESET_BACKOFF flag and the gt->reset.mutex can be tested for is_locked.
1407+
* So just use those. Note that testing both is required due to the hideously
1408+
* complex nature of the i915 driver's reset code paths.
14071409
*
14081410
* And note that in the case of a reset occurring during driver unload
1409-
* (wedge_on_fini), skipping the cancel in _prepare (when the reset flag is set
1410-
* is fine because there is another cancel in _finish (when the reset flag is
1411-
* not).
1411+
* (wedged_on_fini), skipping the cancel in reset_prepare/reset_fini (when the
1412+
* reset flag/mutex are set) is fine because there is another explicit cancel in
1413+
* intel_guc_submission_fini (when the reset flag/mutex are not).
14121414
*/
1413-
if (guc_to_gt(guc)->uc.reset_in_progress)
1415+
if (mutex_is_locked(&guc_to_gt(guc)->reset.mutex) ||
1416+
test_bit(I915_RESET_BACKOFF, &guc_to_gt(guc)->reset.flags))
14141417
cancel_delayed_work(&guc->timestamp.work);
14151418
else
14161419
cancel_delayed_work_sync(&guc->timestamp.work);
@@ -1424,8 +1427,6 @@ static void __reset_guc_busyness_stats(struct intel_guc *guc)
14241427
unsigned long flags;
14251428
ktime_t unused;
14261429

1427-
guc_cancel_busyness_worker(guc);
1428-
14291430
spin_lock_irqsave(&guc->timestamp.lock, flags);
14301431

14311432
guc_update_pm_timestamp(guc, &unused);
@@ -2004,13 +2005,6 @@ void intel_guc_submission_cancel_requests(struct intel_guc *guc)
20042005

20052006
void intel_guc_submission_reset_finish(struct intel_guc *guc)
20062007
{
2007-
/*
2008-
* Ensure the busyness worker gets cancelled even on a fatal wedge.
2009-
* Note that reset_prepare is not allowed to because it confuses lockdep.
2010-
*/
2011-
if (guc_submission_initialized(guc))
2012-
guc_cancel_busyness_worker(guc);
2013-
20142008
/* Reset called during driver load or during wedge? */
20152009
if (unlikely(!guc_submission_initialized(guc) ||
20162010
!intel_guc_is_fw_running(guc) ||
@@ -2136,6 +2130,7 @@ void intel_guc_submission_fini(struct intel_guc *guc)
21362130
if (!guc->submission_initialized)
21372131
return;
21382132

2133+
guc_fini_engine_stats(guc);
21392134
guc_flush_destroyed_contexts(guc);
21402135
guc_lrc_desc_pool_destroy_v69(guc);
21412136
i915_sched_engine_put(guc->sched_engine);

drivers/gpu/drm/i915/gt/uc/intel_uc.c

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -637,6 +637,10 @@ void intel_uc_reset_finish(struct intel_uc *uc)
637637
{
638638
struct intel_guc *guc = &uc->guc;
639639

640+
/*
641+
* NB: The wedge code path results in prepare -> prepare -> finish -> finish.
642+
* So this function is sometimes called with the in-progress flag not set.
643+
*/
640644
uc->reset_in_progress = false;
641645

642646
/* Firmware expected to be running when this function is called */

0 commit comments

Comments
 (0)