Skip to content

Commit e30cb05

Browse files
committed
drm/sched: Make sure we wait for all dependencies in kill_jobs_cb()
drm_sched_entity_kill_jobs_cb() logic is omitting the last fence popped from the dependency array that was waited upon before drm_sched_entity_kill() was called (drm_sched_entity::dependency field), so we're basically waiting for all dependencies except one. In theory, this wait shouldn't be needed because resources should have their users registered to the dma_resv object, thus guaranteeing that future jobs wanting to access these resources wait on all the previous users (depending on the access type, of course). But we want to keep these explicit waits in the kill entity path just in case. Let's make sure we keep all dependencies in the array in drm_sched_job_dependency(), so we can iterate over the array and wait in drm_sched_entity_kill_jobs_cb(). We also make sure we wait on drm_sched_fence::finished if we were originally asked to wait on drm_sched_fence::scheduled. In that case, we assume the intent was to delegate the wait to the firmware/GPU or rely on the pipelining done at the entity/scheduler level, but when killing jobs, we really want to wait for completion not just scheduling. v2: - Don't evict deps in drm_sched_job_dependency() v3: - Always wait for drm_sched_fence::finished fences in drm_sched_entity_kill_jobs_cb() when we see a sched_fence v4: - Fix commit message - Fix a use-after-free bug v5: - Flag deps on which we should only wait for the scheduled event at insertion time v6: - Back to v4 implementation - Add Christian's R-b Cc: Frank Binns <[email protected]> Cc: Sarah Walker <[email protected]> Cc: Donald Robson <[email protected]> Cc: Luben Tuikov <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: "Christian König" <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Suggested-by: "Christian König" <[email protected]> Reviewed-by: "Christian König" <[email protected]> Acked-by: Luben Tuikov <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
1 parent 2c56a75 commit e30cb05

File tree

1 file changed

+33
-8
lines changed

1 file changed

+33
-8
lines changed

drivers/gpu/drm/scheduler/sched_entity.c

Lines changed: 33 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -155,16 +155,32 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
155155
{
156156
struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
157157
finish_cb);
158-
int r;
158+
unsigned long index;
159159

160160
dma_fence_put(f);
161161

162162
/* Wait for all dependencies to avoid data corruptions */
163-
while (!xa_empty(&job->dependencies)) {
164-
f = xa_erase(&job->dependencies, job->last_dependency++);
165-
r = dma_fence_add_callback(f, &job->finish_cb,
166-
drm_sched_entity_kill_jobs_cb);
167-
if (!r)
163+
xa_for_each(&job->dependencies, index, f) {
164+
struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
165+
166+
if (s_fence && f == &s_fence->scheduled) {
167+
/* The dependencies array had a reference on the scheduled
168+
* fence, and the finished fence refcount might have
169+
* dropped to zero. Use dma_fence_get_rcu() so we get
170+
* a NULL fence in that case.
171+
*/
172+
f = dma_fence_get_rcu(&s_fence->finished);
173+
174+
/* Now that we have a reference on the finished fence,
175+
* we can release the reference the dependencies array
176+
* had on the scheduled fence.
177+
*/
178+
dma_fence_put(&s_fence->scheduled);
179+
}
180+
181+
xa_erase(&job->dependencies, index);
182+
if (f && !dma_fence_add_callback(f, &job->finish_cb,
183+
drm_sched_entity_kill_jobs_cb))
168184
return;
169185

170186
dma_fence_put(f);
@@ -394,8 +410,17 @@ static struct dma_fence *
394410
drm_sched_job_dependency(struct drm_sched_job *job,
395411
struct drm_sched_entity *entity)
396412
{
397-
if (!xa_empty(&job->dependencies))
398-
return xa_erase(&job->dependencies, job->last_dependency++);
413+
struct dma_fence *f;
414+
415+
/* We keep the fence around, so we can iterate over all dependencies
416+
* in drm_sched_entity_kill_jobs_cb() to ensure all deps are signaled
417+
* before killing the job.
418+
*/
419+
f = xa_load(&job->dependencies, job->last_dependency);
420+
if (f) {
421+
job->last_dependency++;
422+
return dma_fence_get(f);
423+
}
399424

400425
if (job->sched->ops->prepare_job)
401426
return job->sched->ops->prepare_job(job, entity);

0 commit comments

Comments
 (0)