
Commit 101025e

jokim-amd authored and alexdeucher committed
drm/amdkfd: fix missed queue reset on queue destroy
If a queue is being destroyed but causes a HWS hang on removal, the KFD may issue an unnecessary GPU reset even though the destroyed queue could be recovered with a queue reset.

This is because the queue has been removed from the KFD's queue list before the preemption action on destroy, so the reset call fails to match the HQD PQ reset information against the KFD's queue record to do the actual reset.

To fix this, deactivate the queue before preemption, since it is being destroyed anyway, and remove the queue from the KFD's queue list after preemption.

Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
1 parent 01be2b6 commit 101025e


1 file changed (+3, -2 lines)

drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c

Lines changed: 3 additions & 2 deletions
@@ -2407,10 +2407,9 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,
 		pdd->sdma_past_activity_counter += sdma_val;
 	}
 
-	list_del(&q->list);
-	qpd->queue_count--;
 	if (q->properties.is_active) {
 		decrement_queue_count(dqm, qpd, q);
+		q->properties.is_active = false;
 		if (!dqm->dev->kfd->shared_resources.enable_mes) {
 			retval = execute_queues_cpsch(dqm,
 						      KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0,
@@ -2421,6 +2420,8 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,
 			retval = remove_queue_mes(dqm, q, qpd);
 		}
 	}
+	list_del(&q->list);
+	qpd->queue_count--;
 
 	/*
 	 * Unconditionally decrement this counter, regardless of the queue's