Commit cdd1569

drm/etnaviv: consider completed fence seqno in hang check
Some GPU heavy test programs manage to trigger the hangcheck quite often.
If there are no other GPU users in the system and the test program
exhibits a very regular structure in the commandstreams that are being
submitted, we can end up with two distinct submits managing to trigger
the hangcheck with the FE in a very similar address range. This leads
the hangcheck to believe that the GPU is stuck, while in reality the GPU
is already busy working on a different job. To avoid those spurious
GPU resets, also remember and consider the last completed fence seqno
in the hang check.

Reported-by: Joerg Albert <[email protected]>
Signed-off-by: Lucas Stach <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
1 parent 6dfa2fa commit cdd1569

2 files changed, +4 -1 lines changed

drivers/gpu/drm/etnaviv/etnaviv_gpu.h

Lines changed: 1 addition & 0 deletions
@@ -130,6 +130,7 @@ struct etnaviv_gpu {
 
 	/* hang detection */
 	u32 hangcheck_dma_addr;
+	u32 hangcheck_fence;
 
 	void __iomem *mmio;
 	int irq;

drivers/gpu/drm/etnaviv/etnaviv_sched.c

Lines changed: 3 additions & 1 deletion
@@ -107,8 +107,10 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job
 	 */
 	dma_addr = gpu_read(gpu, VIVS_FE_DMA_ADDRESS);
 	change = dma_addr - gpu->hangcheck_dma_addr;
-	if (change < 0 || change > 16) {
+	if (gpu->completed_fence != gpu->hangcheck_fence ||
+	    change < 0 || change > 16) {
 		gpu->hangcheck_dma_addr = dma_addr;
+		gpu->hangcheck_fence = gpu->completed_fence;
 		goto out_no_timeout;
 	}
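
The effect of the extra condition can be illustrated outside the kernel. The following is a minimal user-space sketch, not the driver code: `struct hangcheck_state` and `hangcheck_made_progress()` are hypothetical names standing in for the relevant fields of `struct etnaviv_gpu` and for the check inside `etnaviv_sched_timedout_job()`, and the FE address is passed in as a parameter instead of being read from `VIVS_FE_DMA_ADDRESS`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the hang-detection fields of struct etnaviv_gpu. */
struct hangcheck_state {
	uint32_t completed_fence;    /* seqno of the last completed fence */
	uint32_t hangcheck_dma_addr; /* FE address recorded at the last check */
	uint32_t hangcheck_fence;    /* completed seqno recorded at the last check */
};

/*
 * Returns true if the GPU is considered to have made progress since the
 * previous check, mirroring the condition in the patch: either a fence
 * completed in the meantime, or the FE address moved backwards or by more
 * than 16 bytes. On progress, both recorded values are refreshed.
 */
static bool hangcheck_made_progress(struct hangcheck_state *s,
				    uint32_t dma_addr)
{
	int32_t change = (int32_t)(dma_addr - s->hangcheck_dma_addr);

	if (s->completed_fence != s->hangcheck_fence ||
	    change < 0 || change > 16) {
		s->hangcheck_dma_addr = dma_addr;
		s->hangcheck_fence = s->completed_fence;
		return true;
	}

	return false;
}
```

With only the address comparison, two different submits that happen to time out with the FE in nearly the same place would look like a stuck GPU. In this sketch, a change in `completed_fence` between the two checks is enough to report progress even when the address difference is within the 16-byte window.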
