Commit aa5fc43

Liu01 Tong authored and alexdeucher committed
drm/amdgpu: fix task hang from failed job submission during process kill
During process kill, drm_sched_entity_flush() will kill the vm entities. Subsequent job submissions from this process then fail, but the resources of these jobs are not released and their fences are never signalled, causing tasks to hang and time out.

Fix by checking the entity status in amdgpu_vm_ready() and avoiding job submission to a stopped entity.

v2: add amdgpu_vm_ready() check before amdgpu_vm_clear_freed() in function amdgpu_cs_vm_handling().

Fixes: 1f02f20 ("drm/amdgpu: Avoid extra evict-restore process.")
Signed-off-by: Liu01 Tong <[email protected]>
Signed-off-by: Lin.Cao <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit f101c13)
1 parent 040bc6d commit aa5fc43

2 files changed, +14 -4 lines changed


drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

Lines changed: 3 additions & 0 deletions
@@ -1139,6 +1139,9 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
 		}
 	}
 
+	if (!amdgpu_vm_ready(vm))
+		return -EINVAL;
+
 	r = amdgpu_vm_clear_freed(adev, vm, NULL);
 	if (r)
 		return r;
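
The ordering in this hunk, checking readiness before any page-table work is queued, can be illustrated with a small user-space model. This is only a sketch: `mock_vm`, `mock_clear_freed()` and `mock_cs_vm_handling()` are hypothetical stand-ins invented for illustration, not the driver's types or functions, and the real readiness test is reduced to a single flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the VM; not the real struct amdgpu_vm. */
struct mock_vm {
	bool ready;         /* what a readiness check would report */
	int cleared_calls;  /* how often the "clear freed" step ran */
};

/* Stand-in for the step that would queue update work on the VM. */
static int mock_clear_freed(struct mock_vm *vm)
{
	vm->cleared_calls++;
	return 0;
}

/* Same ordering as the hunk: bail out first, only then do work. */
static int mock_cs_vm_handling(struct mock_vm *vm)
{
	int r;

	assert(vm != NULL);

	if (!vm->ready)
		return -EINVAL;

	r = mock_clear_freed(vm);
	if (r)
		return r;

	return 0;
}
```

In this model, a caller that passes a VM with `ready` set to false gets `-EINVAL` back and `cleared_calls` stays at zero, which is the property the early return is meant to provide: nothing is submitted that would later need to be released or signalled.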

drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

Lines changed: 11 additions & 4 deletions
@@ -654,22 +654,29 @@ int amdgpu_vm_validate(struct amdgpu_device *adev, struct amdgpu_vm *vm,
  * Check if all VM PDs/PTs are ready for updates
  *
  * Returns:
- * True if VM is not evicting.
+ * True if VM is not evicting and all VM entities are not stopped
  */
 bool amdgpu_vm_ready(struct amdgpu_vm *vm)
 {
-	bool empty;
 	bool ret;
 
 	amdgpu_vm_eviction_lock(vm);
 	ret = !vm->evicting;
 	amdgpu_vm_eviction_unlock(vm);
 
 	spin_lock(&vm->status_lock);
-	empty = list_empty(&vm->evicted);
+	ret &= list_empty(&vm->evicted);
 	spin_unlock(&vm->status_lock);
 
-	return ret && empty;
+	spin_lock(&vm->immediate.lock);
+	ret &= !vm->immediate.stopped;
+	spin_unlock(&vm->immediate.lock);
+
+	spin_lock(&vm->delayed.lock);
+	ret &= !vm->delayed.stopped;
+	spin_unlock(&vm->delayed.lock);
+
+	return ret;
 }
 
 /**
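
The new body of amdgpu_vm_ready() folds four independent conditions into one result with `ret &= ...`. A minimal user-space sketch of that accumulation follows. It assumes a simplified, hypothetical `mock_vm_state` in which the evicted list is replaced by a counter and all locking is omitted, so it models only the boolean logic, not the driver's synchronization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scheduler entity: only the stopped flag. */
struct mock_entity {
	bool stopped;
};

/* Hypothetical, lock-free model of the fields the check reads. */
struct mock_vm_state {
	bool evicting;
	size_t evicted_count;          /* stands in for the vm->evicted list */
	struct mock_entity immediate;
	struct mock_entity delayed;
};

/* Ready only if every condition holds; any failure clears ret. */
static bool mock_vm_ready(const struct mock_vm_state *vm)
{
	bool ret;

	assert(vm != NULL);

	ret = !vm->evicting;
	ret &= (vm->evicted_count == 0);
	ret &= !vm->immediate.stopped;
	ret &= !vm->delayed.stopped;

	return ret;
}
```

Because each operand is a `bool`, `&=` behaves like a logical AND here without short-circuiting, which in the real function allows every field to be read under its own lock in sequence.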
