
Commit 81bf145

Lang Yu authored and alexdeucher committed
drm/amdkfd: make sure VM is ready for updating operations
When page table BOs were evicted but not revalidated before the page tables were updated, the VM is still in the evicting state, amdgpu_vm_update_range returns -EBUSY, and restore_process_worker runs into an endless retry loop.

v2: Split the BO validation and page table update into two separate loops in amdgpu_amdkfd_gpuvm_restore_process_bos. (Felix)
  1. Validate BOs
  2. Validate VM (and DMABuf attachments)
  3. Update page tables for the BOs validated above

Fixes: 50661eb ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs")
Signed-off-by: Lang Yu <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
1 parent e53a171 commit 81bf145


1 file changed (+20, -14 lines)


drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

@@ -2901,13 +2901,12 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
 
 	amdgpu_sync_create(&sync_obj);
 
-	/* Validate BOs and map them to GPUVM (update VM page tables). */
+	/* Validate BOs managed by KFD */
 	list_for_each_entry(mem, &process_info->kfd_bo_list,
 			    validate_list) {
 
 		struct amdgpu_bo *bo = mem->bo;
 		uint32_t domain = mem->domain;
-		struct kfd_mem_attachment *attachment;
 		struct dma_resv_iter cursor;
 		struct dma_fence *fence;
 
@@ -2932,6 +2931,25 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
 				goto validate_map_fail;
 			}
 		}
+	}
+
+	if (failed_size)
+		pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
+
+	/* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
+	 * validations above would invalidate DMABuf imports again.
+	 */
+	ret = process_validate_vms(process_info, &exec.ticket);
+	if (ret) {
+		pr_debug("Validating VMs failed, ret: %d\n", ret);
+		goto validate_map_fail;
+	}
+
+	/* Update mappings managed by KFD. */
+	list_for_each_entry(mem, &process_info->kfd_bo_list,
+			    validate_list) {
+		struct kfd_mem_attachment *attachment;
+
 		list_for_each_entry(attachment, &mem->attachments, list) {
 			if (!attachment->is_mapped)
 				continue;
@@ -2948,18 +2966,6 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
 		}
 	}
 
-	if (failed_size)
-		pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
-
-	/* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
-	 * validations above would invalidate DMABuf imports again.
-	 */
-	ret = process_validate_vms(process_info, &exec.ticket);
-	if (ret) {
-		pr_debug("Validating VMs failed, ret: %d\n", ret);
-		goto validate_map_fail;
-	}
-
 	/* Update mappings not managed by KFD */
 	list_for_each_entry(peer_vm, &process_info->vm_list_head,
 			vm_list_node) {
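The comment moved by this patch explains why VM validation sits between the two KFD loops rather than before them: validating a BO can invalidate a DMABuf import of it again. The model below is a minimal sketch of that ordering constraint only. It assumes, as the comment implies, that validating (possibly moving) an exported BO marks its import as detached; `struct model_bo`, `struct model_import`, `model_validate_bo()`, `model_validate_vms()` and `model_ready_for_update()` are hypothetical names, not kernel data structures or APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical DMABuf import of a KFD BO in some compute VM. */
struct model_import {
	bool attached; /* import mapping is valid and up to date */
};

/* Hypothetical KFD BO that may be exported to one importer. */
struct model_bo {
	bool resident;                 /* placed in a valid domain */
	struct model_import *importer; /* DMABuf import of this BO, or NULL */
};

/* Validating a BO may move it; in this model a move always detaches
 * any DMABuf import of that BO (assumed behaviour, see lead-in). */
static void model_validate_bo(struct model_bo *bo)
{
	bo->resident = true;
	if (bo->importer)
		bo->importer->attached = false;
}

/* Stand-in for process_validate_vms(): revalidates PDs/PTs and
 * re-attaches evicted DMABuf imports. */
static void model_validate_vms(struct model_import *imports, size_t n)
{
	assert(imports != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		imports[i].attached = true;
}

/* Runs BO validation and VM validation in the requested order and
 * reports whether every BO is resident and every import attached,
 * i.e. whether the page-table update loop may safely run. */
static bool model_ready_for_update(struct model_bo *bos, size_t nr_bos,
				   struct model_import *imports,
				   size_t nr_imports, bool vms_last)
{
	if (!vms_last)
		model_validate_vms(imports, nr_imports);
	for (size_t i = 0; i < nr_bos; i++)
		model_validate_bo(&bos[i]);
	if (vms_last)
		model_validate_vms(imports, nr_imports);

	for (size_t i = 0; i < nr_bos; i++)
		if (!bos[i].resident)
			return false;
	for (size_t i = 0; i < nr_imports; i++)
		if (!imports[i].attached)
			return false;
	return true;
}
```

Validating the VMs first leaves the import detached by the later BO validation; validating them last, as the patch does, leaves everything ready for the "Update mappings managed by KFD" loop.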
