
Commit aa32f11

Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull hmm updates from Jason Gunthorpe:
 "This is another round of bug fixing and cleanup. This time the focus is on
  the driver pattern to use mmu notifiers to monitor a VA range. This code is
  lifted out of many drivers and hmm_mirror directly into the mmu_notifier
  core and written using the best ideas from all the driver implementations.

  This removes many bugs from the drivers and has a very pleasing diffstat.
  More drivers can still be converted, but that is for another cycle.

   - A shared branch with RDMA reworking the RDMA ODP implementation

   - New mmu_interval_notifier API. This is focused on the use case of
     monitoring a VA and simplifies the process for drivers

   - A common seq-count locking scheme built into the mmu_interval_notifier
     API usable by drivers that call get_user_pages() or hmm_range_fault()
     with the VA range

   - Conversion of mlx5 ODP, hfi1, radeon, nouveau, AMD GPU, and Xen GntDev
     drivers to the new API. This deletes a lot of wonky driver code.

   - Two improvements for hmm_range_fault(), from testing done by Ralph"

* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  mm/hmm: remove hmm_range_dma_map and hmm_range_dma_unmap
  mm/hmm: make full use of walk_page_range()
  xen/gntdev: use mmu_interval_notifier_insert
  mm/hmm: remove hmm_mirror and related
  drm/amdgpu: Use mmu_interval_notifier instead of hmm_mirror
  drm/amdgpu: Use mmu_interval_insert instead of hmm_mirror
  drm/amdgpu: Call find_vma under mmap_sem
  nouveau: use mmu_interval_notifier instead of hmm_mirror
  nouveau: use mmu_notifier directly for invalidate_range_start
  drm/radeon: use mmu_interval_notifier_insert
  RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv
  RDMA/odp: Use mmu_interval_notifier_insert()
  mm/hmm: define the pre-processor related parts of hmm.h even if disabled
  mm/hmm: allow hmm_range to be used with a mmu_interval_notifier or hmm_mirror
  mm/mmu_notifier: add an interval tree notifier
  mm/mmu_notifier: define the header pre-processor parts even if disabled
  mm/hmm: allow snapshot of the special zero page
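For orientation, here is a minimal sketch of the driver-side invalidate() callback under the new mmu_interval_notifier API described above, i.e. the write half of the seq-count locking scheme. The names my_interval and my_invalidate and the driver mutex are illustrative assumptions, not part of this merge; the shape follows what the converted drivers do.

    #include <linux/mmu_notifier.h>
    #include <linux/mutex.h>

    /* Hypothetical per-range driver state embedding the interval notifier. */
    struct my_interval {
            struct mmu_interval_notifier notifier;
            struct mutex lock;      /* also held while updating the device page table */
    };

    static bool my_invalidate(struct mmu_interval_notifier *mni,
                              const struct mmu_notifier_range *range,
                              unsigned long cur_seq)
    {
            struct my_interval *mi = container_of(mni, struct my_interval, notifier);

            if (!mmu_notifier_range_blockable(range))
                    return false;   /* the core retries from a blockable context */

            mutex_lock(&mi->lock);
            /*
             * Advance the notifier sequence under the driver lock so that a
             * concurrent mmu_interval_read_retry() observes this invalidation
             * and loops back to re-fault the range.
             */
            mmu_interval_set_seq(mni, cur_seq);
            /* ... invalidate device mappings covering range->start..range->end ... */
            mutex_unlock(&mi->lock);
            return true;
    }

    static const struct mmu_interval_notifier_ops my_ops = {
            .invalidate = my_invalidate,
    };

mmu_interval_set_seq() is called under the same lock the driver holds while programming the device page table; that is what allows mmu_interval_read_retry() on the fault side to close the race with a concurrent CPU page table update.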
2 parents d5bb349 + 93f4e73 commit aa32f11

31 files changed: 1298 additions & 2139 deletions

Documentation/vm/hmm.rst

Lines changed: 24 additions & 81 deletions
@@ -147,49 +147,16 @@ Address space mirroring implementation and API
 Address space mirroring's main objective is to allow duplication of a range of
 CPU page table into a device page table; HMM helps keep both synchronized. A
 device driver that wants to mirror a process address space must start with the
-registration of an hmm_mirror struct::
-
- int hmm_mirror_register(struct hmm_mirror *mirror,
-                         struct mm_struct *mm);
-
-The mirror struct has a set of callbacks that are used
-to propagate CPU page tables::
-
- struct hmm_mirror_ops {
-     /* release() - release hmm_mirror
-      *
-      * @mirror: pointer to struct hmm_mirror
-      *
-      * This is called when the mm_struct is being released. The callback
-      * must ensure that all access to any pages obtained from this mirror
-      * is halted before the callback returns. All future access should
-      * fault.
-      */
-     void (*release)(struct hmm_mirror *mirror);
-
-     /* sync_cpu_device_pagetables() - synchronize page tables
-      *
-      * @mirror: pointer to struct hmm_mirror
-      * @update: update information (see struct mmu_notifier_range)
-      * Return: -EAGAIN if update.blockable false and callback need to
-      *         block, 0 otherwise.
-      *
-      * This callback ultimately originates from mmu_notifiers when the CPU
-      * page table is updated. The device driver must update its page table
-      * in response to this callback. The update argument tells what action
-      * to perform.
-      *
-      * The device driver must not return from this callback until the device
-      * page tables are completely updated (TLBs flushed, etc); this is a
-      * synchronous call.
-      */
-     int (*sync_cpu_device_pagetables)(struct hmm_mirror *mirror,
-                                       const struct hmm_update *update);
- };
-
-The device driver must perform the update action to the range (mark range
-read only, or fully unmap, etc.). The device must complete the update before
-the driver callback returns.
+registration of a mmu_interval_notifier::
+
+ mni->ops = &driver_ops;
+ int mmu_interval_notifier_insert(struct mmu_interval_notifier *mni,
+                                  unsigned long start, unsigned long length,
+                                  struct mm_struct *mm);
+
+During the driver_ops->invalidate() callback the device driver must perform
+the update action to the range (mark range read only, or fully unmap,
+etc.). The device must complete the update before the driver callback returns.
 
 When the device driver wants to populate a range of virtual addresses, it can
 use::
@@ -216,70 +183,46 @@ The usage pattern is::
       struct hmm_range range;
       ...
 
+      range.notifier = &mni;
       range.start = ...;
       range.end = ...;
       range.pfns = ...;
       range.flags = ...;
       range.values = ...;
       range.pfn_shift = ...;
-      hmm_range_register(&range, mirror);
 
-      /*
-       * Just wait for range to be valid, safe to ignore return value as we
-       * will use the return value of hmm_range_fault() below under the
-       * mmap_sem to ascertain the validity of the range.
-       */
-      hmm_range_wait_until_valid(&range, TIMEOUT_IN_MSEC);
+      if (!mmget_not_zero(mni->notifier.mm))
+          return -EFAULT;
 
  again:
+      range.notifier_seq = mmu_interval_read_begin(&mni);
       down_read(&mm->mmap_sem);
       ret = hmm_range_fault(&range, HMM_RANGE_SNAPSHOT);
       if (ret) {
           up_read(&mm->mmap_sem);
-          if (ret == -EBUSY) {
-              /*
-               * No need to check hmm_range_wait_until_valid() return value
-               * on retry we will get proper error with hmm_range_fault()
-               */
-              hmm_range_wait_until_valid(&range, TIMEOUT_IN_MSEC);
-              goto again;
-          }
-          hmm_range_unregister(&range);
+          if (ret == -EBUSY)
+              goto again;
           return ret;
       }
+      up_read(&mm->mmap_sem);
+
       take_lock(driver->update);
-      if (!hmm_range_valid(&range)) {
+      if (mmu_interval_read_retry(&ni, range.notifier_seq) {
          release_lock(driver->update);
-         up_read(&mm->mmap_sem);
          goto again;
      }
 
-     // Use pfns array content to update device page table
+     /* Use pfns array content to update device page table,
+      * under the update lock */
 
-     hmm_range_unregister(&range);
      release_lock(driver->update);
-     up_read(&mm->mmap_sem);
      return 0;
  }
 
 The driver->update lock is the same lock that the driver takes inside its
-sync_cpu_device_pagetables() callback. That lock must be held before calling
-hmm_range_valid() to avoid any race with a concurrent CPU page table update.
-
-HMM implements all this on top of the mmu_notifier API because we wanted a
-simpler API and also to be able to perform optimizations latter on like doing
-concurrent device updates in multi-devices scenario.
-
-HMM also serves as an impedance mismatch between how CPU page table updates
-are done (by CPU write to the page table and TLB flushes) and how devices
-update their own page table. Device updates are a multi-step process. First,
-appropriate commands are written to a buffer, then this buffer is scheduled for
-execution on the device. It is only once the device has executed commands in
-the buffer that the update is done. Creating and scheduling the update command
-buffer can happen concurrently for multiple devices. Waiting for each device to
-report commands as executed is serialized (there is no point in doing this
-concurrently).
-
+invalidate() callback. That lock must be held before calling
+mmu_interval_read_retry() to avoid any race with a concurrent CPU page table
+update.
 
 Leverage default_flags and pfn_flags_mask
 =========================================
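The pattern above uses hmm_range_fault(); per the pull message, the same seq-count scheme also serves drivers that pin pages with get_user_pages(). Below is a rough sketch under that assumption, using v5.5-era signatures; my_pin_and_map, the driver_lock parameter, and the error handling are hypothetical, not part of this merge.

    #include <linux/mm.h>
    #include <linux/mmu_notifier.h>
    #include <linux/mutex.h>
    #include <linux/pagemap.h>

    static int my_pin_and_map(struct mmu_interval_notifier *notifier,
                              struct mutex *driver_lock, struct mm_struct *mm,
                              unsigned long start, long npages,
                              struct page **pages)
    {
            unsigned long seq;
            long pinned;

    again:
            seq = mmu_interval_read_begin(notifier);

            down_read(&mm->mmap_sem);
            pinned = get_user_pages_remote(NULL, mm, start, npages,
                                           FOLL_WRITE, pages, NULL, NULL);
            up_read(&mm->mmap_sem);
            if (pinned != npages) {
                    if (pinned > 0)
                            release_pages(pages, pinned);
                    return pinned < 0 ? pinned : -EFAULT;
            }

            mutex_lock(driver_lock);
            if (mmu_interval_read_retry(notifier, seq)) {
                    /* An invalidate() ran since read_begin: drop the pages and retry. */
                    mutex_unlock(driver_lock);
                    release_pages(pages, npages);
                    goto again;
            }
            /* ... program the device page table from pages[] while holding the lock ... */
            mutex_unlock(driver_lock);
            return 0;
    }

As in the hmm_range_fault() pattern, the retry check and the device page table update both happen under the driver lock taken by the invalidate() callback.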

drivers/gpu/drm/amd/amdgpu/amdgpu.h

Lines changed: 2 additions & 0 deletions
@@ -967,6 +967,8 @@ struct amdgpu_device {
        struct mutex lock_reset;
        struct amdgpu_doorbell_index doorbell_index;
 
+       struct mutex notifier_lock;
+
        int asic_reset_res;
        struct work_struct xgmi_reset_work;
 
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

Lines changed: 6 additions & 3 deletions
@@ -505,8 +505,7 @@ static void remove_kgd_mem_from_kfd_bo_list(struct kgd_mem *mem,
  *
  * Returns 0 for success, negative errno for errors.
  */
-static int init_user_pages(struct kgd_mem *mem, struct mm_struct *mm,
-                           uint64_t user_addr)
+static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr)
 {
        struct amdkfd_process_info *process_info = mem->process_info;
        struct amdgpu_bo *bo = mem->bo;
@@ -1199,7 +1198,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
        add_kgd_mem_to_kfd_bo_list(*mem, avm->process_info, user_addr);
 
        if (user_addr) {
-               ret = init_user_pages(*mem, current->mm, user_addr);
+               ret = init_user_pages(*mem, user_addr);
                if (ret)
                        goto allocate_init_user_pages_failed;
        }
@@ -1744,6 +1743,10 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
                        return ret;
                }
 
+               /*
+                * FIXME: Cannot ignore the return code, must hold
+                * notifier_lock
+                */
                amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
 
                /* Mark the BO as valid unless it was invalidated

drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

Lines changed: 6 additions & 8 deletions
@@ -538,8 +538,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
                e->tv.num_shared = 2;
 
        amdgpu_bo_list_get_list(p->bo_list, &p->validated);
-       if (p->bo_list->first_userptr != p->bo_list->num_entries)
-               p->mn = amdgpu_mn_get(p->adev, AMDGPU_MN_TYPE_GFX);
 
        INIT_LIST_HEAD(&duplicates);
        amdgpu_vm_get_pd_bo(&fpriv->vm, &p->validated, &p->vm_pd);
@@ -1219,11 +1217,11 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
        if (r)
                goto error_unlock;
 
-       /* No memory allocation is allowed while holding the mn lock.
-        * p->mn is hold until amdgpu_cs_submit is finished and fence is added
-        * to BOs.
+       /* No memory allocation is allowed while holding the notifier lock.
+        * The lock is held until amdgpu_cs_submit is finished and fence is
+        * added to BOs.
         */
-       amdgpu_mn_lock(p->mn);
+       mutex_lock(&p->adev->notifier_lock);
 
        /* If userptr are invalidated after amdgpu_cs_parser_bos(), return
         * -EAGAIN, drmIoctl in libdrm will restart the amdgpu_cs_ioctl.
@@ -1266,13 +1264,13 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
        amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
 
        ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
-       amdgpu_mn_unlock(p->mn);
+       mutex_unlock(&p->adev->notifier_lock);
 
        return 0;
 
 error_abort:
        drm_sched_job_cleanup(&job->base);
-       amdgpu_mn_unlock(p->mn);
+       mutex_unlock(&p->adev->notifier_lock);
 
 error_unlock:
        amdgpu_job_free(job);

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

Lines changed: 1 addition & 0 deletions
@@ -2794,6 +2794,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
        mutex_init(&adev->virt.vf_errors.lock);
        hash_init(adev->mn_hash);
        mutex_init(&adev->lock_reset);
+       mutex_init(&adev->notifier_lock);
        mutex_init(&adev->virt.dpm_mutex);
        mutex_init(&adev->psp.mutex);
 