Skip to content

Commit 3fe2895

Browse files
josefbacikakpm00
authored andcommitted
mm: fix page leak with multiple threads mapping the same page
We have an application with a lot of threads that use a shared mmap backed by tmpfs mounted with -o huge=within_size. This application started leaking loads of huge pages when we upgraded to a recent kernel. Using the page ref tracepoints and a BPF program written by Tejun Heo we were able to determine that these pages would have multiple refcounts from the page fault path, but when it came to unmap time we wouldn't drop the number of refs we had added from the faults. I wrote a reproducer that mmap'ed a file backed by tmpfs with -o huge=always, and then spawned 20 threads all looping faulting random offsets in this map, while using madvise(MADV_DONTNEED) randomly for huge page aligned ranges. This very quickly reproduced the problem. The problem here is that we check for the case that we have multiple threads faulting in a range that was previously unmapped. One thread maps the PMD, the other thread loses the race and then returns 0. However at this point we already have the page, and we are no longer putting this page into the processes address space, and so we leak the page. We actually did the correct thing prior to f9ce0be, however it looks like Kirill copied what we do in the anonymous page case. In the anonymous page case we don't yet have a page, so we don't have to drop a reference on anything. Previously we did the correct thing for file based faults by returning VM_FAULT_NOPAGE so we correctly drop the reference on the page we faulted in. Fix this by returning VM_FAULT_NOPAGE in the pmd_devmap_trans_unstable() case, this makes us drop the ref on the page properly, and now my reproducer no longer leaks the huge pages. [[email protected]: v2] Link: https://lkml.kernel.org/r/e90c8f0dbae836632b669c2afc434006a00d4a67.1657721478.git.josef@toxicpanda.com Link: https://lkml.kernel.org/r/2b798acfd95c9ab9395fe85e8d5a835e2e10a920.1657051137.git.josef@toxicpanda.com Fixes: f9ce0be ("mm: Cleanup faultaround and finish_fault() codepaths") Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: Chris Mason <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
1 parent f073c83 commit 3fe2895

File tree

1 file changed

+5
-2
lines changed

1 file changed

+5
-2
lines changed

mm/memory.c

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4369,9 +4369,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
43694369
return VM_FAULT_OOM;
43704370
}
43714371

4372-
/* See comment in handle_pte_fault() */
4372+
/*
4373+
* See comment in handle_pte_fault() for how this scenario happens, we
4374+
* need to return NOPAGE so that we drop this page.
4375+
*/
43734376
if (pmd_devmap_trans_unstable(vmf->pmd))
4374-
return 0;
4377+
return VM_FAULT_NOPAGE;
43754378

43764379
vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
43774380
vmf->address, &vmf->ptl);

0 commit comments

Comments
 (0)