Commit 023f47a

thejh authored and akpm00 committed
mm/khugepaged: fix ->anon_vma race
If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires it to be locked.

Page table traversal is allowed under any one of the mmap lock, the anon_vma lock (if the VMA is associated with an anon_vma), and the mapping lock (if the VMA is associated with a mapping); and so to be able to remove page tables, we must hold all three of them. retract_page_tables() bails out if an ->anon_vma is attached, but does this check before holding the mmap lock (as the comment above the check explains).

If we racily merged an existing ->anon_vma (shared with a child process) from a neighboring VMA, subsequent rmap traversals on pages belonging to the child will be able to see the page tables that we are concurrently removing while assuming that nothing else can access them.

Repeat the ->anon_vma check once we hold the mmap lock to ensure that there really is no concurrent page table access.

Hitting this bug causes a lockdep warning in collapse_and_free_pmd(), in the line "lockdep_assert_held_write(&vma->anon_vma->root->rwsem)". It can also lead to use-after-free access.

Link: https://lore.kernel.org/linux-mm/CAG48ez3434wZBKFFbdx4M9j6eUwSUVPd4dxhzW_k_POneSDF+A@mail.gmail.com/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: f3f0e1d ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Jann Horn <[email protected]>
Reported-by: Zach O'Keefe <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 parent 7327e81 commit 023f47a

1 file changed: +13 -1 lines changed

mm/khugepaged.c

Lines changed: 13 additions & 1 deletion
@@ -1642,7 +1642,7 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
 		 * has higher cost too. It would also probably require locking
 		 * the anon_vma.
 		 */
-		if (vma->anon_vma) {
+		if (READ_ONCE(vma->anon_vma)) {
 			result = SCAN_PAGE_ANON;
 			goto next;
 		}
@@ -1670,6 +1670,18 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
 			result = SCAN_PTE_MAPPED_HUGEPAGE;
 			if ((cc->is_khugepaged || is_target) &&
 			    mmap_write_trylock(mm)) {
+				/*
+				 * Re-check whether we have an ->anon_vma, because
+				 * collapse_and_free_pmd() requires that either no
+				 * ->anon_vma exists or the anon_vma is locked.
+				 * We already checked ->anon_vma above, but that check
+				 * is racy because ->anon_vma can be populated under the
+				 * mmap lock in read mode.
+				 */
+				if (vma->anon_vma) {
+					result = SCAN_PAGE_ANON;
+					goto unlock_next;
+				}
 				/*
 				 * When a vma is registered with uffd-wp, we can't
 				 * recycle the pmd pgtable because there can be pte
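
The two hunks together form a "check, trylock, re-check" pattern: the unlocked ->anon_vma read is only a cheap hint, and the decision that counts is repeated once the lock is held. The following is a minimal user-space sketch of that pattern, not kernel code. Every name in it (toy_vma, toy_retract() and so on) is hypothetical, a C11 atomic_flag stands in for the mmap write lock, a relaxed atomic load stands in for READ_ONCE(), and the sketch assumes that anon_vma is only ever set by a thread that excludes the lock holder.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA; none of these names exist in the kernel. */
struct toy_vma {
	atomic_flag lock;           /* plays the role of the mmap write lock */
	_Atomic(void *) anon_vma;   /* assumed never to change while lock is held */
	bool page_table_present;
};

enum toy_result { TOY_RETRACTED, TOY_PAGE_ANON, TOY_LOCK_BUSY };

static void toy_vma_init(struct toy_vma *vma)
{
	atomic_flag_clear(&vma->lock);
	atomic_init(&vma->anon_vma, NULL);
	vma->page_table_present = true;
}

static bool toy_trylock(struct toy_vma *vma)
{
	/* test_and_set returns the previous value: false means we got the lock. */
	return !atomic_flag_test_and_set_explicit(&vma->lock,
						  memory_order_acquire);
}

static void toy_unlock(struct toy_vma *vma)
{
	atomic_flag_clear_explicit(&vma->lock, memory_order_release);
}

/* Caller must hold vma->lock, so the check here is the authoritative one. */
static enum toy_result toy_retract_locked(struct toy_vma *vma)
{
	if (atomic_load_explicit(&vma->anon_vma, memory_order_relaxed))
		return TOY_PAGE_ANON;

	vma->page_table_present = false;
	return TOY_RETRACTED;
}

static enum toy_result toy_retract(struct toy_vma *vma)
{
	enum toy_result result;

	assert(vma != NULL);

	/* Unlocked check: a cheap early exit, but it can be stale immediately. */
	if (atomic_load_explicit(&vma->anon_vma, memory_order_relaxed))
		return TOY_PAGE_ANON;

	if (!toy_trylock(vma))
		return TOY_LOCK_BUSY;

	/* Re-check under the lock before touching the page table. */
	result = toy_retract_locked(vma);
	toy_unlock(vma);
	return result;
}
```

Calling toy_retract() on a freshly initialised toy_vma returns TOY_RETRACTED. If anon_vma becomes non-NULL between the unlocked check and the trylock, the locked re-check returns TOY_PAGE_ANON and leaves page_table_present untouched, which is the window the patch closes. Splitting out toy_retract_locked() is a choice made for this sketch so that the window can be exercised deterministically; the kernel patch performs the re-check inline.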
