
Commit 6c21e06

thejh authored and torvalds committed
mm/mempolicy: Take VMA lock before replacing policy
mbind() calls down into vma_replace_policy() without taking the per-VMA
locks, replaces the VMA's vma->vm_policy pointer, and frees the old
policy. That's bad; a concurrent page fault might still be using the
old policy (in vma_alloc_folio()), resulting in use-after-free.

Normally this will manifest as a use-after-free read first, but it can
result in memory corruption, including because vma_alloc_folio() can
call mpol_cond_put() on the freed policy, which conditionally changes
the policy's refcount member.

This bug is specific to CONFIG_NUMA, but it does also affect non-NUMA
systems as long as the kernel was built with CONFIG_NUMA.

Signed-off-by: Jann Horn <[email protected]>
Reviewed-by: Suren Baghdasaryan <[email protected]>
Fixes: 5e31275 ("mm: add per-VMA lock and helper functions to control it")
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
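For readers unfamiliar with the race, here is a minimal userspace analogue
(not kernel code): a pthread rwlock stands in for the per-VMA lock,
fault_path() models a page fault reading vma->vm_policy under the read
lock, and replace_policy() models the fixed mbind() path, which
write-locks the VMA before swapping and freeing the policy. All
identifiers (fake_policy, fake_vma, fault_path, replace_policy) are
invented for illustration; without the wrlock in replace_policy(), the
reader could dereference freed memory, as in the bug described above.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Stand-in for struct mempolicy and its refcount member. */
    struct fake_policy {
            int refcnt;
            int mode;
    };

    /* Stand-in for a VMA: a policy pointer guarded by a per-VMA lock. */
    struct fake_vma {
            pthread_rwlock_t lock;      /* models the per-VMA lock */
            struct fake_policy *policy; /* models vma->vm_policy */
    };

    /* Fault path: read-locks the VMA, like the page fault's vma_start_read(). */
    static void *fault_path(void *arg)
    {
            struct fake_vma *vma = arg;

            pthread_rwlock_rdlock(&vma->lock);
            /* Safe only because the writer excludes us while freeing. */
            printf("fault sees mode=%d refcnt=%d\n",
                   vma->policy->mode, vma->policy->refcnt);
            pthread_rwlock_unlock(&vma->lock);
            return NULL;
    }

    /* mbind path: with the fix, write-lock before replace + free. */
    static void replace_policy(struct fake_vma *vma, struct fake_policy *new)
    {
            struct fake_policy *old;

            pthread_rwlock_wrlock(&vma->lock); /* models vma_start_write() */
            old = vma->policy;
            vma->policy = new;
            pthread_rwlock_unlock(&vma->lock);
            free(old); /* no concurrent reader can still hold 'old' */
    }

    int main(void)
    {
            struct fake_vma vma = { .policy = malloc(sizeof(struct fake_policy)) };
            struct fake_policy *new = malloc(sizeof(struct fake_policy));
            pthread_t t;

            pthread_rwlock_init(&vma.lock, NULL);
            *vma.policy = (struct fake_policy){ .refcnt = 1, .mode = 1 };
            *new = (struct fake_policy){ .refcnt = 1, .mode = 2 };

            pthread_create(&t, NULL, fault_path, &vma);
            replace_policy(&vma, new);
            pthread_join(t, NULL);

            free(vma.policy);
            pthread_rwlock_destroy(&vma.lock);
            return 0;
    }

Build with "cc -pthread". Removing the wrlock/unlock pair in
replace_policy() reintroduces the race this commit fixes.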
1 parent 57012c5 commit 6c21e06

File tree

1 file changed: +14, -1 lines changed


mm/mempolicy.c

Lines changed: 14 additions & 1 deletion
@@ -384,8 +384,10 @@ void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new)
 	VMA_ITERATOR(vmi, mm, 0);
 
 	mmap_write_lock(mm);
-	for_each_vma(vmi, vma)
+	for_each_vma(vmi, vma) {
+		vma_start_write(vma);
 		mpol_rebind_policy(vma->vm_policy, new);
+	}
 	mmap_write_unlock(mm);
 }
 
@@ -768,6 +770,8 @@ static int vma_replace_policy(struct vm_area_struct *vma,
 	struct mempolicy *old;
 	struct mempolicy *new;
 
+	vma_assert_write_locked(vma);
+
 	pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
 		 vma->vm_start, vma->vm_end, vma->vm_pgoff,
 		 vma->vm_ops, vma->vm_file,
@@ -1313,6 +1317,14 @@ static long do_mbind(unsigned long start, unsigned long len,
 	if (err)
 		goto mpol_out;
 
+	/*
+	 * Lock the VMAs before scanning for pages to migrate, to ensure we don't
+	 * miss a concurrently inserted page.
+	 */
+	vma_iter_init(&vmi, mm, start);
+	for_each_vma_range(vmi, vma, end)
+		vma_start_write(vma);
+
 	ret = queue_pages_range(mm, start, end, nmask,
 			flags | MPOL_MF_INVERT, &pagelist);
 
@@ -1538,6 +1550,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
 		break;
 	}
 
+	vma_start_write(vma);
 	new->home_node = home_node;
 	err = mbind_range(&vmi, vma, &prev, start, end, new);
 	mpol_put(new);
