Skip to content

Commit 2ea7ff1

Browse files
xzpeterakpm00
authored andcommitted
mm/hugetlb: fix race condition of uffd missing/minor handling
Patch series "mm/hugetlb: Fix selftest failures with write check", v3. Currently akpm mm-unstable fails with uffd hugetlb private mapping test randomly on a write check. The initial bisection of that points to the recent pmd unshare series, but it turns out there's no direction relationship with the series but only some timing change caused the race to start trigger. The race should be fixed in patch 1. Patch 2 is a trivial cleanup on the similar race with hugetlb migrations, patch 3 comment on the write check so when anyone read it again it'll be clear why it's there. This patch (of 3): After the recent rework patchset of hugetlb locking on pmd sharing, kselftest for userfaultfd sometimes fails on hugetlb private tests with unexpected write fault checks. It turns out there's nothing wrong within the locking series regarding this matter, but it could have changed the timing of threads so it can trigger an old bug. The real bug is when we call hugetlb_no_page() we're not with the pgtable lock. It means we're reading the pte values lockless. It's perfectly fine in most cases because before we do normal page allocations we'll take the lock and check pte_same() again. However before that, there are actually two paths on userfaultfd missing/minor handling that may directly move on with the fault process without checking the pte values. It means for these two paths we may be generating an uffd message based on an unstable pte, while an unstable pte can legally be anything as long as the modifier holds the pgtable lock. One example, which is also what happened in the failing kselftest and caused the test failure, is that for private mappings wr-protection changes can happen on one page. While hugetlb_change_protection() generally requires pte being cleared before being changed, then there can be a race condition like: thread 1 thread 2 -------- -------- UFFDIO_WRITEPROTECT hugetlb_fault hugetlb_change_protection pgtable_lock() huge_ptep_modify_prot_start pte==NULL hugetlb_no_page generate uffd missing event even if page existed!! huge_ptep_modify_prot_commit pgtable_unlock() Fix this by rechecking the pte after pgtable lock for both userfaultfd missing & minor fault paths. This bug should have been around starting from uffd hugetlb introduced, so attaching a Fixes to the commit. Also attach another Fixes to the minor support commit for easier tracking. Note that userfaultfd is actually fine with false positives (e.g. caused by pte changed), but not wrong logical events (e.g. caused by reading a pte during changing). The latter can confuse the userspace, so the strictness is very much preferred. E.g., MISSING event should never happen on the page after UFFDIO_COPY has correctly installed the page and returned. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 1a1aad8 ("userfaultfd: hugetlbfs: add userfaultfd hugetlb hook") Fixes: 7677f7f ("userfaultfd: add minor fault registration mode") Signed-off-by: Peter Xu <[email protected]> Co-developed-by: Mike Kravetz <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Nadav Amit <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
1 parent 94541bc commit 2ea7ff1

File tree

1 file changed

+52
-7
lines changed

1 file changed

+52
-7
lines changed

mm/hugetlb.c

Lines changed: 52 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -5535,6 +5535,23 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
55355535
return handle_userfault(&vmf, reason);
55365536
}
55375537

5538+
/*
5539+
* Recheck pte with pgtable lock. Returns true if pte didn't change, or
5540+
* false if pte changed or is changing.
5541+
*/
5542+
static bool hugetlb_pte_stable(struct hstate *h, struct mm_struct *mm,
5543+
pte_t *ptep, pte_t old_pte)
5544+
{
5545+
spinlock_t *ptl;
5546+
bool same;
5547+
5548+
ptl = huge_pte_lock(h, mm, ptep);
5549+
same = pte_same(huge_ptep_get(ptep), old_pte);
5550+
spin_unlock(ptl);
5551+
5552+
return same;
5553+
}
5554+
55385555
static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
55395556
struct vm_area_struct *vma,
55405557
struct address_space *mapping, pgoff_t idx,
@@ -5575,10 +5592,33 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
55755592
if (idx >= size)
55765593
goto out;
55775594
/* Check for page in userfault range */
5578-
if (userfaultfd_missing(vma))
5579-
return hugetlb_handle_userfault(vma, mapping, idx,
5580-
flags, haddr, address,
5581-
VM_UFFD_MISSING);
5595+
if (userfaultfd_missing(vma)) {
5596+
/*
5597+
* Since hugetlb_no_page() was examining pte
5598+
* without pgtable lock, we need to re-test under
5599+
* lock because the pte may not be stable and could
5600+
* have changed from under us. Try to detect
5601+
* either changed or during-changing ptes and retry
5602+
* properly when needed.
5603+
*
5604+
* Note that userfaultfd is actually fine with
5605+
* false positives (e.g. caused by pte changed),
5606+
* but not wrong logical events (e.g. caused by
5607+
* reading a pte during changing). The latter can
5608+
* confuse the userspace, so the strictness is very
5609+
* much preferred. E.g., MISSING event should
5610+
* never happen on the page after UFFDIO_COPY has
5611+
* correctly installed the page and returned.
5612+
*/
5613+
if (!hugetlb_pte_stable(h, mm, ptep, old_pte)) {
5614+
ret = 0;
5615+
goto out;
5616+
}
5617+
5618+
return hugetlb_handle_userfault(vma, mapping, idx, flags,
5619+
haddr, address,
5620+
VM_UFFD_MISSING);
5621+
}
55825622

55835623
page = alloc_huge_page(vma, haddr, 0);
55845624
if (IS_ERR(page)) {
@@ -5644,9 +5684,14 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
56445684
if (userfaultfd_minor(vma)) {
56455685
unlock_page(page);
56465686
put_page(page);
5647-
return hugetlb_handle_userfault(vma, mapping, idx,
5648-
flags, haddr, address,
5649-
VM_UFFD_MINOR);
5687+
/* See comment in userfaultfd_missing() block above */
5688+
if (!hugetlb_pte_stable(h, mm, ptep, old_pte)) {
5689+
ret = 0;
5690+
goto out;
5691+
}
5692+
return hugetlb_handle_userfault(vma, mapping, idx, flags,
5693+
haddr, address,
5694+
VM_UFFD_MINOR);
56505695
}
56515696
}
56525697

0 commit comments

Comments
 (0)