Commit 3a5a8d3

ryanhrob authored and akpm00 committed
mm: fix race between __split_huge_pmd_locked() and GUP-fast
__split_huge_pmd_locked() can be called for a present THP, devmap or (non-present) migration entry. It calls pmdp_invalidate() unconditionally on the pmdp and only determines if it is present or not based on the returned old pmd. This is a problem for the migration entry case because pmd_mkinvalid(), called by pmdp_invalidate(), must only be called for a present pmd.

On arm64 at least, pmd_mkinvalid() will mark the pmd such that any future call to pmd_present() will return true. And therefore any lockless pgtable walker could see the migration entry pmd in this state and start interpreting the fields as if it were present, leading to BadThings (TM). GUP-fast appears to be one such lockless pgtable walker.

x86 does not suffer the above problem, but instead pmd_mkinvalid() will corrupt the offset field of the swap entry within the swap pte. See the link below for discussion of that problem.

Fix all of this by only calling pmdp_invalidate() for a present pmd. And for good measure let's add a warning to all implementations of pmdp_invalidate[_ad](). I've manually reviewed all other pmdp_invalidate[_ad]() call sites and believe all others to be conformant.

This is a theoretical bug found during code review. I don't have any test case to trigger it in practice.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Fixes: 84c3fc4 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Ryan Roberts <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
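[Editor's illustration, not part of the commit.] To make the arm64 hazard concrete, here is a small userspace toy model. The bit layout, the 0xabcd00 swap-entry value, and the toy_* helper names are simplified stand-ins invented for this sketch; they are not the kernel's actual definitions, but they reproduce the property the commit message describes: pmd_present() must keep returning true for a present-then-invalidated PMD, so applying pmd_mkinvalid() to a non-present migration entry makes it read as present.

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	typedef uint64_t pmd_t;

	/* Toy bit layout, loosely inspired by arm64: bit 0 means valid in
	 * hardware; bit 1 is the software marker that toy_pmd_mkinvalid()
	 * sets so an invalidated-but-mapped PMD still reads as present. */
	#define PMD_VALID	(1ull << 0)
	#define PMD_INVALID	(1ull << 1)

	static bool toy_pmd_present(pmd_t pmd)
	{
		/* Present if valid, OR if it was present and then invalidated. */
		return pmd & (PMD_VALID | PMD_INVALID);
	}

	static pmd_t toy_pmd_mkinvalid(pmd_t pmd)
	{
		return (pmd & ~PMD_VALID) | PMD_INVALID;
	}

	int main(void)
	{
		/* A migration entry is non-present: VALID is clear and the
		 * remaining bits encode a swap type/offset, not a frame. */
		pmd_t migration_entry = 0xabcd00ull;

		printf("before mkinvalid: present = %d\n",
		       (int)toy_pmd_present(migration_entry));

		/* The bug: invalidating a non-present entry sets PMD_INVALID... */
		migration_entry = toy_pmd_mkinvalid(migration_entry);

		/* ...so a lockless walker such as GUP-fast now sees "present"
		 * and would misinterpret the swap fields as a real mapping. */
		printf("after mkinvalid:  present = %d\n",
		       (int)toy_pmd_present(migration_entry));
		return 0;
	}

Running this prints present = 0 before and present = 1 after, which is exactly the state a lockless walker must never observe for a migration entry.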

7 files changed: +39 −26 lines

Documentation/mm/arch_pgtable_helpers.rst

Lines changed: 4 additions & 2 deletions
@@ -140,7 +140,8 @@ PMD Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | pmd_swp_clear_soft_dirty  | Clears a soft dirty swapped PMD                  |
 +---------------------------+--------------------------------------------------+
-| pmd_mkinvalid             | Invalidates a mapped PMD [1]                     |
+| pmd_mkinvalid             | Invalidates a present PMD; do not call for       |
+|                           | non-present PMD [1]                              |
 +---------------------------+--------------------------------------------------+
 | pmd_set_huge              | Creates a PMD huge mapping                       |
 +---------------------------+--------------------------------------------------+
@@ -196,7 +197,8 @@ PUD Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | pud_mkdevmap              | Creates a ZONE_DEVICE mapped PUD                 |
 +---------------------------+--------------------------------------------------+
-| pud_mkinvalid             | Invalidates a mapped PUD [1]                     |
+| pud_mkinvalid             | Invalidates a present PUD; do not call for       |
+|                           | non-present PUD [1]                              |
 +---------------------------+--------------------------------------------------+
 | pud_set_huge              | Creates a PUD huge mapping                       |
 +---------------------------+--------------------------------------------------+

arch/powerpc/mm/book3s64/pgtable.c

Lines changed: 1 addition & 0 deletions
@@ -170,6 +170,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 {
 	unsigned long old_pmd;
 
+	VM_WARN_ON_ONCE(!pmd_present(*pmdp));
 	old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, _PAGE_INVALID);
 	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 	return __pmd(old_pmd);

arch/s390/include/asm/pgtable.h

Lines changed: 3 additions & 1 deletion
@@ -1769,8 +1769,10 @@ static inline pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma,
 static inline pmd_t pmdp_invalidate(struct vm_area_struct *vma,
 				    unsigned long addr, pmd_t *pmdp)
 {
-	pmd_t pmd = __pmd(pmd_val(*pmdp) | _SEGMENT_ENTRY_INVALID);
+	pmd_t pmd;
 
+	VM_WARN_ON_ONCE(!pmd_present(*pmdp));
+	pmd = __pmd(pmd_val(*pmdp) | _SEGMENT_ENTRY_INVALID);
 	return pmdp_xchg_direct(vma->vm_mm, addr, pmdp, pmd);
 }
 

arch/sparc/mm/tlb.c

Lines changed: 1 addition & 0 deletions
@@ -249,6 +249,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 {
 	pmd_t old, entry;
 
+	VM_WARN_ON_ONCE(!pmd_present(*pmdp));
 	entry = __pmd(pmd_val(*pmdp) & ~_PAGE_VALID);
 	old = pmdp_establish(vma, address, pmdp, entry);
 	flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);

arch/x86/mm/pgtable.c

Lines changed: 2 additions & 0 deletions
@@ -631,6 +631,8 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma,
 pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
 			 pmd_t *pmdp)
 {
+	VM_WARN_ON_ONCE(!pmd_present(*pmdp));
+
 	/*
 	 * No flush is necessary. Once an invalid PTE is established, the PTE's
 	 * access and dirty bits cannot be updated.

mm/huge_memory.c

Lines changed: 26 additions & 23 deletions
@@ -2430,32 +2430,11 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		return __split_huge_zero_page_pmd(vma, haddr, pmd);
 	}
 
-	/*
-	 * Up to this point the pmd is present and huge and userland has the
-	 * whole access to the hugepage during the split (which happens in
-	 * place). If we overwrite the pmd with the not-huge version pointing
-	 * to the pte here (which of course we could if all CPUs were bug
-	 * free), userland could trigger a small page size TLB miss on the
-	 * small sized TLB while the hugepage TLB entry is still established in
-	 * the huge TLB. Some CPU doesn't like that.
-	 * See http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum
-	 * 383 on page 105. Intel should be safe but is also warns that it's
-	 * only safe if the permission and cache attributes of the two entries
-	 * loaded in the two TLB is identical (which should be the case here).
-	 * But it is generally safer to never allow small and huge TLB entries
-	 * for the same virtual address to be loaded simultaneously. So instead
-	 * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
-	 * current pmd notpresent (atomically because here the pmd_trans_huge
-	 * must remain set at all times on the pmd until the split is complete
-	 * for this pmd), then we flush the SMP TLB and finally we write the
-	 * non-huge version of the pmd entry with pmd_populate.
-	 */
-	old_pmd = pmdp_invalidate(vma, haddr, pmd);
-
-	pmd_migration = is_pmd_migration_entry(old_pmd);
+	pmd_migration = is_pmd_migration_entry(*pmd);
 	if (unlikely(pmd_migration)) {
 		swp_entry_t entry;
 
+		old_pmd = *pmd;
 		entry = pmd_to_swp_entry(old_pmd);
 		page = pfn_swap_entry_to_page(entry);
 		write = is_writable_migration_entry(entry);
@@ -2466,6 +2445,30 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		soft_dirty = pmd_swp_soft_dirty(old_pmd);
 		uffd_wp = pmd_swp_uffd_wp(old_pmd);
 	} else {
+		/*
+		 * Up to this point the pmd is present and huge and userland has
+		 * the whole access to the hugepage during the split (which
+		 * happens in place). If we overwrite the pmd with the not-huge
+		 * version pointing to the pte here (which of course we could if
+		 * all CPUs were bug free), userland could trigger a small page
+		 * size TLB miss on the small sized TLB while the hugepage TLB
+		 * entry is still established in the huge TLB. Some CPU doesn't
+		 * like that. See
+		 * http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum
+		 * 383 on page 105. Intel should be safe but is also warns that
+		 * it's only safe if the permission and cache attributes of the
+		 * two entries loaded in the two TLB is identical (which should
+		 * be the case here). But it is generally safer to never allow
+		 * small and huge TLB entries for the same virtual address to be
+		 * loaded simultaneously. So instead of doing "pmd_populate();
+		 * flush_pmd_tlb_range();" we first mark the current pmd
+		 * notpresent (atomically because here the pmd_trans_huge must
+		 * remain set at all times on the pmd until the split is
+		 * complete for this pmd), then we flush the SMP TLB and finally
+		 * we write the non-huge version of the pmd entry with
+		 * pmd_populate.
+		 */
+		old_pmd = pmdp_invalidate(vma, haddr, pmd);
 		page = pmd_page(old_pmd);
 		folio = page_folio(page);
 		if (pmd_dirty(old_pmd)) {
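[Editor's summary, not part of the commit.] A condensed sketch of the control flow these two hunks produce, simplified from the diff above with the rest of the split logic omitted: the PMD is read first, and pmdp_invalidate() is only reached on the present path.

	pmd_migration = is_pmd_migration_entry(*pmd);
	if (unlikely(pmd_migration)) {
		/* Non-present migration entry: never invalidate, just read it. */
		old_pmd = *pmd;
		/* ... decode the swap entry fields from old_pmd ... */
	} else {
		/* Present THP/devmap: safe to invalidate (see TLB comment above). */
		old_pmd = pmdp_invalidate(vma, haddr, pmd);
		/* ... decode the present-pmd fields from old_pmd ... */
	}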

mm/pgtable-generic.c

Lines changed: 2 additions & 0 deletions
@@ -198,6 +198,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 		     pmd_t *pmdp)
 {
+	VM_WARN_ON_ONCE(!pmd_present(*pmdp));
 	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
 	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 	return old;
@@ -208,6 +209,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
 			 pmd_t *pmdp)
 {
+	VM_WARN_ON_ONCE(!pmd_present(*pmdp));
 	return pmdp_invalidate(vma, address, pmdp);
 }
 #endif
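[Editor's note, not part of the commit.] VM_WARN_ON_ONCE() only emits a runtime warning when the kernel is built with CONFIG_DEBUG_VM; otherwise the condition is merely type-checked and compiles away. The new warnings are therefore a development-time safety net, and callers remain responsible for the contract. A minimal sketch of the caller-side pattern, where vma, haddr, pmd and old_pmd are hypothetical stand-ins for a real caller's context:

	if (pmd_present(*pmd))
		old_pmd = pmdp_invalidate(vma, haddr, pmd);	/* present: OK to invalidate */
	else
		old_pmd = *pmd;	/* e.g. a migration entry: read it, never invalidate */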
