
Commit d74943a

davidhildenbrand authored and akpm00 committed
mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
Unfortunately commit 474098e ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()") missed that follow_page() and follow_trans_huge_pmd() never implicitly set FOLL_NUMA, because they really don't want to fail on PROT_NONE-mapped pages -- either due to NUMA hinting or due to inaccessible (PROT_NONE) VMAs. As spelled out in commit 0b9d705 ("mm: numa: Support NUMA hinting page faults from gup/gup_fast"): "Other follow_page callers like KSM should not use FOLL_NUMA, or they would fail to get the pages if they use follow_page instead of get_user_pages."

liubo reported [1] that smaps_rollup results are imprecise because they miss accounting of pages that are mapped PROT_NONE. Further, it's easy to reproduce that KSM no longer works on inaccessible VMAs on x86-64, because pte_protnone()/pmd_protnone() also indicate "true" in inaccessible VMAs, and follow_page() refuses to return such pages right now (a reproducer sketch follows below). As KVM really depends on these NUMA hinting faults, removing the pte_protnone()/pmd_protnone() handling in GUP code completely is not really an option.

To fix the issues at hand, let's revive FOLL_NUMA as FOLL_HONOR_NUMA_FAULT to restore the original behavior for now and add better comments.

Set FOLL_HONOR_NUMA_FAULT independent of FOLL_FORCE in is_valid_gup_args(), to add that flag for all external GUP users. Note that there are three GUP-internal __get_user_pages() users that don't end up calling is_valid_gup_args() and consequently won't get FOLL_HONOR_NUMA_FAULT set:

1) get_dump_page(): we really don't want to handle NUMA hinting faults. It specifies FOLL_FORCE and wouldn't have honored NUMA hinting faults already.

2) populate_vma_page_range(): we really don't want to handle NUMA hinting faults. It specifies FOLL_FORCE on accessible VMAs, so it wouldn't have honored NUMA hinting faults already.

3) faultin_vma_page_range(): we similarly don't want to handle NUMA hinting faults.

To make the combination of FOLL_FORCE and FOLL_HONOR_NUMA_FAULT work properly in inaccessible VMAs, we have to perform VMA accessibility checks in gup_can_follow_protnone(). As GUP-fast should reject such pages either way in pte_access_permitted()/pmd_access_permitted() -- for example on x86-64 and arm64, which both implement pte_protnone() -- let's just always fall back to ordinary GUP when stumbling over pte_protnone()/pmd_protnone().

As Linus notes [2], honoring NUMA faults might only make sense for selected GUP users. So we should really see if we can instead let relevant GUP callers specify it manually, and not trigger NUMA hinting faults from GUP by default. Prepare for that by making FOLL_HONOR_NUMA_FAULT an external GUP flag and adding appropriate documentation.

While at it, remove a stale comment from follow_trans_huge_pmd(): that comment for pmd_protnone() was added in commit 2b4847e ("mm: numa: serialise parallel get_user_page against THP migration"), which noted:

	THP does not unmap pages due to a lack of support for migration
	entries at a PMD level. This allows races with get_user_pages

Nowadays we do have PMD migration entries, so the comment no longer applies. Let's drop it.
[1] https://lore.kernel.org/r/[email protected]
[2] https://lore.kernel.org/r/CAHk-=wgRiP_9X0rRdZKT8nhemZGNateMtb366t37d8-x7VRs=g@mail.gmail.com

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 474098e ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
Signed-off-by: David Hildenbrand <[email protected]>
Reported-by: liubo <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]
Reported-by: Peter Xu <[email protected]>
Closes: https://lore.kernel.org/all/ZMKJjDaqZ7FW0jfe@x1n/
Acked-by: Mel Gorman <[email protected]>
Acked-by: Peter Xu <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 parent 5f1fc67 commit d74943a
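The KSM symptom described in the commit message can be demonstrated from userspace. The following is a minimal reproducer sketch (my own illustration, not liubo's original report): it fills a region with identical pages, registers it with KSM, and then makes the VMA inaccessible. On a kernel without this fix, follow_page() refuses the protnone PTEs, so /sys/kernel/mm/ksm/pages_sharing stays at 0 for this region (assuming ksmd is running, e.g. after echo 1 > /sys/kernel/mm/ksm/run).

/* ksm-protnone.c: build with `cc -o ksm-protnone ksm-protnone.c` */
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 256 * 4096;	/* 256 identical anonymous pages */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 0x55, len);			/* make all pages identical */
	madvise(buf, len, MADV_MERGEABLE);	/* register the range with KSM */
	mprotect(buf, len, PROT_NONE);		/* now an inaccessible VMA */

	pause();	/* keep the mapping alive while ksmd scans */
	return 0;
}

With the fix applied, KSM can merge these pages again, because follow_page() no longer honors NUMA hinting faults and therefore returns pages even where pte_protnone() is true due to the inaccessible VMA.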

4 files changed: +49 −14 lines

include/linux/mm.h

Lines changed: 15 additions & 6 deletions
@@ -3421,15 +3421,24 @@ static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
  * Indicates whether GUP can follow a PROT_NONE mapped page, or whether
  * a (NUMA hinting) fault is required.
  */
-static inline bool gup_can_follow_protnone(unsigned int flags)
+static inline bool gup_can_follow_protnone(struct vm_area_struct *vma,
+					   unsigned int flags)
 {
 	/*
-	 * FOLL_FORCE has to be able to make progress even if the VMA is
-	 * inaccessible. Further, FOLL_FORCE access usually does not represent
-	 * application behaviour and we should avoid triggering NUMA hinting
-	 * faults.
+	 * If callers don't want to honor NUMA hinting faults, no need to
+	 * determine if we would actually have to trigger a NUMA hinting fault.
 	 */
-	return flags & FOLL_FORCE;
+	if (!(flags & FOLL_HONOR_NUMA_FAULT))
+		return true;
+
+	/*
+	 * NUMA hinting faults don't apply in inaccessible (PROT_NONE) VMAs.
+	 *
+	 * Requiring a fault here even for inaccessible VMAs would mean that
+	 * FOLL_FORCE cannot make any progress, because handle_mm_fault()
+	 * refuses to process NUMA hinting faults in inaccessible VMAs.
+	 */
+	return !vma_is_accessible(vma);
 }
 
 typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data);
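The new helper's decision table is small enough to check in isolation. Below is a userspace model of gup_can_follow_protnone() (a sketch: the FOLL_HONOR_NUMA_FAULT value is copied from the mm_types.h hunk, and vm_area_struct/vma_is_accessible() are stubs for illustration, not the kernel definitions):

#include <assert.h>
#include <stdbool.h>

#define FOLL_HONOR_NUMA_FAULT (1u << 12)	/* value from mm_types.h */

struct vm_area_struct { bool accessible; };	/* stub for illustration */

static bool vma_is_accessible(struct vm_area_struct *vma)
{
	return vma->accessible;
}

/* Mirrors the logic added to include/linux/mm.h above. */
static bool gup_can_follow_protnone(struct vm_area_struct *vma,
				    unsigned int flags)
{
	if (!(flags & FOLL_HONOR_NUMA_FAULT))
		return true;		/* caller opted out of NUMA hinting */
	return !vma_is_accessible(vma);	/* no hinting faults in PROT_NONE VMAs */
}

int main(void)
{
	struct vm_area_struct acc = { .accessible = true };
	struct vm_area_struct inacc = { .accessible = false };

	/* follow_page() never sets the flag: protnone pages are followed. */
	assert(gup_can_follow_protnone(&acc, 0));
	assert(gup_can_follow_protnone(&inacc, 0));

	/* External GUP sets the flag: fault in accessible VMAs only. */
	assert(!gup_can_follow_protnone(&acc, FOLL_HONOR_NUMA_FAULT));
	assert(gup_can_follow_protnone(&inacc, FOLL_HONOR_NUMA_FAULT));
	return 0;
}

Note the last case: it is what keeps FOLL_FORCE working in inaccessible VMAs, since handle_mm_fault() would refuse to process a NUMA hinting fault there.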

include/linux/mm_types.h

Lines changed: 9 additions & 0 deletions
@@ -1286,6 +1286,15 @@ enum {
 	FOLL_PCI_P2PDMA = 1 << 10,
 	/* allow interrupts from generic signals */
 	FOLL_INTERRUPTIBLE = 1 << 11,
+	/*
+	 * Always honor (trigger) NUMA hinting faults.
+	 *
+	 * FOLL_WRITE implicitly honors NUMA hinting faults because a
+	 * PROT_NONE-mapped page is not writable (exceptions with FOLL_FORCE
+	 * apply). get_user_pages_fast_only() always implicitly honors NUMA
+	 * hinting faults.
+	 */
+	FOLL_HONOR_NUMA_FAULT = 1 << 12,
 
 	/* See also internal only FOLL flags in mm/internal.h */
 };

mm/gup.c

Lines changed: 24 additions & 6 deletions
@@ -597,7 +597,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	pte = ptep_get(ptep);
 	if (!pte_present(pte))
 		goto no_page;
-	if (pte_protnone(pte) && !gup_can_follow_protnone(flags))
+	if (pte_protnone(pte) && !gup_can_follow_protnone(vma, flags))
 		goto no_page;
 
 	page = vm_normal_page(vma, address, pte);
@@ -714,7 +714,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	if (likely(!pmd_trans_huge(pmdval)))
 		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 
-	if (pmd_protnone(pmdval) && !gup_can_follow_protnone(flags))
+	if (pmd_protnone(pmdval) && !gup_can_follow_protnone(vma, flags))
 		return no_page_table(vma, flags);
 
 	ptl = pmd_lock(mm, pmd);
@@ -851,6 +851,10 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
 		return NULL;
 
+	/*
+	 * We never set FOLL_HONOR_NUMA_FAULT because callers don't expect
+	 * to fail on PROT_NONE-mapped pages.
+	 */
 	page = follow_page_mask(vma, address, foll_flags, &ctx);
 	if (ctx.pgmap)
 		put_dev_pagemap(ctx.pgmap);
@@ -2227,6 +2231,13 @@ static bool is_valid_gup_args(struct page **pages, int *locked,
 		gup_flags |= FOLL_UNLOCKABLE;
 	}
 
+	/*
+	 * For now, always trigger NUMA hinting faults. Some GUP users like
+	 * KVM require the hint to be as the calling context of GUP is
+	 * functionally similar to a memory reference from task context.
+	 */
+	gup_flags |= FOLL_HONOR_NUMA_FAULT;
+
 	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
 	if (WARN_ON_ONCE((gup_flags & (FOLL_PIN | FOLL_GET)) ==
 			 (FOLL_PIN | FOLL_GET)))
@@ -2551,7 +2562,14 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
 		struct page *page;
 		struct folio *folio;
 
-		if (pte_protnone(pte) && !gup_can_follow_protnone(flags))
+		/*
+		 * Always fallback to ordinary GUP on PROT_NONE-mapped pages:
+		 * pte_access_permitted() better should reject these pages
+		 * either way: otherwise, GUP-fast might succeed in
+		 * cases where ordinary GUP would fail due to VMA access
+		 * permissions.
+		 */
+		if (pte_protnone(pte))
 			goto pte_unmap;
 
 		if (!pte_access_permitted(pte, flags & FOLL_WRITE))
@@ -2970,8 +2988,8 @@ static int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, unsigned lo
 
 		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) ||
 			     pmd_devmap(pmd))) {
-			if (pmd_protnone(pmd) &&
-			    !gup_can_follow_protnone(flags))
+			/* See gup_pte_range() */
+			if (pmd_protnone(pmd))
 				return 0;
 
 			if (!gup_huge_pmd(pmd, pmdp, addr, next, flags,
@@ -3151,7 +3169,7 @@ static int internal_get_user_pages_fast(unsigned long start,
 	if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
 				       FOLL_FORCE | FOLL_PIN | FOLL_GET |
 				       FOLL_FAST_ONLY | FOLL_NOFAULT |
-				       FOLL_PCI_P2PDMA)))
+				       FOLL_PCI_P2PDMA | FOLL_HONOR_NUMA_FAULT)))
 		return -EINVAL;
 
 	if (gup_flags & FOLL_PIN)
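The gup_pte_range()/gup_pmd_range() hunks encode an invariant rather than a policy: GUP-fast runs without the VMA (and without the mmap lock), so it cannot evaluate gup_can_follow_protnone() and must never succeed where ordinary GUP would fail. Unconditionally punting protnone entries to the slow path preserves that, even on architectures whose pte_access_permitted() would not reject them. A compact model of the invariant (userspace sketch with stubbed types, purely illustrative):

#include <assert.h>
#include <stdbool.h>

/* Fast path: the VMA is unknown, so a protnone entry always punts. */
static bool gup_fast_follows(bool pte_protnone)
{
	return !pte_protnone;
}

/* Slow path: VMA-aware; may legitimately refuse a protnone entry. */
static bool gup_slow_follows(bool pte_protnone, bool honor_numa,
			     bool vma_accessible)
{
	if (!pte_protnone || !honor_numa)
		return true;
	return !vma_accessible;
}

int main(void)
{
	/* For every combination: fast succeeding implies slow succeeding. */
	for (int p = 0; p <= 1; p++)
		for (int h = 0; h <= 1; h++)
			for (int a = 0; a <= 1; a++)
				assert(!gup_fast_follows(p) ||
				       gup_slow_follows(p, h, a));
	return 0;
}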

mm/huge_memory.c

Lines changed: 1 addition & 2 deletions
@@ -1467,8 +1467,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 	if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
 		return ERR_PTR(-EFAULT);
 
-	/* Full NUMA hinting faults to serialise migration in fault paths */
-	if (pmd_protnone(*pmd) && !gup_can_follow_protnone(flags))
+	if (pmd_protnone(*pmd) && !gup_can_follow_protnone(vma, flags))
 		return NULL;
 
 	if (!pmd_write(*pmd) && gup_must_unshare(vma, flags, page))
