Commit e21dfbf

kvaneesh authored and mpe committed
powerpc/mm/book3s64: Avoid sending IPI on clearing PMD
Now that all the lockless page table walks are careful w.r.t. the PTE address returned, we can revert commit 13bd817 ("powerpc/thp: Serialize pmd clear against a linux page table walk."). We also drop the equivalent IPI from the other PTE update routines.

We still keep the IPI in hash pmdp collapse, to take care of parallel hash page table inserts. The radix pmdp collapse flush can possibly be removed once I am sure generic code doesn't have any expectations around parallel gup walk.

This speeds up QEMU guest RAM del/unplug time as below.

128 core, 496GB guest:

Without patch:
munmap start: timer = 13162 ms, PID=7684
munmap finish: timer = 95312 ms, PID=7684 - delta = 82150 ms

With patch:
munmap start: timer = 196449 ms, PID=6681
munmap finish: timer = 196488 ms, PID=6681 - delta = 39 ms

Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
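The delta figures in the commit message are simply the finish reading minus the start reading of a millisecond timer taken around munmap(). The commit does not show how QEMU was instrumented, so the following is only a minimal userspace sketch of that kind of measurement, assuming a Linux system with anonymous mmap() and CLOCK_MONOTONIC. The names now_ms, delta_ms and time_munmap_ms are hypothetical helpers for illustration and do not come from the patch or from QEMU.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/* Current monotonic clock reading in milliseconds. */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000LL + ts.tv_nsec / 1000000LL;
}

/*
 * Elapsed time between two timer readings, as in the "delta" figures.
 * Readings are expected to come from a monotonic timer, so the finish
 * reading must not precede the start reading.
 */
static long long delta_ms(long long start_ms, long long finish_ms)
{
	assert(finish_ms >= start_ms);
	return finish_ms - start_ms;
}

/*
 * Hypothetical helper: map len bytes of anonymous memory, touch it so
 * the pages are actually populated, then time only the munmap() call.
 * Returns the elapsed milliseconds, or -1 if mmap() or munmap() failed.
 */
static long long time_munmap_ms(size_t len)
{
	long long start, finish;
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	memset(p, 0, len);

	start = now_ms();
	if (munmap(p, len) != 0)
		return -1;
	finish = now_ms();

	return delta_ms(start, finish);
}
```

Applied to the readings quoted above, delta_ms(13162, 95312) gives 82150 and delta_ms(196449, 196488) gives 39, matching the reported deltas. A small single-process mapping like the one in time_munmap_ms will not reproduce the 128-core guest scenario; it only shows where the two timer readings sit relative to the munmap() call.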
1 parent 0e11df9 commit e21dfbf

3 files changed: +7 -31 lines changed

arch/powerpc/mm/book3s64/hash_pgtable.c

Lines changed: 0 additions & 11 deletions
@@ -363,17 +363,6 @@ pmd_t hash__pmdp_huge_get_and_clear(struct mm_struct *mm,
 	 * hash fault look at them.
 	 */
 	memset(pgtable, 0, PTE_FRAG_SIZE);
-	/*
-	 * Serialize against find_current_mm_pte variants which does lock-less
-	 * lookup in page tables with local interrupts disabled. For huge pages
-	 * it casts pmd_t to pte_t. Since format of pte_t is different from
-	 * pmd_t we want to prevent transit from pmd pointing to page table
-	 * to pmd pointing to huge page (and back) while interrupts are disabled.
-	 * We clear pmd to possibly replace it with page table pointer in
-	 * different code paths. So make sure we wait for the parallel
-	 * find_curren_mm_pte to finish.
-	 */
-	serialize_against_pte_lookup(mm);
 	return old_pmd;
 }
 

arch/powerpc/mm/book3s64/pgtable.c

Lines changed: 0 additions & 8 deletions
@@ -109,14 +109,6 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 
 	old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, _PAGE_INVALID);
 	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
-	/*
-	 * This ensures that generic code that rely on IRQ disabling
-	 * to prevent a parallel THP split work as expected.
-	 *
-	 * Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires
-	 * a special case check in pmd_access_permitted.
-	 */
-	serialize_against_pte_lookup(vma->vm_mm);
 	return __pmd(old_pmd);
 }
 

arch/powerpc/mm/book3s64/radix_pgtable.c

Lines changed: 7 additions & 12 deletions
@@ -962,7 +962,13 @@ pmd_t radix__pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long addre
 	pmd = *pmdp;
 	pmd_clear(pmdp);
 
-	/*FIXME!!  Verify whether we need this kick below */
+	/*
+	 * pmdp collapse_flush need to ensure that there are no parallel gup
+	 * walk after this call. This is needed so that we can have stable
+	 * page ref count when collapsing a page. We don't allow a collapse page
+	 * if we have gup taken on the page. We can ensure that by sending IPI
+	 * because gup walk happens with IRQ disabled.
+	 */
 	serialize_against_pte_lookup(vma->vm_mm);
 
 	radix__flush_tlb_collapsed_pmd(vma->vm_mm, address);
@@ -1023,17 +1029,6 @@ pmd_t radix__pmdp_huge_get_and_clear(struct mm_struct *mm,
 
 	old = radix__pmd_hugepage_update(mm, addr, pmdp, ~0UL, 0);
 	old_pmd = __pmd(old);
-	/*
-	 * Serialize against find_current_mm_pte which does lock-less
-	 * lookup in page tables with local interrupts disabled. For huge pages
-	 * it casts pmd_t to pte_t. Since format of pte_t is different from
-	 * pmd_t we want to prevent transit from pmd pointing to page table
-	 * to pmd pointing to huge page (and back) while interrupts are disabled.
-	 * We clear pmd to possibly replace it with page table pointer in
-	 * different code paths. So make sure we wait for the parallel
-	 * find_current_mm_pte to finish.
-	 */
-	serialize_against_pte_lookup(mm);
 	return old_pmd;
 }
 
