Commit 1fcb7ce

Ryan Roberts authored and willdeacon committed
arm64: mm: Batch dsb and isb when populating pgtables
After removing unnecessary TLBIs, the next bottleneck when creating the
page tables for the linear map is DSB and ISB, which were previously
issued per-pte in __set_pte(). Since we are writing multiple ptes in a
given pte table, we can elide these barriers and insert them once we
have finished writing to the table.

Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:

               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
---------------|-------------|-------------|-------------|-------------
before         |   78   (0%) |  435   (0%) | 1723   (0%) |  3779   (0%)
after          |   11 (-86%) |  161 (-63%) |  656 (-62%) |  1654 (-56%)

Signed-off-by: Ryan Roberts <[email protected]>
Tested-by: Itaru Kitayama <[email protected]>
Tested-by: Eric Chanudet <[email protected]>
Reviewed-by: Mark Rutland <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
1 parent 5c63db5 commit 1fcb7ce

2 files changed, +16 -2 lines changed


arch/arm64/include/asm/pgtable.h

Lines changed: 6 additions & 1 deletion
@@ -271,9 +271,14 @@ static inline pte_t pte_mkdevmap(pte_t pte)
 	return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
 }
 
-static inline void __set_pte(pte_t *ptep, pte_t pte)
+static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
 {
 	WRITE_ONCE(*ptep, pte);
+}
+
+static inline void __set_pte(pte_t *ptep, pte_t pte)
+{
+	__set_pte_nosync(ptep, pte);
 
 	/*
 	 * Only if the new pte is valid and kernel, otherwise TLB maintenance

arch/arm64/mm/mmu.c

Lines changed: 10 additions & 1 deletion
@@ -178,7 +178,11 @@ static void init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
 	do {
 		pte_t old_pte = __ptep_get(ptep);
 
-		__set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
+		/*
+		 * Required barriers to make this visible to the table walker
+		 * are deferred to the end of alloc_init_cont_pte().
+		 */
+		__set_pte_nosync(ptep, pfn_pte(__phys_to_pfn(phys), prot));
 
 		/*
 		 * After the PTE entry has been populated once, we
@@ -232,6 +236,11 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 		phys += next - addr;
 	} while (addr = next, addr != end);
 
+	/*
+	 * Note: barriers and maintenance necessary to clear the fixmap slot
+	 * ensure that all previous pgtable writes are visible to the table
+	 * walker.
+	 */
 	pte_clear_fixmap();
 }
 
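The batching pattern in this diff can be illustrated outside the kernel. The following is a minimal userspace sketch, not kernel code: `entry_t`, `publish_barrier()`, `set_entry_nosync()`, `set_entry()`, `fill_per_entry()` and `fill_batched()` are hypothetical names invented for illustration, and the C11 `atomic_thread_fence()` is only a stand-in that is not equivalent to the arm64 DSB and ISB instructions. The sketch counts how many barriers each strategy issues when writing the same table. Unlike this sketch, the patch adds no explicit barrier after the loop; it relies on the barriers and maintenance already performed by `pte_clear_fixmap()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t entry_t;            /* hypothetical stand-in for pte_t */

static unsigned long barrier_count;  /* counts fences, for demonstration only */

/* Stand-in for the DSB + ISB pair; a C11 fence is NOT equivalent to them. */
static void publish_barrier(void)
{
	atomic_thread_fence(memory_order_seq_cst);
	barrier_count++;
}

/* Store only; the caller must issue a barrier later. */
static void set_entry_nosync(volatile entry_t *slot, entry_t val)
{
	*slot = val;
}

/* Store followed immediately by a barrier (the per-entry behaviour). */
static void set_entry(volatile entry_t *slot, entry_t val)
{
	set_entry_nosync(slot, val);
	publish_barrier();
}

/* Writes n entries with one barrier per entry; returns barriers issued. */
static unsigned long fill_per_entry(volatile entry_t *table, size_t n)
{
	assert(table != NULL);
	barrier_count = 0;
	for (size_t i = 0; i < n; i++)
		set_entry(&table[i], (entry_t)i + 1);
	return barrier_count;
}

/* Writes n entries, then issues a single barrier; returns barriers issued. */
static unsigned long fill_batched(volatile entry_t *table, size_t n)
{
	assert(table != NULL);
	barrier_count = 0;
	for (size_t i = 0; i < n; i++)
		set_entry_nosync(&table[i], (entry_t)i + 1);
	publish_barrier();
	return barrier_count;
}
```

For a table of `n` entries, `fill_per_entry()` issues `n` barriers and `fill_batched()` issues one, while both leave identical table contents. Deferring the barrier is only safe when nothing needs to observe the entries before the loop finishes, which the comments added to `init_pte()` and `alloc_init_cont_pte()` document for the linear map case.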
