
Commit 983a759

hansendc authored and gregkh committed
x86/mm/tlb: Revert retpoline avoidance approach
commit d39268a upstream.

0day reported a regression on a microbenchmark which is intended to stress the TLB flushing path:

https://lore.kernel.org/all/20220317090415.GE735@xsang-OptiPlex-9020/

It pointed at a commit from Nadav which intended to remove retpoline overhead in the TLB flushing path by taking the 'cond'-ition in on_each_cpu_cond_mask(), pre-calculating it, and incorporating it into 'cpumask'. That allowed the code to use a bunch of earlier direct calls instead of later indirect calls that need a retpoline.

But, in practice, threads can go idle (and into lazy TLB mode where they don't need to flush their TLB) between the early and late calls. It works in this direction and not in the other because TLB-flushing threads tend to hold mmap_lock for write. Contention on that lock causes threads to _go_ idle right in this early/late window.

There was not any performance data in the original commit specific to the retpoline overhead. I did a few tests on a system with retpolines:

https://lore.kernel.org/all/[email protected]/

which showed a possible small win. But, that small win pales in comparison with the bigger loss induced on non-retpoline systems.

Revert the patch that removed the retpolines. This was not a clean revert, but it was self-contained enough not to be too painful.

Fixes: 6035152 ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Nadav Amit <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/164874672286.389.7021457716635788197.tip-bot2@tip-bot2
Signed-off-by: Greg Kroah-Hartman <[email protected]>
1 parent 2f67341 commit 983a759


1 file changed: 5 additions, 32 deletions


arch/x86/mm/tlb.c

Lines changed: 5 additions & 32 deletions
@@ -854,13 +854,11 @@ static void flush_tlb_func(void *info)
 			nr_invalidate);
 }
 
-static bool tlb_is_not_lazy(int cpu)
+static bool tlb_is_not_lazy(int cpu, void *data)
 {
 	return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
 }
 
-static DEFINE_PER_CPU(cpumask_t, flush_tlb_mask);
-
 DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
 EXPORT_PER_CPU_SYMBOL(cpu_tlbstate_shared);
 
@@ -889,36 +887,11 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
 	 * up on the new contents of what used to be page tables, while
 	 * doing a speculative memory access.
 	 */
-	if (info->freed_tables) {
+	if (info->freed_tables)
 		on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
-	} else {
-		/*
-		 * Although we could have used on_each_cpu_cond_mask(),
-		 * open-coding it has performance advantages, as it eliminates
-		 * the need for indirect calls or retpolines. In addition, it
-		 * allows to use a designated cpumask for evaluating the
-		 * condition, instead of allocating one.
-		 *
-		 * This code works under the assumption that there are no nested
-		 * TLB flushes, an assumption that is already made in
-		 * flush_tlb_mm_range().
-		 *
-		 * cond_cpumask is logically a stack-local variable, but it is
-		 * more efficient to have it off the stack and not to allocate
-		 * it on demand. Preemption is disabled and this code is
-		 * non-reentrant.
-		 */
-		struct cpumask *cond_cpumask = this_cpu_ptr(&flush_tlb_mask);
-		int cpu;
-
-		cpumask_clear(cond_cpumask);
-
-		for_each_cpu(cpu, cpumask) {
-			if (tlb_is_not_lazy(cpu))
-				__cpumask_set_cpu(cpu, cond_cpumask);
-		}
-		on_each_cpu_mask(cond_cpumask, flush_tlb_func, (void *)info, true);
-	}
+	else
+		on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
+				(void *)info, 1, cpumask);
 }
 
 void flush_tlb_multi(const struct cpumask *cpumask,
