
Commit 6753573

amluto authored and Ingo Molnar committed
Revert "x86/mm: Stop calling leave_mm() in idle code"
This reverts commit 43858b4.

The reason I removed the leave_mm() calls in question is that the heuristic wasn't needed after that patch. With the original version of my PCID series, we never flushed a "lazy CPU" (i.e. a CPU running a kernel thread) due to a flush on the loaded mm. Unfortunately, that caused architectural issues, so I've now reinstated these flushes on non-PCID systems in commit b956575 ("x86/mm: Flush more aggressively in lazy TLB mode").

That, in turn, gives us a power-management regression, and occasionally a performance regression, compared to older kernels: if a CPU goes into a deep idle state with a process's mm still loaded, and that mm gets flushed due to activity on a different CPU, the flush wakes the idle CPU.

Reinstate the old, ugly heuristic: if a CPU goes into ACPI C3 or into an intel_idle state that is likely to cause a TLB flush, it gets its mm switched to init_mm before going idle.

FWIW, this heuristic is lousy. Whether the chosen idle state flushes the TLB isn't a good hint for whether we should change CR3 before idle, except insofar as the performance hit is a bit lower if the idle code is going to flush the TLB anyway. What we really want to know is whether we anticipate being idle long enough that the mm is likely to be flushed before we wake up. That is more a matter of the expected latency than of the idle state that gets chosen. This heuristic also fails completely on systems that don't know whether the TLB will be flushed (e.g. AMD systems?). OTOH, it may be somewhat obsolete anyway: PCID systems don't presently benefit from this heuristic at all.

We also shouldn't do this callback from the innermost part of the idle code, because of the RCU nastiness it causes. All the information needed is available before rcu_idle_enter() has to be called.

Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: 43858b4 ("x86/mm: Stop calling leave_mm() in idle code")
Link: http://lkml.kernel.org/r/c513bbd4e653747213e05bc7062de000bf0202a5.1509793738.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
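The wakeup problem, and how the reinstated heuristic avoids it, can be illustrated with a toy user-space model. This is a sketch under simplifying assumptions, not kernel code: every name in it (`struct cpu_sketch`, `flush_mm()`, `leave_mm_sketch()` and so on) is hypothetical, a remote flush is modelled as a loop over CPUs rather than real IPIs, and PCID is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (hypothetical names, not kernel code): each CPU records which
 * mm it has loaded. A flush of mm M must reach every CPU that still has M
 * loaded, including lazy CPUs sitting in idle, and reaching them wakes them.
 */
#define NR_CPUS_SKETCH 4
#define INIT_MM 0 /* stands in for init_mm */

struct cpu_sketch {
	int loaded_mm; /* mm currently loaded in CR3 */
	bool idle;     /* CPU is in a deep idle state */
};

static struct cpu_sketch cpus[NR_CPUS_SKETCH];

/* Stand-in for leave_mm(): switch this CPU over to init_mm. */
static void leave_mm_sketch(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	cpus[cpu].loaded_mm = INIT_MM;
}

/* Enter deep idle, optionally applying the heuristic first. */
static void enter_deep_idle(int cpu, bool unlazy_heuristic)
{
	if (unlazy_heuristic)
		leave_mm_sketch(cpu);
	cpus[cpu].idle = true;
}

/*
 * Flush 'mm' on behalf of 'initiator': every other CPU with 'mm' loaded
 * receives a (modelled) IPI. Returns how many idle CPUs were woken.
 */
static int flush_mm(int initiator, int mm)
{
	int woken = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		if (cpu == initiator || cpus[cpu].loaded_mm != mm)
			continue;
		if (cpus[cpu].idle) {
			cpus[cpu].idle = false; /* the IPI wakes it up */
			woken++;
		}
	}
	return woken;
}

/* Reset the model: all CPUs awake with 'mm' loaded. */
static void reset_cpus(int mm)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		cpus[cpu].loaded_mm = mm;
		cpus[cpu].idle = false;
	}
}
```

With the heuristic off, two idle CPUs that still have mm 1 loaded are both woken by a flush issued from CPU 0; with it on, they have already switched to init_mm and neither is woken.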
1 parent 5f47944 commit 6753573


5 files changed: 25 additions, 7 deletions


arch/ia64/include/asm/acpi.h

Lines changed: 2 additions & 0 deletions
@@ -112,6 +112,8 @@ static inline void arch_acpi_set_pdc_bits(u32 *buf)
 	buf[2] |= ACPI_PDC_EST_CAPABILITY_SMP;
 }
 
+#define acpi_unlazy_tlb(x)
+
 #ifdef CONFIG_ACPI_NUMA
 extern cpumask_t early_cpu_possible_map;
 #define for_each_possible_early_cpu(cpu) \

arch/x86/include/asm/acpi.h

Lines changed: 2 additions & 0 deletions
@@ -150,6 +150,8 @@ static inline void disable_acpi(void) { }
 extern int x86_acpi_numa_init(void);
 #endif /* CONFIG_ACPI_NUMA */
 
+#define acpi_unlazy_tlb(x) leave_mm(x)
+
 #ifdef CONFIG_ACPI_APEI
 static inline pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
 {
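The two header hunks let generic ACPI code call `acpi_unlazy_tlb()` unconditionally while each architecture decides what it expands to: nothing on ia64, `leave_mm()` on x86. The user-space sketch below mimics that with two hypothetically named macros and also shows a side effect of the empty ia64-style definition: its argument is never evaluated. `leave_mm_sketch()` and `smp_processor_id_sketch()` are stand-ins, not kernel functions.

```c
#include <assert.h>

/* Hypothetical stand-ins; only the shape of the two macros comes from the diff. */
static int leave_mm_calls;
static int cpu_id_reads;

static void leave_mm_sketch(int cpu)
{
	(void)cpu;
	leave_mm_calls++;
}

/* Stand-in for smp_processor_id() that counts how often it is evaluated. */
static int smp_processor_id_sketch(void)
{
	cpu_id_reads++;
	return 0;
}

/* ia64-style: expands to nothing, so the argument is never evaluated. */
#define acpi_unlazy_tlb_ia64(x)
/* x86-style: forwards to the leave_mm() stand-in. */
#define acpi_unlazy_tlb_x86(x) leave_mm_sketch(x)

/* Models the call site added to acpi_idle_enter_bm(), once per variant. */
static void enter_bm_ia64(void)
{
	acpi_unlazy_tlb_ia64(smp_processor_id_sketch());
}

static void enter_bm_x86(void)
{
	acpi_unlazy_tlb_x86(smp_processor_id_sketch());
}
```

Calling `enter_bm_ia64()` leaves both counters at zero, while `enter_bm_x86()` evaluates the CPU id once and calls the `leave_mm()` stand-in once.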

arch/x86/mm/tlb.c

Lines changed: 14 additions & 3 deletions
@@ -85,6 +85,7 @@ void leave_mm(int cpu)
 
 	switch_mm(NULL, &init_mm, NULL);
 }
+EXPORT_SYMBOL_GPL(leave_mm);
 
 void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	       struct task_struct *tsk)
@@ -195,12 +196,22 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
 		write_cr3(build_cr3(next, new_asid));
-		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH,
-				TLB_FLUSH_ALL);
+
+		/*
+		 * NB: This gets called via leave_mm() in the idle path
+		 * where RCU functions differently. Tracing normally
+		 * uses RCU, so we need to use the _rcuidle variant.
+		 *
+		 * (There is no good reason for this. The idle code should
+		 *  be rearranged to call this before rcu_idle_enter().)
+		 */
+		trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 	} else {
 		/* The new ASID is already up to date. */
 		write_cr3(build_cr3_noflush(next, new_asid));
-		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 0);
+
+		/* See above wrt _rcuidle. */
+		trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
 	}
 
 	this_cpu_write(cpu_tlbstate.loaded_mm, next);
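The comment added to `switch_mm_irqs_off()` says that ordinary tracing relies on RCU, which behaves differently on the idle path. The following is a deliberately simplified model of that constraint, assuming a single boolean that records whether RCU is watching the CPU; the real tracepoint and RCU machinery is considerably more involved, and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model (hypothetical names): after rcu_idle_enter() RCU stops
 * watching the CPU, so an ordinary tracepoint, which relies on RCU to
 * protect its probes, must not run. The _rcuidle variant makes RCU watch
 * the CPU for the duration of the call.
 */
static bool rcu_watching = true;
static int events_recorded;

static void rcu_idle_enter_sketch(void) { rcu_watching = false; }
static void rcu_idle_exit_sketch(void)  { rcu_watching = true; }

/* Probe body: only legal while RCU is watching this CPU. */
static void tlb_flush_probe(void)
{
	assert(rcu_watching);
	events_recorded++;
}

/* Plain variant: assumes the caller is already in an RCU-watched context. */
static void trace_tlb_flush_sketch(void)
{
	tlb_flush_probe();
}

/* _rcuidle variant: temporarily makes RCU watch, then restores the state. */
static void trace_tlb_flush_rcuidle_sketch(void)
{
	bool was_watching = rcu_watching;

	rcu_watching = true;
	tlb_flush_probe();
	rcu_watching = was_watching;
}
```

Calling `trace_tlb_flush_sketch()` between `rcu_idle_enter_sketch()` and `rcu_idle_exit_sketch()` would trip the assertion in the probe, which is the situation the `_rcuidle` variant exists to avoid.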

drivers/acpi/processor_idle.c

Lines changed: 2 additions & 0 deletions
@@ -710,6 +710,8 @@ static DEFINE_RAW_SPINLOCK(c3_lock);
 static void acpi_idle_enter_bm(struct acpi_processor *pr,
 			       struct acpi_processor_cx *cx, bool timer_bc)
 {
+	acpi_unlazy_tlb(smp_processor_id());
+
 	/*
 	 * Must be done before busmaster disable as we might need to
 	 * access HPET !

drivers/idle/intel_idle.c

Lines changed: 5 additions & 4 deletions
@@ -913,15 +913,16 @@ static __cpuidle int intel_idle(struct cpuidle_device *dev,
 	struct cpuidle_state *state = &drv->states[index];
 	unsigned long eax = flg2MWAIT(state->flags);
 	unsigned int cstate;
+	int cpu = smp_processor_id();
 
 	cstate = (((eax) >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) + 1;
 
 	/*
-	 * NB: if CPUIDLE_FLAG_TLB_FLUSHED is set, this idle transition
-	 * will probably flush the TLB. It's not guaranteed to flush
-	 * the TLB, though, so it's not clear that we can do anything
-	 * useful with this knowledge.
+	 * leave_mm() to avoid costly and often unnecessary wakeups
+	 * for flushing the user TLB's associated with the active mm.
 	 */
+	if (state->flags & CPUIDLE_FLAG_TLB_FLUSHED)
+		leave_mm(cpu);
 
 	if (!(lapic_timer_reliable_states & (1 << (cstate))))
 		tick_broadcast_enter();
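The intel_idle hunk reduces to a single flag test on the chosen idle state. The sketch below restates it in user space; `SKETCH_FLAG_TLB_FLUSHED` and its value are assumptions standing in for `CPUIDLE_FLAG_TLB_FLUSHED`, `leave_mm_sketch()` is a hypothetical stand-in for `leave_mm()`, and the state names are only labels.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, for illustration only; not taken from <linux/cpuidle.h>. */
#define SKETCH_FLAG_TLB_FLUSHED 0x10000u

struct idle_state_sketch {
	const char *name;   /* label only */
	unsigned int flags; /* per-state flags, as in struct cpuidle_state */
};

static int last_left_cpu = -1;

/* Hypothetical stand-in for leave_mm(): records which CPU dropped its mm. */
static void leave_mm_sketch(int cpu) { last_left_cpu = cpu; }

/*
 * Mirrors the check added to intel_idle(): only states marked as likely to
 * flush the TLB switch the CPU to init_mm before entering idle.
 * Returns true if leave_mm_sketch() was called.
 */
static bool maybe_unlazy_before_idle(const struct idle_state_sketch *state, int cpu)
{
	if (state->flags & SKETCH_FLAG_TLB_FLUSHED) {
		leave_mm_sketch(cpu);
		return true;
	}
	return false;
}
```

As the commit message notes, the flag is only a proxy for the real question of how long the CPU will stay idle; this sketch mirrors only what the patch does.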
