Commit d6962c4

dtcccc authored and Peter Zijlstra committed
sched: Clear ttwu_pending after enqueue_task()
We found a long tail latency in schbench when m*t is close to nr_cpus (e.g., "schbench -m 2 -t 16" on a machine with 32 cpus). This is because when the wakee cpu is idle, rq->ttwu_pending is cleared too early, and idle_cpu() will return true until the wakee task is enqueued. This misleads the waker when it selects an idle cpu, so multiple worker threads are woken on the same wakee cpu. The situation is enlarged by commit f3dd3f6 ("sched: Remove the limitation of WF_ON_CPU on wakelist if wakee cpu is idle") because it tends to use the wakelist.

Here is the result of "schbench -m 2 -t 16" on a VM with 32 vcpus (Intel(R) Xeon(R) Platinum 8369B).

Latency percentiles (usec):

                  base    base+revert_f3dd3f674555    base+this_patch
  50.0000th:         9                          13                  9
  75.0000th:        12                          19                 12
  90.0000th:        15                          22                 15
  95.0000th:        18                          24                 17
 *99.0000th:        27                          31                 24
  99.5000th:      3364                          33                 27
  99.9000th:     12560                          36                 30

We also tested on unixbench and hackbench, and saw no performance change.

Signed-off-by: Tianchen Ding <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
1 parent 52b33d8 commit d6962c4

File tree

1 file changed: +11 −7 lines changed


kernel/sched/core.c

Lines changed: 11 additions & 7 deletions
@@ -3739,13 +3739,6 @@ void sched_ttwu_pending(void *arg)
 	if (!llist)
 		return;
 
-	/*
-	 * rq::ttwu_pending racy indication of out-standing wakeups.
-	 * Races such that false-negatives are possible, since they
-	 * are shorter lived that false-positives would be.
-	 */
-	WRITE_ONCE(rq->ttwu_pending, 0);
-
 	rq_lock_irqsave(rq, &rf);
 	update_rq_clock(rq);
 
@@ -3759,6 +3752,17 @@ void sched_ttwu_pending(void *arg)
 		ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
 	}
 
+	/*
+	 * Must be after enqueueing at least one task such that
+	 * idle_cpu() does not observe a false-negative -- if it does,
+	 * it is possible for select_idle_siblings() to stack a number
+	 * of tasks on this CPU during that window.
+	 *
+	 * It is ok to clear ttwu_pending when another task is pending.
+	 * We will receive the IPI after the local irq is enabled and then
+	 * enqueue it. Since nr_running > 0 by then, idle_cpu() will always
+	 * return the correct result.
+	 */
+	WRITE_ONCE(rq->ttwu_pending, 0);
 	rq_unlock_irqrestore(rq, &rf);
 }
