
Commit d136122

Author: Peter Zijlstra
sched: Fix race against ptrace_freeze_traced()

There is apparently one site that violates the rule that only current
and ttwu() will modify task->state, namely ptrace_{,un}freeze_traced()
will change task->state for a remote task.

Oleg explains:

  "TASK_TRACED/TASK_STOPPED was always protected by siglock. In
   particular, ttwu(__TASK_TRACED) must be always called with siglock
   held. That is why ptrace_freeze_traced() assumes it can safely do
   s/TASK_TRACED/__TASK_TRACED/ under spin_lock(siglock)."

This breaks the ordering scheme introduced by commit:

  dbfb089 ("sched: Fix loadavg accounting race")

Specifically, the reload not matching no longer implies we don't have
to block.

Simplify things by noting that what we need is a LOAD->STORE ordering
and this can be provided by a control dependency.

So replace:

	prev_state = prev->state;
	raw_spin_lock(&rq->lock);
	smp_mb__after_spinlock(); /* SMP-MB */
	if (... && prev_state && prev_state == prev->state)
		deactivate_task();

with:

	prev_state = prev->state;
	if (... && prev_state) /* CTRL-DEP */
		deactivate_task();

Since that already implies the 'prev->state' load must be complete
before allowing the 'prev->on_rq = 0' store to become visible.

Fixes: dbfb089 ("sched: Fix loadavg accounting race")
Reported-by: Jiri Slaby <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Tested-by: Paul Gortmaker <[email protected]>
Tested-by: Christian Brauner <[email protected]>
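The LOAD->STORE ordering from a control dependency can be sketched in C11 atomics. This is a toy single-threaded model, not kernel code: the `struct task` layout, the state values, and `try_block()` are all illustrative names invented here; the point is only the code shape, where the `->on_rq` store sits inside a branch on the `->state` load, so no barrier is needed between them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative state values: 0 means runnable, non-zero means blocked.
 * These are NOT the kernel's definitions. */
#define TASK_RUNNING       0
#define TASK_INTERRUPTIBLE 1

struct task {
	_Atomic long state;	/* normally written only by current and ttwu() */
	_Atomic int  on_rq;	/* 1 while the task sits on a runqueue */
};

/* Sketch of the fixed __schedule() shape: load ->state exactly once,
 * then branch on it. The branch is the control dependency: the CPU
 * cannot make the ->on_rq = 0 store visible before the ->state load
 * has completed, which is exactly the LOAD->STORE ordering required. */
static bool try_block(struct task *p)
{
	long prev_state = atomic_load_explicit(&p->state,
					       memory_order_relaxed);

	if (prev_state) {	/* CTRL-DEP */
		atomic_store_explicit(&p->on_rq, 0, memory_order_relaxed);
		return true;	/* stands in for deactivate_task() */
	}
	return false;		/* task stays runnable */
}
```

Note that a compiler may not collapse the branch here because the two arms differ; in the kernel the single `READ_ONCE`-style volatile load of `prev->state` plays the same role of pinning the load the branch depends on.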
Parent: ba47d84

1 file changed: 14 additions, 10 deletions


kernel/sched/core.c

@@ -4119,9 +4119,6 @@ static void __sched notrace __schedule(bool preempt)
 	local_irq_disable();
 	rcu_note_context_switch(preempt);
 
-	/* See deactivate_task() below. */
-	prev_state = prev->state;
-
 	/*
 	 * Make sure that signal_pending_state()->signal_pending() below
 	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
@@ -4145,11 +4142,16 @@ static void __sched notrace __schedule(bool preempt)
 	update_rq_clock(rq);
 
 	switch_count = &prev->nivcsw;
+
 	/*
-	 * We must re-load prev->state in case ttwu_remote() changed it
-	 * before we acquired rq->lock.
+	 * We must load prev->state once (task_struct::state is volatile), such
+	 * that:
+	 *
+	 *  - we form a control dependency vs deactivate_task() below.
+	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
 	 */
-	if (!preempt && prev_state && prev_state == prev->state) {
+	prev_state = prev->state;
+	if (!preempt && prev_state) {
 		if (signal_pending_state(prev_state, prev)) {
 			prev->state = TASK_RUNNING;
 		} else {
@@ -4163,10 +4165,12 @@ static void __sched notrace __schedule(bool preempt)
 
 			/*
 			 * __schedule()			ttwu()
-			 *   prev_state = prev->state;	  if (READ_ONCE(p->on_rq) && ...)
-			 *   LOCK rq->lock		    goto out;
-			 *   smp_mb__after_spinlock();	  smp_acquire__after_ctrl_dep();
-			 *   p->on_rq = 0;		  p->state = TASK_WAKING;
+			 *   prev_state = prev->state;	  if (p->on_rq && ...)
+			 *   if (prev_state)		    goto out;
+			 *     p->on_rq = 0;		  smp_acquire__after_ctrl_dep();
+			 *				  p->state = TASK_WAKING
+			 *
+			 * Where __schedule() and ttwu() have matching control dependencies.
 			 *
 			 * After this, schedule() must not care about p->state any more.
 			 */
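The ttwu() side of that comment can be modeled the same way. Again a hedged single-threaded sketch, not kernel code: `ttwu_remote_sketch()`, the struct, and the `TASK_WAKING` value are invented for illustration. The branch on the `->on_rq` load is a control dependency, and the acquire fence stands in for `smp_acquire__after_ctrl_dep()`, which upgrades that dependency so the subsequent `->state` write (and any later reads) cannot move before the load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TASK_WAKING 0x0200	/* illustrative value, not the kernel header's */

struct task {
	_Atomic long state;
	_Atomic int  on_rq;
};

/* Sketch of ttwu()'s half of the diagram above: branch on the p->on_rq
 * load (control dependency), then strengthen it with an acquire fence
 * before touching p->state. This pairs with the control dependency in
 * __schedule(), which orders its p->state load before its p->on_rq store. */
static bool ttwu_remote_sketch(struct task *p)
{
	if (!atomic_load_explicit(&p->on_rq, memory_order_relaxed))
		return false;	/* the "goto out" path: task already off rq */

	/* models smp_acquire__after_ctrl_dep() */
	atomic_thread_fence(memory_order_acquire);
	atomic_store_explicit(&p->state, TASK_WAKING, memory_order_relaxed);
	return true;
}
```

Under these matching dependencies, if ttwu() observes `on_rq != 0`, the __schedule() side has necessarily already completed its `prev->state` load, so the remote `p->state = TASK_WAKING` write cannot be lost to a stale reload.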
