
Commit e45cdc7

amluto authored and KAGA-KOKO committed
membarrier: Execute SYNC_CORE on the calling thread
membarrier()'s MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE is documented as syncing the core on all sibling threads but not necessarily the calling thread. This behavior is fundamentally buggy and cannot be used safely.

Suppose a user program has two threads. Thread A is on CPU 0 and thread B is on CPU 1. Thread A modifies some text and calls membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE); then thread B executes the modified code. At any point after membarrier() decides which CPUs to target, thread A could be preempted and replaced by thread B on CPU 0. This could even happen on exit from the membarrier() syscall. If this happens, thread B will end up running on CPU 0 without having synced.

In principle, this could be fixed by arranging for the scheduler to issue sync_core_before_usermode() whenever switching between two threads in the same mm if there is any possibility of a concurrent membarrier() call, but this would have considerable overhead. Instead, make membarrier() sync the calling CPU as well.

As an optimization, this avoids an extra smp_mb() in the default barrier-only mode and an extra rseq preempt on the caller.

Fixes: 70216e1 ("membarrier: Provide core serializing command, *_SYNC_CORE")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Mathieu Desnoyers <[email protected]>
Link: https://lore.kernel.org/r/250ded637696d490c69bef1877148db86066881c.1607058304.git.luto@kernel.org
1 parent 758c937 commit e45cdc7
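
To make the race concrete, here is a minimal user-space sketch of the pattern the commit message describes: one thread patches executable memory, issues the SYNC_CORE membarrier, and only then lets another thread run the patched bytes. This is illustrative, not part of the patch -- the payload is x86-64 specific, error handling is minimal, and membarrier() here is a local syscall(2) wrapper, since glibc does not provide one.

#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local wrapper; glibc does not export membarrier(). */
static int membarrier(int cmd, unsigned int flags, int cpu_id)
{
	return syscall(__NR_membarrier, cmd, flags, cpu_id);
}

/* x86-64 machine code for "mov eax, 42; ret". */
static const unsigned char payload[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

static void *code;

static void *thread_b(void *arg)
{
	/* Thread B: execute the code thread A just patched. */
	int (*fn)(void) = (int (*)(void))code;

	printf("returned %d\n", fn());
	return NULL;
}

int main(void)
{
	pthread_t b;

	/* The SYNC_CORE command must be registered before first use. */
	if (membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE,
		       0, 0))
		return 1;

	code = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (code == MAP_FAILED)
		return 1;

	/* Thread A: modify some text... */
	memcpy(code, payload, sizeof(payload));

	/*
	 * ...then force a core sync on every CPU running this mm. With
	 * this commit, that includes the calling CPU, so the sequence
	 * stays safe even if thread A is preempted and another thread
	 * of the same mm lands on its old CPU.
	 */
	if (membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0, 0))
		return 1;

	pthread_create(&b, NULL, thread_b, NULL);
	pthread_join(b, NULL);
	return 0;
}

In this toy program, pthread_create() alone would already order the memcpy() against thread B; in a real JIT both threads are long-running, and the membarrier() call is what makes freshly written instructions safe to execute on every CPU -- including, after this commit, the caller's own.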

1 file changed (+33, -18): kernel/sched/membarrier.c

kernel/sched/membarrier.c

@@ -194,7 +194,8 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 		return -EPERM;
 	}
 
-	if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1)
+	if (flags != MEMBARRIER_FLAG_SYNC_CORE &&
+	    (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1))
 		return 0;
 
 	/*
@@ -213,8 +214,6 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 
 		if (cpu_id >= nr_cpu_ids || !cpu_online(cpu_id))
 			goto out;
-		if (cpu_id == raw_smp_processor_id())
-			goto out;
 		rcu_read_lock();
 		p = rcu_dereference(cpu_rq(cpu_id)->curr);
 		if (!p || p->mm != mm) {
@@ -229,29 +228,45 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 		for_each_online_cpu(cpu) {
 			struct task_struct *p;
 
-			/*
-			 * Skipping the current CPU is OK even through we can be
-			 * migrated at any point. The current CPU, at the point
-			 * where we read raw_smp_processor_id(), is ensured to
-			 * be in program order with respect to the caller
-			 * thread. Therefore, we can skip this CPU from the
-			 * iteration.
-			 */
-			if (cpu == raw_smp_processor_id())
-				continue;
 			p = rcu_dereference(cpu_rq(cpu)->curr);
 			if (p && p->mm == mm)
 				__cpumask_set_cpu(cpu, tmpmask);
 		}
 		rcu_read_unlock();
 	}
 
-	preempt_disable();
-	if (cpu_id >= 0)
+	if (cpu_id >= 0) {
+		/*
+		 * smp_call_function_single() will call ipi_func() if cpu_id
+		 * is the calling CPU.
+		 */
 		smp_call_function_single(cpu_id, ipi_func, NULL, 1);
-	else
-		smp_call_function_many(tmpmask, ipi_func, NULL, 1);
-	preempt_enable();
+	} else {
+		/*
+		 * For regular membarrier, we can save a few cycles by
+		 * skipping the current cpu -- we're about to do smp_mb()
+		 * below, and if we migrate to a different cpu, this cpu
+		 * and the new cpu will execute a full barrier in the
+		 * scheduler.
+		 *
+		 * For SYNC_CORE, we do need a barrier on the current cpu --
+		 * otherwise, if we are migrated and replaced by a different
+		 * task in the same mm just before, during, or after
+		 * membarrier, we will end up with some thread in the mm
+		 * running without a core sync.
+		 *
+		 * For RSEQ, don't rseq_preempt() the caller. User code
+		 * is not supposed to issue syscalls at all from inside an
+		 * rseq critical section.
+		 */
+		if (flags != MEMBARRIER_FLAG_SYNC_CORE) {
+			preempt_disable();
+			smp_call_function_many(tmpmask, ipi_func, NULL, true);
+			preempt_enable();
+		} else {
+			on_each_cpu_mask(tmpmask, ipi_func, NULL, true);
+		}
+	}
 
 out:
 	if (cpu_id < 0)
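
The choice between the two IPI helpers above is the heart of the fix: smp_call_function_many() never runs the callback on the calling CPU, while on_each_cpu_mask() also invokes it locally (with irqs disabled) when the current CPU is in the mask. A hypothetical module-style sketch of that difference, using made-up names note_cpu/ipi_demo and the standard <linux/smp.h> APIs:

#include <linux/module.h>
#include <linux/smp.h>

static void note_cpu(void *info)
{
	pr_info("callback ran on cpu %d\n", smp_processor_id());
}

static int __init ipi_demo_init(void)
{
	/*
	 * Skips the calling CPU by design: logs every other online CPU.
	 * Preemption must be disabled across the call, as in the patch.
	 */
	preempt_disable();
	smp_call_function_many(cpu_online_mask, note_cpu, NULL, true);
	preempt_enable();

	/*
	 * Also runs note_cpu() on the calling CPU (locally, with irqs
	 * disabled), which is exactly what SYNC_CORE needs.
	 */
	on_each_cpu_mask(cpu_online_mask, note_cpu, NULL, true);
	return 0;
}

static void __exit ipi_demo_exit(void)
{
}

module_init(ipi_demo_init);
module_exit(ipi_demo_exit);
MODULE_LICENSE("GPL");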
