Commit 5071228

proc: Ensure we see the exit of each process tid exactly once
In the work to remove proc_mnt I noticed that we were calling proc_flush_task (now proc_flush_pid) possibly multiple times for the same pid because of how de_thread works.

This is a bare minimal patchset to sort out de_thread, by introducing exchange_tids and its helper hlists_swap_heads_rcu.

The actual call of exchange_tids should be on the slow path, so I have prioritized readability over getting every last drop of performance.

I have also read through a bunch of the code to see if I could find anything that would be affected by this change. Users of has_group_leader_pid were good candidates. But I also looked at other cases that might have a pid->task->pid transition. I ignored other sources of races with de_thread and exec as those are preexisting.

I found a close call with send_signal's use of task_active_pid_ns, but all pids of a thread group are guaranteed to be in the same pid namespace, so there is not a problem.

I found a few pieces of debugging code that do:

	task = pid_task(pid, PIDTYPE_PID);
	if (task) {
		printk("%u\n", task->pid);
	}

But I can't see how we care if it happens at the wrong moment that task->pid might not match pid_nr(pid).

Similarly, because the code in posix-cpu-timers goes pid->task->pid it feels like there should be a problem. But as the code that works with PIDTYPE_PID is only available within the thread group, and as de_thread kills all of the other threads before it makes any changes of this kind, the race cannot happen.

In short I don't think this change will introduce any regressions.

Eric W. Biederman (2):
      rculist: Add hlists_swap_heads_rcu
      proc: Ensure we see the exit of each process tid exactly once

 fs/exec.c               |  5 +----
 include/linux/pid.h     |  1 +
 include/linux/rculist.h | 21 +++++++++++++++++++++
 kernel/pid.c            | 19 +++++++++++++++++++
 4 files changed, 42 insertions(+), 4 deletions(-)

Link: https://lore.kernel.org/lkml/[email protected]/
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
2 parents 3147d8a + 6b03d13 commit 5071228

4 files changed: +42 -4 lines changed

fs/exec.c

Lines changed: 1 addition & 4 deletions
@@ -1186,11 +1186,8 @@ static int de_thread(struct task_struct *tsk)
 
 	/* Become a process group leader with the old leader's pid.
 	 * The old leader becomes a thread of the this thread group.
-	 * Note: The old leader also uses this pid until release_task
-	 *       is called.  Odd but simple and correct.
 	 */
-	tsk->pid = leader->pid;
-	change_pid(tsk, PIDTYPE_PID, task_pid(leader));
+	exchange_tids(tsk, leader);
 	transfer_pid(leader, tsk, PIDTYPE_TGID);
 	transfer_pid(leader, tsk, PIDTYPE_PGID);
 	transfer_pid(leader, tsk, PIDTYPE_SID);

include/linux/pid.h

Lines changed: 1 addition & 0 deletions
@@ -102,6 +102,7 @@ extern void attach_pid(struct task_struct *task, enum pid_type);
 extern void detach_pid(struct task_struct *task, enum pid_type);
 extern void change_pid(struct task_struct *task, enum pid_type,
 			struct pid *pid);
+extern void exchange_tids(struct task_struct *task, struct task_struct *old);
 extern void transfer_pid(struct task_struct *old, struct task_struct *new,
 			 enum pid_type);

include/linux/rculist.h

Lines changed: 21 additions & 0 deletions
@@ -506,6 +506,27 @@ static inline void hlist_replace_rcu(struct hlist_node *old,
 	WRITE_ONCE(old->pprev, LIST_POISON2);
 }
 
+/**
+ * hlists_swap_heads_rcu - swap the lists the hlist heads point to
+ * @left:  The hlist head on the left
+ * @right: The hlist head on the right
+ *
+ * The lists start out as [@left  ][node1 ... ] and
+                          [@right ][node2 ... ]
+ * The lists end up as    [@left  ][node2 ... ]
+ *                        [@right ][node1 ... ]
+ */
+static inline void hlists_swap_heads_rcu(struct hlist_head *left, struct hlist_head *right)
+{
+	struct hlist_node *node1 = left->first;
+	struct hlist_node *node2 = right->first;
+
+	rcu_assign_pointer(left->first, node2);
+	rcu_assign_pointer(right->first, node1);
+	WRITE_ONCE(node2->pprev, &left->first);
+	WRITE_ONCE(node1->pprev, &right->first);
+}
+
 /*
  * return the first or the next element in an RCU protected hlist
  */
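
The head swap in the hunk above can be tried outside the kernel. What follows is only a userspace sketch: `struct hnode`, `struct hhead`, `hhead_add_only` and `hheads_swap` are hypothetical stand-ins invented for illustration, and plain stores replace `rcu_assign_pointer` and `WRITE_ONCE`, so it models the pointer shuffling but none of the RCU ordering. The `assert` makes explicit an assumption the kernel helper leaves implicit: both lists must be non-empty, since both first nodes are dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's hlist types (assumption: no RCU). */
struct hnode {
	struct hnode *next;
	struct hnode **pprev;	/* points at whatever pointer refers to us */
};

struct hhead {
	struct hnode *first;
};

/* Hypothetical helper: make @n the only node on the list @h. */
static void hhead_add_only(struct hhead *h, struct hnode *n)
{
	n->next = NULL;
	n->pprev = &h->first;
	h->first = n;
}

/* Same four stores as hlists_swap_heads_rcu, without the RCU primitives. */
static void hheads_swap(struct hhead *left, struct hhead *right)
{
	struct hnode *node1 = left->first;
	struct hnode *node2 = right->first;

	assert(node1 && node2);	/* both lists must be non-empty */

	left->first = node2;
	right->first = node1;
	node2->pprev = &left->first;
	node1->pprev = &right->first;
}
```

Only the first node's `pprev` is touched on each side, so the swap costs the same regardless of list length; the rest of each chain follows its first node.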

kernel/pid.c

Lines changed: 19 additions & 0 deletions
@@ -363,6 +363,25 @@ void change_pid(struct task_struct *task, enum pid_type type,
 	attach_pid(task, type);
 }
 
+void exchange_tids(struct task_struct *left, struct task_struct *right)
+{
+	struct pid *pid1 = left->thread_pid;
+	struct pid *pid2 = right->thread_pid;
+	struct hlist_head *head1 = &pid1->tasks[PIDTYPE_PID];
+	struct hlist_head *head2 = &pid2->tasks[PIDTYPE_PID];
+
+	/* Swap the single entry tid lists */
+	hlists_swap_heads_rcu(head1, head2);
+
+	/* Swap the per task_struct pid */
+	rcu_assign_pointer(left->thread_pid, pid2);
+	rcu_assign_pointer(right->thread_pid, pid1);
+
+	/* Swap the cached value */
+	WRITE_ONCE(left->pid, pid_nr(pid2));
+	WRITE_ONCE(right->pid, pid_nr(pid1));
+}
+
 /* transfer_pid is an optimization of attach_pid(new), detach_pid(old) */
 void transfer_pid(struct task_struct *old, struct task_struct *new,
 			   enum pid_type type)
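
To see what the three swaps in exchange_tids achieve together, here is a small userspace model. It is a sketch under stated assumptions, not kernel code: `struct model_pid`, `struct model_task`, `model_attach` and `model_exchange_tids` are hypothetical names, each pid's single-entry PIDTYPE_PID list is collapsed into one `task` pointer, and no locking or RCU is modelled.

```c
#include <assert.h>
#include <stddef.h>

struct model_task;

/* Assumption: a pid's single-entry tid list is modelled as one pointer. */
struct model_pid {
	int nr;				/* value pid_nr() would return */
	struct model_task *task;	/* the one task on the tid list */
};

struct model_task {
	int pid;			/* cached numeric value */
	struct model_pid *thread_pid;	/* back pointer to the pid */
};

/* Hypothetical helper: bind task @t to pid @p. */
static void model_attach(struct model_task *t, struct model_pid *p)
{
	p->task = t;
	t->thread_pid = p;
	t->pid = p->nr;
}

/* Mirrors the three steps of exchange_tids in the same order. */
static void model_exchange_tids(struct model_task *left,
				struct model_task *right)
{
	struct model_pid *pid1 = left->thread_pid;
	struct model_pid *pid2 = right->thread_pid;

	/* Swap the single entry tid lists */
	pid1->task = right;
	pid2->task = left;

	/* Swap the per task pid pointer */
	left->thread_pid = pid2;
	right->thread_pid = pid1;

	/* Swap the cached value */
	left->pid = pid2->nr;
	right->pid = pid1->nr;
}
```

In this model, if a leader starts on pid 100 and the exec'ing thread on pid 101, calling `model_exchange_tids(&tsk, &leader)` leaves `tsk` on pid 100 and `leader` on pid 101. Each pid still refers to exactly one task, which is the property the commit relies on so that each tid's exit is seen once.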
