Skip to content

Commit facd40a

Browse files
anna-marialxKAGA-KOKO
authored andcommitted
timers/migration: Do not rely always on group->parent
When reading group->parent without holding the group lock it is racy against CPUs coming online the first time and thereby creating another level of the hierarchy. This is not a problem when this value is read once to decide whether to abort a propagation or not. The worst outcome is an unnecessary/early CPU wake up. But it is racy when reading it several times during a single 'action' (like activation, deactivation, checking for remote timer expiry,...) and relying on the consitency of this value without holding the lock. This happens at the moment e.g. in tmigr_inactive_up() which is also calling tmigr_udpate_events(). Code relys on group->parent not to change during this 'action'. Update parent struct member description to explain the above only once. Remove parent pointer checks when they are not mandatory (like update of data->childmask). Remove a warning, which would be nice but the trigger of this warning is not reliable and add expand the data structure member description instead. Expand a comment, why it is safe to rely on parent pointer here (inside hierarchy update). Fixes: 7ee9887 ("timers: Implement the hierarchical pull model") Reported-by: Borislav Petkov <[email protected]> Signed-off-by: Anna-Maria Behnsen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
1 parent 22a40d1 commit facd40a

File tree

2 files changed

+26
-19
lines changed

2 files changed

+26
-19
lines changed

kernel/time/timer_migration.c

Lines changed: 15 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -507,7 +507,14 @@ static void walk_groups(up_f up, void *data, struct tmigr_cpu *tmc)
507507
* (get_next_timer_interrupt())
508508
* @firstexp: Contains the first event expiry information when last
509509
* active CPU of hierarchy is on the way to idle to make
510-
* sure CPU will be back in time.
510+
* sure CPU will be back in time. It is updated in top
511+
* level group only. Be aware, there could occur a new top
512+
* level of the hierarchy between the 'top level call' in
513+
* tmigr_update_events() and the check for the parent group
514+
* in walk_groups(). Then @firstexp might contain a value
515+
* != KTIME_MAX even if it was not the final top
516+
* level. This is not a problem, as the worst outcome is a
517+
* CPU which might wake up a little early.
511518
* @evt: Pointer to tmigr_event which needs to be queued (of idle
512519
* child group)
513520
* @childmask: childmask of child group
@@ -649,7 +656,7 @@ static bool tmigr_active_up(struct tmigr_group *group,
649656

650657
} while (!atomic_try_cmpxchg(&group->migr_state, &curstate.state, newstate.state));
651658

652-
if ((walk_done == false) && group->parent)
659+
if (walk_done == false)
653660
data->childmask = group->childmask;
654661

655662
/*
@@ -1317,20 +1324,9 @@ static bool tmigr_inactive_up(struct tmigr_group *group,
13171324
/* Event Handling */
13181325
tmigr_update_events(group, child, data);
13191326

1320-
if (group->parent && (walk_done == false))
1327+
if (walk_done == false)
13211328
data->childmask = group->childmask;
13221329

1323-
/*
1324-
* data->firstexp was set by tmigr_update_events() and contains the
1325-
* expiry of the first global event which needs to be handled. It
1326-
* differs from KTIME_MAX if:
1327-
* - group is the top level group and
1328-
* - group is idle (which means CPU was the last active CPU in the
1329-
* hierarchy) and
1330-
* - there is a pending event in the hierarchy
1331-
*/
1332-
WARN_ON_ONCE(data->firstexp != KTIME_MAX && group->parent);
1333-
13341330
trace_tmigr_group_set_cpu_inactive(group, newstate, childmask);
13351331

13361332
return walk_done;
@@ -1552,10 +1548,11 @@ static void tmigr_connect_child_parent(struct tmigr_group *child,
15521548
data.childmask = child->childmask;
15531549

15541550
/*
1555-
* There is only one new level per time. When connecting the
1556-
* child and the parent and set the child active when the parent
1557-
* is inactive, the parent needs to be the uppermost
1558-
* level. Otherwise there went something wrong!
1551+
* There is only one new level per time (which is protected by
1552+
* tmigr_mutex). When connecting the child and the parent and
1553+
* set the child active when the parent is inactive, the parent
1554+
* needs to be the uppermost level. Otherwise there went
1555+
* something wrong!
15591556
*/
15601557
WARN_ON(!tmigr_active_up(parent, child, &data) && parent->parent);
15611558
}

kernel/time/timer_migration.h

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,17 @@ struct tmigr_event {
2222
* struct tmigr_group - timer migration hierarchy group
2323
* @lock: Lock protecting the event information and group hierarchy
2424
* information during setup
25-
* @parent: Pointer to the parent group
25+
* @parent: Pointer to the parent group. Pointer is updated when a
26+
* new hierarchy level is added because of a CPU coming
27+
* online the first time. Once it is set, the pointer will
28+
* not be removed or updated. When accessing parent pointer
29+
* lock less to decide whether to abort a propagation or
30+
* not, it is not a problem. The worst outcome is an
31+
* unnecessary/early CPU wake up. But do not access parent
32+
* pointer several times in the same 'action' (like
33+
* activation, deactivation, check for remote expiry,...)
34+
* without holding the lock as it is not ensured that value
35+
* will not change.
2636
* @groupevt: Next event of the group which is only used when the
2737
* group is !active. The group event is then queued into
2838
* the parent timer queue.

0 commit comments

Comments
 (0)