Skip to content

Commit 0e4a169

Browse files
kudureranganathPeter Zijlstra
authored andcommitted
sched/fair: Start a cfs_rq on throttled hierarchy with PELT clock throttled
Matteo reported hitting the assert_list_leaf_cfs_rq() warning from enqueue_task_fair() post commit fe8d238 ("sched/fair: Propagate load for throttled cfs_rq") which transitioned to using cfs_rq_pelt_clock_throttled() check for leaf cfs_rq insertions in propagate_entity_cfs_rq(). The "cfs_rq->pelt_clock_throttled" flag is used to indicate if the hierarchy has its PELT frozen. If a cfs_rq's PELT is marked frozen, all its descendants should have their PELT frozen too or weird things can happen as a result of children accumulating PELT signals when the parents have their PELT clock stopped. Another side effect of this is the loss of integrity of the leaf cfs_rq list. As debugged by Aaron, consider the following hierarchy: root(#) / \ A(#) B(*) | C <--- new cgroup | D <--- new cgroup # - Already on leaf cfs_rq list * - Throttled with PELT frozen The newly created cgroups don't have their "pelt_clock_throttled" signal synced with cgroup B. Next, the following series of events occur: 1. online_fair_sched_group() for cgroup D will call propagate_entity_cfs_rq(). (Same can happen if a throttled task is moved to cgroup C and enqueue_task_fair() returns early.) propagate_entity_cfs_rq() adds the cfs_rq of cgroup C to "rq->tmp_alone_branch" since its PELT clock is not marked throttled and cfs_rq of cgroup B is not on the list. cfs_rq of cgroup B is skipped since its PELT is throttled. root cfs_rq already exists on cfs_rq leading to list_add_leaf_cfs_rq() returning early. The cfs_rq of cgroup C is left dangling on the "rq->tmp_alone_branch". 2. A new task wakes up on cgroup A. Since the whole hierarchy is already on the leaf cfs_rq list, list_add_leaf_cfs_rq() keeps returning early without any modifications to "rq->tmp_alone_branch". The final assert_list_leaf_cfs_rq() in enqueue_task_fair() sees the dangling reference to cgroup C's cfs_rq in "rq->tmp_alone_branch". !!! Splat !!! Syncing the "pelt_clock_throttled" indicator with parent cfs_rq is not enough since the new cfs_rq is not yet enqueued on the hierarchy. A dequeue on other subtree on the throttled hierarchy can freeze the PELT clock for the parent hierarchy without setting the indicators for this newly added cfs_rq which was never enqueued. Since there are no tasks on the new hierarchy, start a cfs_rq on a throttled hierarchy with its PELT clock throttled. The first enqueue, or the distribution (whichever happens first) will unfreeze the PELT clock and queue the cfs_rq on the leaf cfs_rq list. While at it, add an assert_list_leaf_cfs_rq() in propagate_entity_cfs_rq() to catch such cases in the future. Closes: https://lore.kernel.org/lkml/[email protected]/ Fixes: e1fad12 ("sched/fair: Switch to task based throttle model") Reported-by: Matteo Martelli <[email protected]> Suggested-by: Aaron Lu <[email protected]> Signed-off-by: K Prateek Nayak <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Aaron Lu <[email protected]> Tested-by: Aaron Lu <[email protected]> Tested-by: Matteo Martelli <[email protected]> Link: https://patch.msgid.link/[email protected]
1 parent 211ddde commit 0e4a169

File tree

1 file changed

+12
-0
lines changed

1 file changed

+12
-0
lines changed

kernel/sched/fair.c

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6437,6 +6437,16 @@ static void sync_throttle(struct task_group *tg, int cpu)
64376437

64386438
cfs_rq->throttle_count = pcfs_rq->throttle_count;
64396439
cfs_rq->throttled_clock_pelt = rq_clock_pelt(cpu_rq(cpu));
6440+
6441+
/*
6442+
* It is not enough to sync the "pelt_clock_throttled" indicator
6443+
* with the parent cfs_rq when the hierarchy is not queued.
6444+
* Always join a throttled hierarchy with PELT clock throttled
6445+
* and leaf it to the first enqueue, or distribution to
6446+
* unthrottle the PELT clock.
6447+
*/
6448+
if (cfs_rq->throttle_count)
6449+
cfs_rq->pelt_clock_throttled = 1;
64406450
}
64416451

64426452
/* conditionally throttle active cfs_rq's from put_prev_entity() */
@@ -13187,6 +13197,8 @@ static void propagate_entity_cfs_rq(struct sched_entity *se)
1318713197
if (!cfs_rq_pelt_clock_throttled(cfs_rq))
1318813198
list_add_leaf_cfs_rq(cfs_rq);
1318913199
}
13200+
13201+
assert_list_leaf_cfs_rq(rq_of(cfs_rq));
1319013202
}
1319113203
#else /* !CONFIG_FAIR_GROUP_SCHED: */
1319213204
static void propagate_entity_cfs_rq(struct sched_entity *se) { }

0 commit comments

Comments
 (0)