Commit 4929a4e
Xuewei Zhang authored and Ingo Molnar committed
sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision

The quota/period ratio is used to ensure a child task group won't get
more bandwidth than the parent task group, and is calculated as:

  normalized_cfs_quota() = [(quota_us << 20) / period_us]

If the quota/period ratio was changed during this scaling due to
precision loss, it will cause inconsistency between parent and child
task groups.

See below example:

A userspace container manager (kubelet) does three operations:

 1) Create a parent cgroup, set quota to 1,000us and period to 10,000us.
 2) Create a few children cgroups.
 3) Set quota to 1,000us and period to 10,000us on a child cgroup.

These operations are expected to succeed. However, if the scaling of
147/128 happens before step 3, quota and period of the parent cgroup
will be changed:

  new_quota:  1148437ns,  1148us
  new_period: 11484375ns, 11484us

And when step 3 comes in, the ratio of the child cgroup will be
104857, which will be larger than the parent cgroup ratio (104821),
and will fail.

Scaling them by a factor of 2 will fix the problem.

Tested-by: Phil Auld <[email protected]>
Signed-off-by: Xuewei Zhang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Phil Auld <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vincent Guittot <[email protected]>
Fixes: 2e8e192 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
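
The arithmetic in the example above can be reproduced in userspace. The following is a minimal sketch, not kernel code: `normalized_ratio()` and `scale_147_128()` are hypothetical helper names that mirror the formula quoted in the message and the removed scaling lines, and the clamp to the maximum period is left out for brevity.

```c
#include <stdint.h>

/*
 * Fixed-point quota/period ratio with 20 fractional bits, following the
 * formula quoted in the commit message: (quota_us << 20) / period_us.
 * Hypothetical userspace helper, not the kernel function itself.
 */
static uint64_t normalized_ratio(uint64_t quota_us, uint64_t period_us)
{
	return (quota_us << 20) / period_us;
}

/*
 * Model of the old behaviour: grow the period by 147/128 and rescale the
 * quota by new/old. Both values are in nanoseconds. Each integer division
 * truncates, which is where the ratio drifts.
 */
static void scale_147_128(uint64_t *quota_ns, uint64_t *period_ns)
{
	uint64_t old_period = *period_ns;
	uint64_t new_period = (old_period * 147) / 128;

	*quota_ns = (*quota_ns * new_period) / old_period;
	*period_ns = new_period;
}
```

Starting from a quota of 1,000,000ns and a period of 10,000,000ns, `scale_147_128()` yields 1148437ns and 11484375ns. Converting to microseconds (1148 and 11484) and calling `normalized_ratio()` gives 104821 for the parent, while an unscaled child at 1000us/10000us gives 104857, so the child appears to ask for more than the parent.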
1 parent 73956fc commit 4929a4e

1 file changed: +22 -14 lines changed

kernel/sched/fair.c

Lines changed: 22 additions & 14 deletions
@@ -4926,20 +4926,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (++count > 3) {
 			u64 new, old = ktime_to_ns(cfs_b->period);
 
-			new = (old * 147) / 128; /* ~115% */
-			new = min(new, max_cfs_quota_period);
-
-			cfs_b->period = ns_to_ktime(new);
-
-			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
-			cfs_b->quota *= new;
-			cfs_b->quota = div64_u64(cfs_b->quota, old);
-
-			pr_warn_ratelimited(
-	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
-				smp_processor_id(),
-				div_u64(new, NSEC_PER_USEC),
-				div_u64(cfs_b->quota, NSEC_PER_USEC));
+			/*
+			 * Grow period by a factor of 2 to avoid losing precision.
+			 * Precision loss in the quota/period ratio can cause __cfs_schedulable
+			 * to fail.
+			 */
+			new = old * 2;
+			if (new < max_cfs_quota_period) {
+				cfs_b->period = ns_to_ktime(new);
+				cfs_b->quota *= 2;
+
+				pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us = %lld, cfs_quota_us = %lld)\n",
+					smp_processor_id(),
+					div_u64(new, NSEC_PER_USEC),
+					div_u64(cfs_b->quota, NSEC_PER_USEC));
+			} else {
+				pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, but cannot scale up without losing precision (cfs_period_us = %lld, cfs_quota_us = %lld)\n",
+					smp_processor_id(),
+					div_u64(old, NSEC_PER_USEC),
+					div_u64(cfs_b->quota, NSEC_PER_USEC));
+			}
 
 			/* reset count so we don't come right back in here */
 			count = 0;
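
To see why doubling keeps the ratio intact, the new branch can be modelled in userspace. This is a sketch under stated assumptions: `scale_by_two()` is a hypothetical helper, and `MAX_PERIOD_NS` stands in for `max_cfs_quota_period`, taken to be 1s on the basis of the removed comment "since max is 1s".

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for max_cfs_quota_period (1s, per the removed comment). */
#define MAX_PERIOD_NS 1000000000ULL

/*
 * Model of the new behaviour: double both period and quota, but only when
 * the doubled period stays strictly below the cap. Returns true if the
 * values were scaled, false if they were left untouched.
 */
static bool scale_by_two(uint64_t *quota_ns, uint64_t *period_ns)
{
	uint64_t new_period = *period_ns * 2;

	if (new_period >= MAX_PERIOD_NS)
		return false;

	*period_ns = new_period;
	*quota_ns *= 2;
	return true;
}
```

Because both values are multiplied by the same integer and nothing is divided, a 1,000,000ns/10,000,000ns pair becomes 2,000,000ns/20,000,000ns and `(quota_us << 20) / period_us` stays at 104857. A period of 500ms or more is not scaled in this model, matching the `else` branch that only logs a warning.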