Commit a4103ea

sched_ext: Add a cgroup scheduler which uses flattened hierarchy
This patch adds scx_flatcg example scheduler which implements hierarchical weight-based cgroup CPU control by flattening the cgroup hierarchy into a single layer by compounding the active weight share at each level.

This flattening of hierarchy can bring a substantial performance gain when the cgroup hierarchy is nested multiple levels. In a simple benchmark using wrk[8] on apache serving a CGI script calculating sha1sum of a small file, it outperforms CFS by ~3% with CPU controller disabled and by ~10% with two apache instances competing with 2:1 weight ratio nested four level deep.

However, the gain comes at the cost of not being able to properly handle thundering herd of cgroups. For example, if many cgroups which are nested behind a low priority parent cgroup wake up around the same time, they may be able to consume more CPU cycles than they are entitled to. In many use cases, this isn't a real concern especially given the performance gain. Also, there are ways to mitigate the problem further by e.g. introducing an extra scheduling layer on cgroup delegation boundaries.

v5:
- Updated to specify SCX_OPS_HAS_CGROUP_WEIGHT instead of SCX_OPS_KNOB_CGROUP_WEIGHT.

v4:
- Revert reference counted kptr for cgv_node as the change caused easily reproducible stalls.

v3:
- Updated to reflect the core API changes including ops.init/exit_task() and direct dispatch from ops.select_cpu(). Fixes and improvements including additional statistics.
- Use reference counted kptr for cgv_node instead of xchg'ing against stash location.
- Dropped '-p' option.

v2:
- Use SCX_BUG[_ON]() to simplify error handling.

Signed-off-by: Tejun Heo <[email protected]>
Reviewed-by: David Vernet <[email protected]>
Acked-by: Josh Don <[email protected]>
Acked-by: Hao Luo <[email protected]>
Acked-by: Barret Rhoden <[email protected]>
1 parent 8195136 commit a4103ea

5 files changed, +1246 -1 lines changed


tools/sched_ext/Makefile

Lines changed: 1 addition & 1 deletion
@@ -176,7 +176,7 @@ $(INCLUDE_DIR)/%.bpf.skel.h: $(SCXOBJ_DIR)/%.bpf.o $(INCLUDE_DIR)/vmlinux.h $(BP

 SCX_COMMON_DEPS := include/scx/common.h include/scx/user_exit_info.h | $(BINDIR)

-c-sched-targets = scx_simple scx_qmap scx_central
+c-sched-targets = scx_simple scx_qmap scx_central scx_flatcg

 $(addprefix $(BINDIR)/,$(c-sched-targets)): \
 $(BINDIR)/%: \

tools/sched_ext/README.md

Lines changed: 12 additions & 0 deletions
@@ -192,6 +192,18 @@ where this could be particularly useful is running VMs, where running with
 infinite slices and no timer ticks allows the VM to avoid unnecessary expensive
 vmexits.

+## scx_flatcg
+
+A flattened cgroup hierarchy scheduler. This scheduler implements hierarchical
+weight-based cgroup CPU control by flattening the cgroup hierarchy into a single
+layer, by compounding the active weight share at each level. The effect of this
+is a much more performant CPU controller, which does not need to descend down
+cgroup trees in order to properly compute a cgroup's share.
+
+Similar to scx_simple, in limited scenarios, this scheduler can perform
+reasonably well on single socket-socket systems with a unified L3 cache and show
+significantly lowered hierarchical scheduling overhead.
+

 # Troubleshooting

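As a rough illustration of what "compounding the active weight share at each level" means, the user-space sketch below multiplies, for every level from a cgroup up to the root, that cgroup's weight divided by the summed weight of its active siblings. This is not the scx_flatcg code: `struct cg`, `active_child_sum`, and `flat_share()` are hypothetical names chosen for the example, floating point is used only for readability, and the real scheduler may represent, cache, and update these values quite differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cgroup node (not the scheduler's own struct). */
struct cg {
	const char *name;
	struct cg *parent;          /* NULL for the root cgroup */
	unsigned weight;            /* this cgroup's weight among its siblings */
	unsigned active_child_sum;  /* summed weight of this cgroup's active children */
};

/*
 * Flattened share of the whole machine for @cg: the product, over every
 * level up to the root, of weight / (sum of active sibling weights).
 * The root itself always has a share of 1.0.
 */
static double flat_share(const struct cg *cg)
{
	double share = 1.0;

	for (; cg->parent != NULL; cg = cg->parent) {
		/* An active cgroup contributes its own weight to the parent's sum. */
		assert(cg->parent->active_child_sum >= cg->weight);
		assert(cg->weight > 0);
		share *= (double)cg->weight / (double)cg->parent->active_child_sum;
	}
	return share;
}
```

For example, suppose the root has two active children `a` (weight 300) and `b` (weight 100), `a` has two active children of weight 100 each, and `b` has a single active child. Each child of `a` then gets 0.5 * 0.75 = 0.375 and the child of `b` gets 1.0 * 0.25 = 0.25. The leaf shares sum to 1, so leaf cgroups can be compared against each other in a single layer without walking the tree at scheduling time.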
