
Commit 5a6a09e

Merge tag 'cgroup-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - cpuset now supports remote partitions where CPUs can be reserved for
   exclusive use down the tree without requiring all the intermediate
   nodes to be partitions. This makes it easier to use partitions
   without modifying existing cgroup hierarchy.

 - cpuset partition configuration behavior improvement

 - cgroup_favordynmods= boot param added to allow setting the flag on
   boot on cgroup1

 - Misc code and doc updates

* tag 'cgroup-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  docs/cgroup: Add the list of threaded controllers to cgroup-v2.rst
  cgroup: use legacy_name for cgroup v1 disable info
  cgroup/cpuset: Cleanup signedness issue in cpu_exclusive_check()
  cgroup/cpuset: Enable invalid to valid local partition transition
  cgroup: add cgroup_favordynmods= command-line option
  cgroup/cpuset: Extend test_cpuset_prs.sh to test remote partition
  cgroup/cpuset: Documentation update for partition
  cgroup/cpuset: Check partition conflict with housekeeping setup
  cgroup/cpuset: Introduce remote partition
  cgroup/cpuset: Add cpuset.cpus.exclusive for v2
  cgroup/cpuset: Add cpuset.cpus.exclusive.effective for v2
  cgroup/cpuset: Fix load balance state in update_partition_sd_lb()
  cgroup: Avoid extra dereference in css_populate_dir()
  cgroup: Check for ret during cgroup1_base_files cft addition
2 parents 866b887 + a41796b commit 5a6a09e


5 files changed (+1428, -509 lines)


Documentation/admin-guide/cgroup-v2.rst

Lines changed: 98 additions & 32 deletions
@@ -364,6 +364,13 @@ constraint, a threaded controller must be able to handle competition
 between threads in a non-leaf cgroup and its child cgroups. Each
 threaded controller defines how such competitions are handled.
 
+Currently, the following controllers are threaded and can be enabled
+in a threaded cgroup::
+
+  - cpu
+  - cpuset
+  - perf_event
+  - pids
 
 [Un]populated Notification
 --------------------------
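Only the four controllers listed above may be enabled in a threaded cgroup. As an illustration, the C sketch below checks a string a user might write to "cgroup.subtree_control" (for example "+cpu +pids") against that list. The helper names are hypothetical and this is a userspace approximation only; the kernel performs its own validation and may apply further rules.

```c
#include <stdbool.h>
#include <string.h>

/* Threaded controllers as listed in cgroup-v2.rst after this change. */
static const char *const threaded_controllers[] = {
	"cpu", "cpuset", "perf_event", "pids",
};

/* Hypothetical helper: is name[0..len) one of the listed controllers? */
static bool is_threaded_controller(const char *name, size_t len)
{
	size_t n = sizeof(threaded_controllers) / sizeof(threaded_controllers[0]);

	for (size_t i = 0; i < n; i++) {
		if (strlen(threaded_controllers[i]) == len &&
		    strncmp(threaded_controllers[i], name, len) == 0)
			return true;
	}
	return false;
}

/*
 * Hypothetical pre-check for a "cgroup.subtree_control" write in a
 * threaded cgroup: every "+name" token must name a threaded controller.
 */
static bool threaded_subtree_control_ok(const char *req)
{
	const char *p = req;

	while (*p) {
		/* skip whitespace between tokens */
		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;
		if (*p == '\0')
			break;

		const char *tok = p;
		while (*p && *p != ' ' && *p != '\t' && *p != '\n')
			p++;
		size_t len = (size_t)(p - tok);

		if (tok[0] == '-')
			continue;        /* disabling is not checked by this sketch */
		if (tok[0] != '+')
			return false;    /* malformed token */
		if (!is_threaded_controller(tok + 1, len - 1))
			return false;
	}
	return true;
}
```

Under this model, `threaded_subtree_control_ok("+cpu +memory")` returns false because memory is not on the list.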
@@ -2226,6 +2233,49 @@ Cpuset Interface Files
 
 Its value will be affected by memory nodes hotplug events.
 
+cpuset.cpus.exclusive
+A read-write multiple values file which exists on non-root
+cpuset-enabled cgroups.
+
+It lists all the exclusive CPUs that are allowed to be used
+to create a new cpuset partition. Its value is not used
+unless the cgroup becomes a valid partition root. See the
+"cpuset.cpus.partition" section below for a description of what
+a cpuset partition is.
+
+When the cgroup becomes a partition root, the actual exclusive
+CPUs that are allocated to that partition are listed in
+"cpuset.cpus.exclusive.effective" which may be different
+from "cpuset.cpus.exclusive". If "cpuset.cpus.exclusive"
+has previously been set, "cpuset.cpus.exclusive.effective"
+is always a subset of it.
+
+Users can manually set it to a value that is different from
+"cpuset.cpus". The only constraint in setting it is that the
+list of CPUs must be exclusive with respect to its sibling.
+
+For a parent cgroup, any one of its exclusive CPUs can only
+be distributed to at most one of its child cgroups. Having an
+exclusive CPU appearing in two or more of its child cgroups is
+not allowed (the exclusivity rule). A value that violates the
+exclusivity rule will be rejected with a write error.
+
+The root cgroup is a partition root and all its available CPUs
+are in its exclusive CPU set.
+
+cpuset.cpus.exclusive.effective
+A read-only multiple values file which exists on all non-root
+cpuset-enabled cgroups.
+
+This file shows the effective set of exclusive CPUs that
+can be used to create a partition root. The content of this
+file will always be a subset of "cpuset.cpus" and its parent's
+"cpuset.cpus.exclusive.effective" if its parent is not the root
+cgroup. It will also be a subset of "cpuset.cpus.exclusive"
+if it is set. If "cpuset.cpus.exclusive" is not set, it is
+treated to have an implicit value of "cpuset.cpus" in the
+formation of local partition.
+
 cpuset.cpus.partition
 A read-write single value file which exists on non-root
 cpuset-enabled cgroups. This flag is owned by the parent cgroup
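The exclusivity rule for "cpuset.cpus.exclusive" amounts to a disjointness check against each sibling. The sketch below models it in userspace under two stated assumptions: CPU numbers are below 64, so a CPU set fits in a uint64_t, and values use the usual list format such as "0-3,8". `parse_cpulist()` and `exclusive_ok()` are hypothetical helpers, not kernel or libc functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a CPU list such as "0-3,8" into a bitmask.
 * Assumes CPU numbers below 64. Returns false on malformed input.
 */
static bool parse_cpulist(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	while (*s != '\0' && *s != '\n') {
		char *end;
		unsigned long lo = strtoul(s, &end, 10);
		unsigned long hi = lo;

		if (end == s)
			return false;          /* no number where one was expected */
		if (*end == '-') {
			const char *start = end + 1;

			hi = strtoul(start, &end, 10);
			if (end == start)
				return false;
		}
		if (lo > hi || hi > 63)
			return false;          /* reversed range or CPU >= 64 */
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			m |= UINT64_C(1) << cpu;

		if (*end == ',')
			end++;
		else if (*end != '\0' && *end != '\n')
			return false;          /* unexpected character */
		s = end;
	}
	*mask = m;
	return true;
}

/*
 * Exclusivity rule: a candidate value may not share any CPU with the
 * exclusive CPUs of the other children of the same parent.
 */
static bool exclusive_ok(uint64_t candidate, const uint64_t *siblings, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (candidate & siblings[i])
			return false;
	return true;
}
```

For example, with a sibling holding CPUs 0-3, a write of "3-4" would be rejected under this model while "4-7" would be accepted.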
@@ -2239,26 +2289,41 @@ Cpuset Interface Files
 "isolated" Partition root without load balancing
 ========== =====================================
 
-The root cgroup is always a partition root and its state
-cannot be changed. All other non-root cgroups start out as
-"member".
+A cpuset partition is a collection of cpuset-enabled cgroups with
+a partition root at the top of the hierarchy and its descendants
+except those that are separate partition roots themselves and
+their descendants. A partition has exclusive access to the
+set of exclusive CPUs allocated to it. Other cgroups outside
+of that partition cannot use any CPUs in that set.
+
+There are two types of partitions - local and remote. A local
+partition is one whose parent cgroup is also a valid partition
+root. A remote partition is one whose parent cgroup is not a
+valid partition root itself. Writing to "cpuset.cpus.exclusive"
+is optional for the creation of a local partition as its
+"cpuset.cpus.exclusive" file will assume an implicit value that
+is the same as "cpuset.cpus" if it is not set. Writing the
+proper "cpuset.cpus.exclusive" values down the cgroup hierarchy
+before the target partition root is mandatory for the creation
+of a remote partition.
+
+Currently, a remote partition cannot be created under a local
+partition. All the ancestors of a remote partition root except
+the root cgroup cannot be a partition root.
+
+The root cgroup is always a partition root and its state cannot
+be changed. All other non-root cgroups start out as "member".
 
 When set to "root", the current cgroup is the root of a new
-partition or scheduling domain that comprises itself and all
-its descendants except those that are separate partition roots
-themselves and their descendants.
+partition or scheduling domain. The set of exclusive CPUs is
+determined by the value of its "cpuset.cpus.exclusive.effective".
 
-When set to "isolated", the CPUs in that partition root will
+When set to "isolated", the CPUs in that partition will
 be in an isolated state without any load balancing from the
 scheduler. Tasks placed in such a partition with multiple
 CPUs should be carefully distributed and bound to each of the
 individual CPUs for optimal performance.
 
-The value shown in "cpuset.cpus.effective" of a partition root
-is the CPUs that the partition root can dedicate to a potential
-new child partition root. The new child subtracts available
-CPUs from its parent "cpuset.cpus.effective".
-
 A partition root ("root" or "isolated") can be in one of the
 two possible states - valid or invalid. An invalid partition
 root is in a degraded state where some state information may
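The local/remote definitions and the subset relations for "cpuset.cpus.exclusive.effective" can be modeled with a small tree of cgroups. In the hypothetical sketch below, a partition is classified as local or remote only by whether its parent is a valid partition root. Its effective exclusive CPUs are the intersection of "cpuset.cpus", the parent's effective exclusive CPUs and, when set, "cpuset.cpus.exclusive". The sketch assumes at most 64 CPUs and stores the root's available CPUs in its `excl_effective` field. It ignores CPUs already granted to sibling partitions, hotplug and the other state the kernel tracks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one cpuset-enabled cgroup; CPU sets are 64-bit masks. */
struct cg {
	const struct cg *parent;    /* NULL for the root cgroup */
	uint64_t cpus;              /* "cpuset.cpus" */
	uint64_t exclusive;         /* "cpuset.cpus.exclusive" */
	bool exclusive_set;         /* has "cpuset.cpus.exclusive" been written? */
	uint64_t excl_effective;    /* "cpuset.cpus.exclusive.effective" */
	bool partition_root;        /* "root" or "isolated" requested */
	bool valid_partition_root;  /* currently a valid partition root */
};

enum part_type { PART_NONE, PART_LOCAL, PART_REMOTE };

/* Local if the parent is a valid partition root, remote otherwise. */
static enum part_type partition_type(const struct cg *cg)
{
	if (!cg->parent)
		return PART_NONE;   /* the root cgroup is always a partition root */
	return cg->parent->valid_partition_root ? PART_LOCAL : PART_REMOTE;
}

/* Subset relations documented for "cpuset.cpus.exclusive.effective". */
static uint64_t effective_exclusive(const struct cg *cg)
{
	uint64_t want;

	if (!cg->parent)
		return cg->excl_effective;  /* root: all available CPUs */

	if (cg->exclusive_set)
		want = cg->exclusive;
	else if (cg->partition_root && partition_type(cg) == PART_LOCAL)
		want = cg->cpus;            /* implicit value, local partitions only */
	else
		want = 0;                   /* remote partitions need an explicit value */

	return want & cg->cpus & cg->parent->excl_effective;
}
```

Under this model, a remote leaf under a "member" parent gets no exclusive CPUs until its own "cpuset.cpus.exclusive" is written and its parent's effective exclusive set covers those CPUs. This mirrors the requirement to write the values down the hierarchy before creating a remote partition.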
@@ -2281,37 +2346,33 @@ Cpuset Interface Files
 In the case of an invalid partition root, a descriptive string on
 why the partition is invalid is included within parentheses.
 
-For a partition root to become valid, the following conditions
+For a local partition root to be valid, the following conditions
 must be met.
 
-1) The "cpuset.cpus" is exclusive with its siblings , i.e. they
-are not shared by any of its siblings (exclusivity rule).
-2) The parent cgroup is a valid partition root.
-3) The "cpuset.cpus" is not empty and must contain at least
-one of the CPUs from parent's "cpuset.cpus", i.e. they overlap.
-4) The "cpuset.cpus.effective" cannot be empty unless there is
+1) The parent cgroup is a valid partition root.
+2) The "cpuset.cpus.exclusive.effective" file cannot be empty,
+though it may contain offline CPUs.
+3) The "cpuset.cpus.effective" cannot be empty unless there is
 no task associated with this partition.
 
-External events like hotplug or changes to "cpuset.cpus" can
-cause a valid partition root to become invalid and vice versa.
-Note that a task cannot be moved to a cgroup with empty
-"cpuset.cpus.effective".
+For a remote partition root to be valid, all the above conditions
+except the first one must be met.
 
-For a valid partition root with the sibling cpu exclusivity
-rule enabled, changes made to "cpuset.cpus" that violate the
-exclusivity rule will invalidate the partition as well as its
-sibling partitions with conflicting cpuset.cpus values. So
-care must be taking in changing "cpuset.cpus".
+External events like hotplug or changes to "cpuset.cpus" or
+"cpuset.cpus.exclusive" can cause a valid partition root to
+become invalid and vice versa. Note that a task cannot be
+moved to a cgroup with empty "cpuset.cpus.effective".
 
 A valid non-root parent partition may distribute out all its CPUs
-to its child partitions when there is no task associated with it.
+to its child local partitions when there is no task associated
+with it.
 
-Care must be taken to change a valid partition root to
-"member" as all its child partitions, if present, will become
+Care must be taken to change a valid partition root to "member"
+as all its child local partitions, if present, will become
 invalid causing disruption to tasks running in those child
 partitions. These inactivated partitions could be recovered if
 their parent is switched back to a partition root with a proper
-set of "cpuset.cpus".
+value in "cpuset.cpus" or "cpuset.cpus.exclusive".
 
 Poll and inotify events are triggered whenever the state of
 "cpuset.cpus.partition" changes. That includes changes caused
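The numbered validity conditions translate directly into predicates. The hypothetical sketch below encodes them over a snapshot of a partition's state. It deliberately leaves out the separate restriction that a remote partition cannot sit under a local one, and it reduces "tasks associated with this partition" to a single counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the state the documented conditions refer to. */
struct part_state {
	bool parent_is_valid_root;  /* used by condition 1 (local only) */
	uint64_t excl_effective;    /* "cpuset.cpus.exclusive.effective"; may include offline CPUs */
	uint64_t cpus_effective;    /* "cpuset.cpus.effective" */
	unsigned int nr_tasks;      /* tasks associated with the partition */
};

/* Conditions 2 and 3, shared by local and remote partitions. */
static bool shared_conditions_met(const struct part_state *s)
{
	if (s->excl_effective == 0)
		return false;           /* 2: exclusive.effective must not be empty */
	if (s->cpus_effective == 0 && s->nr_tasks > 0)
		return false;           /* 3: empty effective CPUs only without tasks */
	return true;
}

/* Local partition: conditions 1, 2 and 3. */
static bool local_partition_valid(const struct part_state *s)
{
	return s->parent_is_valid_root && shared_conditions_met(s);
}

/* Remote partition: every condition except the first. */
static bool remote_partition_valid(const struct part_state *s)
{
	return shared_conditions_met(s);
}
```

For instance, a remote partition with a non-empty exclusive set but an empty "cpuset.cpus.effective" is valid under this model only while it has no tasks.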
@@ -2321,6 +2382,11 @@ Cpuset Interface Files
 to "cpuset.cpus.partition" without the need to do continuous
 polling.
 
+A user can pre-configure certain CPUs to an isolated state
+with load balancing disabled at boot time with the "isolcpus"
+kernel boot command line option. If those CPUs are to be put
+into a partition, they have to be used in an isolated partition.
+
 
 Device controller
 -----------------
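The isolcpus note can be read as a compatibility check between a partition's CPUs and the CPUs isolated at boot. The sketch below is a hypothetical model of that documented rule only. It uses 64-bit masks and illustrative names, and it does not reproduce the kernel's implementation or its error reporting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative modes, named after the "cpuset.cpus.partition" values. */
enum part_mode { MODE_MEMBER, MODE_ROOT, MODE_ISOLATED };

/*
 * Documented rule: CPUs isolated at boot with "isolcpus" (boot_isolated)
 * may only be placed in an "isolated" partition. Returns true if the
 * requested configuration would violate that rule.
 */
static bool isolcpus_conflict(enum part_mode mode, uint64_t cpus, uint64_t boot_isolated)
{
	if (mode == MODE_MEMBER)
		return false;           /* not a partition root: rule not applicable */
	return (cpus & boot_isolated) != 0 && mode != MODE_ISOLATED;
}
```

With CPU 3 isolated at boot (mask 0x08), a "root" partition over CPUs 2-3 conflicts under this model, while the same CPUs in an "isolated" partition do not.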

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 4 additions & 0 deletions
@@ -580,6 +580,10 @@
 named mounts. Specifying both "all" and "named" disables
 all v1 hierarchies.
 
+cgroup_favordynmods= [KNL] Enable or Disable favordynmods.
+Format: { "true" | "false" }
+Defaults to the value of CONFIG_CGROUP_FAVOR_DYNMODS.
+
 cgroup.memory= [KNL] Pass options to the cgroup memory controller.
 Format: <string>
 nosocket -- Disable socket memory accounting.

kernel/cgroup/cgroup.c

Lines changed: 21 additions & 9 deletions
@@ -207,6 +207,8 @@ static u16 have_exit_callback __read_mostly;
 static u16 have_release_callback __read_mostly;
 static u16 have_canfork_callback __read_mostly;
 
+static bool have_favordynmods __ro_after_init = IS_ENABLED(CONFIG_CGROUP_FAVOR_DYNMODS);
+
 /* cgroup namespace for init task */
 struct cgroup_namespace init_cgroup_ns = {
 	.ns.count	= REFCOUNT_INIT(2),
@@ -1350,7 +1352,9 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 		cgroup_root_count--;
 	}
 
-	cgroup_favor_dynmods(root, false);
+	if (!have_favordynmods)
+		cgroup_favor_dynmods(root, false);
+
 	cgroup_exit_root_id(root);
 
 	cgroup_unlock();
@@ -1719,20 +1723,22 @@ static int css_populate_dir(struct cgroup_subsys_state *css)
 
 	if (!css->ss) {
 		if (cgroup_on_dfl(cgrp)) {
-			ret = cgroup_addrm_files(&cgrp->self, cgrp,
+			ret = cgroup_addrm_files(css, cgrp,
 						 cgroup_base_files, true);
 			if (ret < 0)
 				return ret;
 
 			if (cgroup_psi_enabled()) {
-				ret = cgroup_addrm_files(&cgrp->self, cgrp,
+				ret = cgroup_addrm_files(css, cgrp,
 							 cgroup_psi_files, true);
 				if (ret < 0)
 					return ret;
 			}
 		} else {
-			cgroup_addrm_files(css, cgrp,
-					   cgroup1_base_files, true);
+			ret = cgroup_addrm_files(css, cgrp,
+						 cgroup1_base_files, true);
+			if (ret < 0)
+				return ret;
 		}
 	} else {
 		list_for_each_entry(cfts, &css->ss->cfts, node) {
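The css_populate_dir() hunk makes the cgroup v1 branch check the return value of cgroup_addrm_files(), as the default-hierarchy branch already did. The sketch below models the before and after behavior with a hypothetical stand-in, `add_base_files()`, that fails on request. `-ENOMEM` is an arbitrary choice of error for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for cgroup_addrm_files(): fails when asked to. */
static int add_base_files(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/* Before the fix: the v1 branch dropped the result. */
static int populate_dir_old(bool on_dfl, bool fail)
{
	int ret;

	if (on_dfl) {
		ret = add_base_files(fail);
		if (ret < 0)
			return ret;
	} else {
		add_base_files(fail);   /* error silently lost */
	}
	return 0;
}

/* After the fix: both branches propagate a negative errno. */
static int populate_dir_new(bool on_dfl, bool fail)
{
	int ret;

	if (on_dfl) {
		ret = add_base_files(fail);
		if (ret < 0)
			return ret;
	} else {
		ret = add_base_files(fail);
		if (ret < 0)
			return ret;
	}
	return 0;
}
```

In the old version, a failure on a v1 hierarchy is reported as success (0). In the new version, the negative errno reaches the caller.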
@@ -2243,9 +2249,9 @@ static int cgroup_init_fs_context(struct fs_context *fc)
 	fc->user_ns = get_user_ns(ctx->ns->user_ns);
 	fc->global = true;
 
-#ifdef CONFIG_CGROUP_FAVOR_DYNMODS
-	ctx->flags |= CGRP_ROOT_FAVOR_DYNMODS;
-#endif
+	if (have_favordynmods)
+		ctx->flags |= CGRP_ROOT_FAVOR_DYNMODS;
+
 	return 0;
 }
 
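Together with the cgroup_destroy_root() hunk above, this change replaces the compile-time `#ifdef CONFIG_CGROUP_FAVOR_DYNMODS` with a run-time check of `have_favordynmods`. That flag takes its default from the Kconfig option and can be overridden by the cgroup_favordynmods= boot parameter. The sketch below models only those decisions. The struct, the helper names and the `SKETCH_FAVOR_DYNMODS` bit value are hypothetical stand-ins for the kernel's state and for CGRP_ROOT_FAVOR_DYNMODS.

```c
#include <stdbool.h>

/* Arbitrary bit standing in for CGRP_ROOT_FAVOR_DYNMODS. */
#define SKETCH_FAVOR_DYNMODS (1u << 0)

struct favor_cfg {
	bool config_default;     /* like IS_ENABLED(CONFIG_CGROUP_FAVOR_DYNMODS) */
	bool boot_param_given;   /* cgroup_favordynmods= parsed successfully */
	bool boot_param_value;   /* the parsed value */
};

/* A successfully parsed boot parameter overrides the Kconfig default. */
static bool sketch_have_favordynmods(const struct favor_cfg *c)
{
	return c->boot_param_given ? c->boot_param_value : c->config_default;
}

/* cgroup_init_fs_context(): initial mount-context flags. */
static unsigned int initial_ctx_flags(const struct favor_cfg *c)
{
	return sketch_have_favordynmods(c) ? SKETCH_FAVOR_DYNMODS : 0u;
}

/* cgroup_destroy_root(): favoring is switched off only if not requested. */
static bool destroy_turns_favor_off(const struct favor_cfg *c)
{
	return !sketch_have_favordynmods(c);
}
```

Under this model, booting with cgroup_favordynmods=true on a kernel built without CONFIG_CGROUP_FAVOR_DYNMODS makes mount contexts created through cgroup_init_fs_context() start with the flag set. Root destruction then no longer calls `cgroup_favor_dynmods(root, false)`.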
@@ -6121,7 +6127,7 @@ int __init cgroup_init(void)
 
 		if (cgroup1_ssid_disabled(ssid))
 			pr_info("Disabling %s control group subsystem in v1 mounts\n",
-				ss->name);
+				ss->legacy_name);
 
 		cgrp_dfl_root.subsys_mask |= 1 << ss->id;
 
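The cgroup_init() hunk switches the v1-disable message from `ss->name` to `ss->legacy_name`, the name the controller carries on cgroup v1 hierarchies. The sketch below models the two names with a hypothetical struct. The fallback from `legacy_name` to `name` is an assumption of this sketch, intended to mirror how the kernel fills in `legacy_name` during early init. The io/blkio pair is used as an example of a controller whose two names differ.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of struct cgroup_subsys: only the two names. */
struct subsys_names {
	const char *name;         /* cgroup v2 name */
	const char *legacy_name;  /* cgroup v1 name, NULL if the same */
};

/* Fall back to the v2 name when no separate v1 name is set. */
static const char *v1_name(const struct subsys_names *ss)
{
	return ss->legacy_name ? ss->legacy_name : ss->name;
}

/* True if the printed name changes with this commit for ss. */
static bool message_changes(const struct subsys_names *ss)
{
	return strcmp(ss->name, v1_name(ss)) != 0;
}

/* Format the message the way the patched pr_info() does. */
static int format_v1_disable_msg(char *buf, size_t len, const struct subsys_names *ss)
{
	return snprintf(buf, len, "Disabling %s control group subsystem in v1 mounts\n",
			v1_name(ss));
}
```

For such a controller the log line names the v1 controller ("blkio" in this example). Controllers whose two names coincide print the same message as before.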
@@ -6764,6 +6770,12 @@ static int __init enable_cgroup_debug(char *str)
 }
 __setup("cgroup_debug", enable_cgroup_debug);
 
+static int __init cgroup_favordynmods_setup(char *str)
+{
+	return (kstrtobool(str, &have_favordynmods) == 0);
+}
+__setup("cgroup_favordynmods=", cgroup_favordynmods_setup);
+
 /**
  * css_tryget_online_from_dir - get corresponding css from a cgroup dentry
  * @dentry: directory dentry of interest
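kernel-parameters.txt documents the cgroup_favordynmods= format as { "true" | "false" }. However, cgroup_favordynmods_setup() hands the string to kstrtobool(), whose kerneldoc says it accepts a value whose first character is one of YyTt1 or NnFf0, or that starts with on/off. The userspace sketch below reimplements that documented behavior (`sketch_kstrtobool` is a hypothetical name). It also models the setup handler, which returns 1 only when parsing succeeds and otherwise leaves the flag at its default.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace sketch intended to mirror kstrtobool(): only the first
 * character (two for "on"/"off") is examined. Returns 0 and stores the
 * result on success; returns -EINVAL and leaves *res untouched otherwise.
 */
static int sketch_kstrtobool(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case 't': case 'T': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case 'f': case 'F': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		switch (s[1]) {
		case 'n': case 'N':
			*res = true;
			return 0;
		case 'f': case 'F':
			*res = false;
			return 0;
		default:
			break;
		}
		break;
	default:
		break;
	}
	return -EINVAL;
}

/* Model of cgroup_favordynmods_setup(): 1 if the value parsed, else 0. */
static int sketch_favordynmods_setup(const char *str, bool *flag)
{
	return sketch_kstrtobool(str, flag) == 0;
}
```

Under this model, cgroup_favordynmods=on or cgroup_favordynmods=1 would also enable the flag. An unparseable value such as "maybe" makes the handler return 0 and leaves the Kconfig default in effect.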
