Commit 895b9b1

Merge tag 'cgroup-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - Added Michal Koutný as a maintainer

 - Counters in pids.events were behaving inconsistently. pids.events made
   properly hierarchical and pids.events.local added

 - misc.peak and misc.events.local added

 - cpuset remote partition creation and cpuset.cpus.exclusive handling
   improved

 - Code cleanups, non-critical fixes, doc updates

 - for-6.10-fixes is merged in to receive two non-critical fixes that
   didn't trigger pull

* tag 'cgroup-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (23 commits)
  cgroup: Add Michal Koutný as a maintainer
  cgroup/misc: Introduce misc.events.local
  cgroup/rstat: add force idle show helper
  cgroup: Protect css->cgroup write under css_set_lock
  cgroup/misc: Introduce misc.peak
  cgroup_misc: add kernel-doc comments for enum misc_res_type
  cgroup/cpuset: Prevent UAF in proc_cpuset_show()
  selftest/cgroup: Update test_cpuset_prs.sh to match changes
  cgroup/cpuset: Make cpuset.cpus.exclusive independent of cpuset.cpus
  cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE until valid partition
  selftest/cgroup: Fix test_cpuset_prs.sh problems reported by test robot
  cgroup/cpuset: Fix remote root partition creation problem
  cgroup: avoid the unnecessary list_add(dying_tasks) in cgroup_exit()
  cgroup/cpuset: Optimize isolated partition only generate_sched_domains() calls
  cgroup/cpuset: Reduce the lock protecting CS_SCHED_LOAD_BALANCE
  kernel/cgroup: cleanup cgroup_base_files when fail to add cgroup_psi_files
  selftests: cgroup: Add basic tests for pids controller
  selftests: cgroup: Lexicographic order in Makefile
  cgroup/pids: Add pids.events.local
  cgroup/pids: Make event counters hierarchical
  ...
2 parents f97b956 + 9283ff5 commit 895b9b1

14 files changed: +679 -159 lines changed

Documentation/admin-guide/cgroup-v1/pids.rst

Lines changed: 2 additions & 1 deletion
@@ -36,7 +36,8 @@ superset of parent/child/pids.current.
 
 The pids.events file contains event counters:
 
-  - max: Number of times fork failed because limit was hit.
+  - max: Number of times fork failed in the cgroup because limit was hit in
+    self or ancestors.
 
 Example
 -------

Documentation/admin-guide/cgroup-v2.rst

Lines changed: 39 additions & 8 deletions
@@ -239,6 +239,13 @@ cgroup v2 currently supports the following mount options.
 	will not be tracked by the memory controller (even if cgroup
 	v2 is remounted later on).
 
+  pids_localevents
+	The option restores v1-like behavior of pids.events:max, that is only
+	local (inside cgroup proper) fork failures are counted. Without this
+	option pids.events.max represents any pids.max enforcement across
+	cgroup's subtree.
+
+
 
 Organizing Processes and Threads
 --------------------------------
@@ -2205,12 +2212,18 @@ PID Interface Files
 	descendants has ever reached.
 
   pids.events
-	A read-only flat-keyed file which exists on non-root cgroups. The
-	following entries are defined. Unless specified otherwise, a value
-	change in this file generates a file modified event.
+	A read-only flat-keyed file which exists on non-root cgroups. Unless
+	specified otherwise, a value change in this file generates a file
+	modified event. The following entries are defined.
 
 	  max
-		Number of times fork failed because limit was hit.
+		The number of times the cgroup's total number of processes hit the pids.max
+		limit (see also pids_localevents).
+
+  pids.events.local
+	Similar to pids.events but the fields in the file are local
+	to the cgroup i.e. not hierarchical. The file modified event
+	generated on this file reflects only the local events.
 
 Organisational operations are not blocked by cgroup policies, so it is
 possible to have pids.current > pids.max. This can be done by either
@@ -2346,8 +2359,12 @@ Cpuset Interface Files
 	is always a subset of it.
 
 	Users can manually set it to a value that is different from
-	"cpuset.cpus". The only constraint in setting it is that the
-	list of CPUs must be exclusive with respect to its sibling.
+	"cpuset.cpus". One constraint in setting it is that the list of
+	CPUs must be exclusive with respect to "cpuset.cpus.exclusive"
+	of its sibling. If "cpuset.cpus.exclusive" of a sibling cgroup
+	isn't set, its "cpuset.cpus" value, if set, cannot be a subset
+	of it to leave at least one CPU available when the exclusive
+	CPUs are taken away.
 
 	For a parent cgroup, any one of its exclusive CPUs can only
 	be distributed to at most one of its child cgroups. Having an
@@ -2363,8 +2380,8 @@ Cpuset Interface Files
 	cpuset-enabled cgroups.
 
 	This file shows the effective set of exclusive CPUs that
-	can be used to create a partition root. The content of this
-	file will always be a subset of "cpuset.cpus" and its parent's
+	can be used to create a partition root. The content
+	of this file will always be a subset of its parent's
 	"cpuset.cpus.exclusive.effective" if its parent is not the root
 	cgroup. It will also be a subset of "cpuset.cpus.exclusive"
 	if it is set. If "cpuset.cpus.exclusive" is not set, it is
@@ -2625,6 +2642,15 @@ Miscellaneous controller provides 3 interface files. If two misc resources (res_
 	  res_a 3
 	  res_b 0
 
+  misc.peak
+	A read-only flat-keyed file shown in all cgroups. It shows the
+	historical maximum usage of the resources in the cgroup and its
+	children.::
+
+	  $ cat misc.peak
+	  res_a 10
+	  res_b 8
+
   misc.max
 	A read-write flat-keyed file shown in the non root cgroups. Allowed
 	maximum usage of the resources in the cgroup and its children.::
@@ -2654,6 +2680,11 @@ Miscellaneous controller provides 3 interface files. If two misc resources (res_
 		The number of times the cgroup's resource usage was
 		about to go over the max boundary.
 
+  misc.events.local
+	Similar to misc.events but the fields in the file are local to the
+	cgroup i.e. not hierarchical. The file modified event generated on
+	this file reflects only the local events.
+
 Migration and Ownership
 ~~~~~~~~~~~~~~~~~~~~~~~
 
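
The flat-keyed files touched by this documentation change (pids.events, pids.events.local, misc.peak, misc.events.local) hold one "key value" pair per line. As a minimal userspace sketch, assuming the file contents have already been read into a string, a single value could be looked up roughly like this. `flat_keyed_get` is a hypothetical helper introduced here for illustration; it is not part of this commit or of any kernel interface.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up `key` in the text of a flat-keyed cgroup file (one "key value"
 * pair per line) and store its numeric value in *out.
 * Returns 0 on success, -1 if the key is absent or its value is not a number.
 * Hypothetical helper for illustration only.
 */
int flat_keyed_get(const char *text, const char *key, unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Match the whole key: it must be followed by a space. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
			return sscanf(line + klen + 1, "%llu", out) == 1 ? 0 : -1;

		/* Advance to the start of the next line, if there is one. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Called with the misc.peak sample from the documentation ("res_a 10" and "res_b 8" on separate lines) and the key "res_b", this stores 8. A non-numeric value such as a literal "max" is reported as a failure rather than guessed at.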

MAINTAINERS

Lines changed: 1 addition & 0 deletions
@@ -5528,6 +5528,7 @@ CONTROL GROUP (CGROUP)
 M: Tejun Heo <[email protected]>
 M: Zefan Li <[email protected]>
 M: Johannes Weiner <[email protected]>
+M: Michal Koutný <[email protected]>
 S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git

include/linux/cgroup-defs.h

Lines changed: 6 additions & 1 deletion
@@ -119,7 +119,12 @@ enum {
 	/*
 	 * Enable hugetlb accounting for the memory controller.
 	 */
-	CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING = (1 << 19),
+	CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING = (1 << 19),
+
+	/*
+	 * Enable legacy local pids.events.
+	 */
+	CGRP_ROOT_PIDS_LOCAL_EVENTS = (1 << 20),
 };
 
 /* cftype->flags */

include/linux/misc_cgroup.h

Lines changed: 9 additions & 3 deletions
@@ -9,15 +9,16 @@
 #define _MISC_CGROUP_H_
 
 /**
- * Types of misc cgroup entries supported by the host.
+ * enum misc_res_type - Types of misc cgroup entries supported by the host.
  */
 enum misc_res_type {
 #ifdef CONFIG_KVM_AMD_SEV
-	/* AMD SEV ASIDs resource */
+	/** @MISC_CG_RES_SEV: AMD SEV ASIDs resource */
 	MISC_CG_RES_SEV,
-	/* AMD SEV-ES ASIDs resource */
+	/** @MISC_CG_RES_SEV_ES: AMD SEV-ES ASIDs resource */
 	MISC_CG_RES_SEV_ES,
 #endif
+	/** @MISC_CG_RES_TYPES: count of enum misc_res_type constants */
 	MISC_CG_RES_TYPES
 };
 
@@ -30,13 +31,16 @@ struct misc_cg;
 /**
  * struct misc_res: Per cgroup per misc type resource
  * @max: Maximum limit on the resource.
+ * @watermark: Historical maximum usage of the resource.
  * @usage: Current usage of the resource.
  * @events: Number of times, the resource limit exceeded.
  */
 struct misc_res {
 	u64 max;
+	atomic64_t watermark;
 	atomic64_t usage;
 	atomic64_t events;
+	atomic64_t events_local;
 };
 
 /**
@@ -50,6 +54,8 @@ struct misc_cg {
 
 	/* misc.events */
 	struct cgroup_file events_file;
+	/* misc.events.local */
+	struct cgroup_file events_local_file;
 
 	struct misc_res res[MISC_CG_RES_TYPES];
 };
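
The new `watermark` and `events_local` fields back misc.peak and misc.events.local. The sketch below illustrates the general idea of a peak watermark alongside local versus hierarchical event counters. It is a simplified, single-threaded userspace model and not the kernel's implementation: `struct res_node`, `res_try_charge` and `res_uncharge` are hypothetical names, plain integers stand in for `atomic64_t`, and integer overflow is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one resource of one cgroup. */
struct res_node {
	struct res_node *parent; /* NULL for the root */
	uint64_t max;            /* limit */
	uint64_t usage;          /* current usage */
	uint64_t watermark;      /* highest usage seen so far */
	uint64_t events;         /* limit hits at this level or below it */
	uint64_t events_local;   /* limit hits caused by this level's own limit */
};

/*
 * Charge `amount` to `node` and all of its ancestors.
 * Returns 0 on success, or -1 (leaving usage untouched) if any level
 * would exceed its limit.
 */
int res_try_charge(struct res_node *node, uint64_t amount)
{
	struct res_node *i, *j;

	/* Pass 1: find the first level, if any, whose limit would be exceeded. */
	for (i = node; i; i = i->parent) {
		if (i->usage + amount > i->max) {
			/* Local counter: only the level whose limit was hit. */
			i->events_local++;
			/* Hierarchical counter: that level and every ancestor. */
			for (j = i; j; j = j->parent)
				j->events++;
			return -1;
		}
	}

	/* Pass 2: apply the charge and raise the peak where needed. */
	for (i = node; i; i = i->parent) {
		i->usage += amount;
		if (i->usage > i->watermark)
			i->watermark = i->usage;
	}
	return 0;
}

/* Release `amount` from `node` and its ancestors; the watermark is kept. */
void res_uncharge(struct res_node *node, uint64_t amount)
{
	for (; node; node = node->parent)
		node->usage -= amount;
}
```

In this model, a failed charge in a child bumps the child's local counter only, while the hierarchical counter also rises in the parent. The watermark never decreases, which is the behaviour the documentation describes as a historical maximum.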

kernel/cgroup/cgroup.c

Lines changed: 28 additions & 8 deletions
@@ -1744,8 +1744,11 @@ static int css_populate_dir(struct cgroup_subsys_state *css)
 			if (cgroup_psi_enabled()) {
 				ret = cgroup_addrm_files(css, cgrp,
 							 cgroup_psi_files, true);
-				if (ret < 0)
+				if (ret < 0) {
+					cgroup_addrm_files(css, cgrp,
+							   cgroup_base_files, false);
 					return ret;
+				}
 			}
 		} else {
 			ret = cgroup_addrm_files(css, cgrp,
@@ -1839,9 +1842,9 @@ int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
 		RCU_INIT_POINTER(scgrp->subsys[ssid], NULL);
 		rcu_assign_pointer(dcgrp->subsys[ssid], css);
 		ss->root = dst_root;
-		css->cgroup = dcgrp;
 
 		spin_lock_irq(&css_set_lock);
+		css->cgroup = dcgrp;
 		WARN_ON(!list_empty(&dcgrp->e_csets[ss->id]));
 		list_for_each_entry_safe(cset, cset_pos, &scgrp->e_csets[ss->id],
 					 e_cset_node[ss->id]) {
@@ -1922,6 +1925,7 @@ enum cgroup2_param {
 	Opt_memory_localevents,
 	Opt_memory_recursiveprot,
 	Opt_memory_hugetlb_accounting,
+	Opt_pids_localevents,
 	nr__cgroup2_params
 };
 
@@ -1931,6 +1935,7 @@ static const struct fs_parameter_spec cgroup2_fs_parameters[] = {
 	fsparam_flag("memory_localevents", Opt_memory_localevents),
 	fsparam_flag("memory_recursiveprot", Opt_memory_recursiveprot),
 	fsparam_flag("memory_hugetlb_accounting", Opt_memory_hugetlb_accounting),
+	fsparam_flag("pids_localevents", Opt_pids_localevents),
 	{}
 };
 
@@ -1960,6 +1965,9 @@ static int cgroup2_parse_param(struct fs_context *fc, struct fs_parameter *param
 	case Opt_memory_hugetlb_accounting:
 		ctx->flags |= CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
 		return 0;
+	case Opt_pids_localevents:
+		ctx->flags |= CGRP_ROOT_PIDS_LOCAL_EVENTS;
+		return 0;
 	}
 	return -EINVAL;
 }
@@ -1989,6 +1997,11 @@ static void apply_cgroup_root_flags(unsigned int root_flags)
 			cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
 		else
 			cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
+
+		if (root_flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
+			cgrp_dfl_root.flags |= CGRP_ROOT_PIDS_LOCAL_EVENTS;
+		else
+			cgrp_dfl_root.flags &= ~CGRP_ROOT_PIDS_LOCAL_EVENTS;
 	}
 }
 
@@ -2004,6 +2017,8 @@ static int cgroup_show_options(struct seq_file *seq, struct kernfs_root *kf_root
 		seq_puts(seq, ",memory_recursiveprot");
 	if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING)
 		seq_puts(seq, ",memory_hugetlb_accounting");
+	if (cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
+		seq_puts(seq, ",pids_localevents");
 	return 0;
 }
 
@@ -6686,8 +6701,10 @@ void cgroup_exit(struct task_struct *tsk)
 	WARN_ON_ONCE(list_empty(&tsk->cg_list));
 	cset = task_css_set(tsk);
 	css_set_move_task(tsk, cset, NULL, false);
-	list_add_tail(&tsk->cg_list, &cset->dying_tasks);
 	cset->nr_tasks--;
+	/* matches the signal->live check in css_task_iter_advance() */
+	if (thread_group_leader(tsk) && atomic_read(&tsk->signal->live))
+		list_add_tail(&tsk->cg_list, &cset->dying_tasks);
 
 	if (dl_task(tsk))
 		dec_dl_tasks_cs(tsk);
@@ -6714,10 +6731,12 @@ void cgroup_release(struct task_struct *task)
 		ss->release(task);
 	} while_each_subsys_mask();
 
-	spin_lock_irq(&css_set_lock);
-	css_set_skip_task_iters(task_css_set(task), task);
-	list_del_init(&task->cg_list);
-	spin_unlock_irq(&css_set_lock);
+	if (!list_empty(&task->cg_list)) {
+		spin_lock_irq(&css_set_lock);
+		css_set_skip_task_iters(task_css_set(task), task);
+		list_del_init(&task->cg_list);
+		spin_unlock_irq(&css_set_lock);
+	}
 }
 
 void cgroup_free(struct task_struct *task)
@@ -7062,7 +7081,8 @@ static ssize_t features_show(struct kobject *kobj, struct kobj_attribute *attr,
 			"favordynmods\n"
 			"memory_localevents\n"
 			"memory_recursiveprot\n"
-			"memory_hugetlb_accounting\n");
+			"memory_hugetlb_accounting\n"
+			"pids_localevents\n");
 }
 static struct kobj_attribute cgroup_features_attr = __ATTR_RO(features);
 
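
Because `features_show()` now emits a "pids_localevents" line, userspace can probe for the mount option before trying to use it. The sketch below assumes the newline-separated features text has already been read into a string; on typical systems this attribute is exposed as /sys/kernel/cgroup/features, which is stated here as an assumption rather than taken from this diff. `has_cgroup_feature` is a hypothetical helper name.

```c
#include <string.h>

/*
 * Return 1 if `name` appears as a complete line in `features`
 * (newline-separated feature names), 0 otherwise.
 * Hypothetical helper for illustration only.
 */
int has_cgroup_feature(const char *features, const char *name)
{
	size_t nlen = strlen(name);
	const char *p = features;

	while (p && *p) {
		const char *eol = strchr(p, '\n');
		size_t len = eol ? (size_t)(eol - p) : strlen(p);

		/* Compare whole lines so that substrings do not match. */
		if (len == nlen && strncmp(p, name, nlen) == 0)
			return 1;
		p = eol ? eol + 1 : NULL;
	}
	return 0;
}
```

Matching whole lines rather than using a substring search avoids false positives for names that are substrings of another feature name.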
