Commit 7716f38

Merge tag 'cgroup-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - Per-cpu cpu usage stats are now tracked

   This currently isn't printed out in the cgroupfs interface and can
   only be accessed through e.g. BPF. Should decide on a not-too-ugly
   way to show per-cpu stats in cgroupfs

 - cpuset received some cleanups and preparatory patches for the pending
   cpus.exclusive patchset which will allow cpuset partitions to be
   created below non-partition parents, which should ease the management
   of partition cpusets

 - A lot of code and documentation cleanup patches

 - tools/testing/selftests/cgroup/test_cpuset.c added

* tag 'cgroup-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (32 commits)
  cgroup: Avoid -Wstringop-overflow warnings
  cgroup:namespace: Remove unused cgroup_namespaces_init()
  cgroup/rstat: Record the cumulative per-cpu time of cgroup and its descendants
  cgroup: clean up if condition in cgroup_pidlist_start()
  cgroup: fix obsolete function name in cgroup_destroy_locked()
  Documentation: cgroup-v2.rst: Correct number of stats entries
  cgroup: fix obsolete function name above css_free_rwork_fn()
  cgroup/cpuset: fix kernel-doc
  cgroup: clean up printk()
  cgroup: fix obsolete comment above cgroup_create()
  docs: cgroup-v1: fix typo
  docs: cgroup-v1: correct the term of Page Cache organization in inode
  cgroup/misc: Store atomic64_t reads to u64
  cgroup/misc: Change counters to be explicit 64bit types
  cgroup/misc: update struct members descriptions
  cgroup: remove cgrp->kn check in css_populate_dir()
  cgroup: fix obsolete function name
  cgroup: use cached local variable parent in for loop
  cgroup: remove obsolete comment above struct cgroupstats
  cgroup: put cgroup_tryget_css() inside CONFIG_CGROUP_SCHED
  ...

2 parents e987af4 + 78d44b8 commit 7716f38

19 files changed: 560 additions, 204 deletions

Documentation/admin-guide/cgroup-v1/memory.rst

Lines changed: 3 additions & 3 deletions
@@ -195,11 +195,11 @@ are not accounted. We just account pages under usual VM management.

 RSS pages are accounted at page_fault unless they've already been accounted
 for earlier. A file page will be accounted for as Page Cache when it's
-inserted into inode (radix-tree). While it's mapped into the page tables of
+inserted into inode (xarray). While it's mapped into the page tables of
 processes, duplicate accounting is carefully avoided.

 An RSS page is unaccounted when it's fully unmapped. A PageCache page is
-unaccounted when it's removed from radix-tree. Even if RSS pages are fully
+unaccounted when it's removed from xarray. Even if RSS pages are fully
 unmapped (by kswapd), they may exist as SwapCache in the system until they
 are really freed. Such SwapCaches are also accounted.
 A swapped-in page is accounted after adding into swapcache.
@@ -907,7 +907,7 @@ experiences some pressure. In this situation, only group C will receive the
 notification, i.e. groups A and B will not receive it. This is done to avoid
 excessive "broadcasting" of messages, which disturbs the system and which is
 especially bad if we are low on memory or thrashing. Group B, will receive
-notification only if there are no event listers for group C.
+notification only if there are no event listeners for group C.

 There are three optional modes that specify different propagation behavior:

Documentation/admin-guide/cgroup-v2.rst

Lines changed: 1 addition & 1 deletion
@@ -1045,7 +1045,7 @@ All time durations are in microseconds.
 	- user_usec
 	- system_usec

-	and the following three when the controller is enabled:
+	and the following five when the controller is enabled:

 	- nr_periods
 	- nr_throttled

MAINTAINERS

Lines changed: 2 additions & 0 deletions
@@ -5255,6 +5255,8 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git
 F: Documentation/admin-guide/cgroup-v1/cpusets.rst
 F: include/linux/cpuset.h
 F: kernel/cgroup/cpuset.c
+F: tools/testing/selftests/cgroup/test_cpuset.c
+F: tools/testing/selftests/cgroup/test_cpuset_prs.sh

 CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
 M: Johannes Weiner <[email protected]>

include/linux/cgroup-defs.h

Lines changed: 14 additions & 0 deletions
@@ -341,6 +341,20 @@ struct cgroup_rstat_cpu {
 	 */
 	struct cgroup_base_stat last_bstat;

+	/*
+	 * This field is used to record the cumulative per-cpu time of
+	 * the cgroup and its descendants. Currently it can be read via
+	 * eBPF/drgn etc, and we are still trying to determine how to
+	 * expose it in the cgroupfs interface.
+	 */
+	struct cgroup_base_stat subtree_bstat;
+
+	/*
+	 * Snapshots at the last reading. These are used to calculate the
+	 * deltas to propagate to the per-cpu subtree_bstat.
+	 */
+	struct cgroup_base_stat last_subtree_bstat;
+
 	/*
 	 * Child cgroups with stat updates on this cpu since the last read
 	 * are linked on the parent's ->updated_children through
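
The comments added in this hunk describe a cumulative per-cpu counter plus a snapshot that is used to compute deltas. The following is a minimal userspace sketch of that snapshot-and-delta idea, not the kernel's rstat flush code. `struct node_cpu`, `flush_node()` and the single `uint64_t` counter are hypothetical names chosen for illustration; the kernel keeps `struct cgroup_base_stat` values and has its own flush path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one cgroup's state on ONE cpu. */
struct node_cpu {
	struct node_cpu *parent;	/* NULL at the top of the tree */
	uint64_t local;			/* time consumed by this node itself */
	uint64_t last_local;		/* snapshot of local at the last flush */
	uint64_t subtree;		/* cumulative: self plus descendants */
	uint64_t last_subtree;		/* snapshot of subtree at the last flush */
};

/*
 * Fold what changed since the last flush into this node's subtree total,
 * then pass the change in the subtree total up to the parent.
 * Children are assumed to be flushed before their parent.
 */
void flush_node(struct node_cpu *n)
{
	uint64_t delta;

	assert(n != NULL);

	/* Own usage since the last snapshot. */
	delta = n->local - n->last_local;
	n->subtree += delta;
	n->last_local += delta;

	/* Subtree usage since the last snapshot goes to the parent. */
	if (n->parent) {
		delta = n->subtree - n->last_subtree;
		n->parent->subtree += delta;
		n->last_subtree += delta;
	}
}
```

Because each snapshot is advanced by exactly the delta that was propagated, flushing the same node twice in a row adds nothing the second time, which is the property the `last_subtree_bstat` snapshot appears intended to provide.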

include/linux/misc_cgroup.h

Lines changed: 13 additions & 15 deletions
@@ -31,17 +31,18 @@ struct misc_cg;
  * struct misc_res: Per cgroup per misc type resource
  * @max: Maximum limit on the resource.
  * @usage: Current usage of the resource.
- * @failed: True if charged failed for the resource in a cgroup.
+ * @events: Number of times, the resource limit exceeded.
  */
 struct misc_res {
-	unsigned long max;
-	atomic_long_t usage;
-	atomic_long_t events;
+	u64 max;
+	atomic64_t usage;
+	atomic64_t events;
 };

 /**
  * struct misc_cg - Miscellaneous controller's cgroup structure.
  * @css: cgroup subsys state object.
+ * @events_file: Handle for the misc resources events file.
  * @res: Array of misc resources usage in the cgroup.
  */
 struct misc_cg {
@@ -53,12 +54,10 @@ struct misc_cg {
 	struct misc_res res[MISC_CG_RES_TYPES];
 };

-unsigned long misc_cg_res_total_usage(enum misc_res_type type);
-int misc_cg_set_capacity(enum misc_res_type type, unsigned long capacity);
-int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
-		       unsigned long amount);
-void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg,
-		      unsigned long amount);
+u64 misc_cg_res_total_usage(enum misc_res_type type);
+int misc_cg_set_capacity(enum misc_res_type type, u64 capacity);
+int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, u64 amount);
+void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg, u64 amount);

 /**
  * css_misc() - Get misc cgroup from the css.
@@ -99,27 +98,26 @@ static inline void put_misc_cg(struct misc_cg *cg)

 #else /* !CONFIG_CGROUP_MISC */

-static inline unsigned long misc_cg_res_total_usage(enum misc_res_type type)
+static inline u64 misc_cg_res_total_usage(enum misc_res_type type)
 {
 	return 0;
 }

-static inline int misc_cg_set_capacity(enum misc_res_type type,
-				       unsigned long capacity)
+static inline int misc_cg_set_capacity(enum misc_res_type type, u64 capacity)
 {
 	return 0;
 }

 static inline int misc_cg_try_charge(enum misc_res_type type,
 				     struct misc_cg *cg,
-				     unsigned long amount)
+				     u64 amount)
 {
 	return 0;
 }

 static inline void misc_cg_uncharge(enum misc_res_type type,
 				    struct misc_cg *cg,
-				    unsigned long amount)
+				    u64 amount)
 {
 }
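
The misc controller changes above replace `unsigned long` and `atomic_long_t` with `u64` and `atomic64_t`. Since `unsigned long` is only 32 bits wide on 32-bit targets, explicit 64-bit types keep the counter width the same everywhere. Below is a rough userspace analogue using C11 atomics, offered as a sketch only: `struct res_counter`, `res_try_charge()` and `res_uncharge()` are hypothetical names, and the logic is a simplified single-level charge, not the kernel's `misc_cg_try_charge()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogue of struct misc_res with explicit 64-bit fields. */
struct res_counter {
	uint64_t max;			/* limit on the resource */
	_Atomic uint64_t usage;		/* current usage */
	_Atomic uint64_t events;	/* times the limit was exceeded */
};

/* Try to add amount to usage; undo and count an event if max is exceeded. */
bool res_try_charge(struct res_counter *res, uint64_t amount)
{
	uint64_t new_usage;

	assert(res != NULL);

	/* atomic_fetch_add returns the old value, so add amount again. */
	new_usage = atomic_fetch_add(&res->usage, amount) + amount;
	if (new_usage > res->max) {
		atomic_fetch_sub(&res->usage, amount);
		atomic_fetch_add(&res->events, 1);
		return false;
	}
	return true;
}

/* Give back a previously charged amount. */
void res_uncharge(struct res_counter *res, uint64_t amount)
{
	assert(res != NULL);
	atomic_fetch_sub(&res->usage, amount);
}
```

With a limit such as 6000000000 (which does not fit in 32 bits), a first charge of 4000000000 succeeds and a second one fails and bumps `events`, regardless of the platform's `long` width.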

include/uapi/linux/cgroupstats.h

Lines changed: 0 additions & 2 deletions
@@ -24,8 +24,6 @@
  * basis. This data is shared using taskstats.
  *
  * Most of these states are derived by looking at the task->state value
- * For the nr_io_wait state, a flag in the delay accounting structure
- * indicates that the task is waiting on IO
  *
  * Each member is aligned to a 8 byte boundary.
  */

kernel/cgroup/cgroup-v1.c

Lines changed: 1 addition & 1 deletion
@@ -431,7 +431,7 @@ static void *cgroup_pidlist_start(struct seq_file *s, loff_t *pos)
 			if (l->list[mid] == pid) {
 				index = mid;
 				break;
-			} else if (l->list[mid] <= pid)
+			} else if (l->list[mid] < pid)
 				index = mid + 1;
 			else
 				end = mid;
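
The `<=` to `<` change above does not alter behavior: the equality case is already handled by the preceding `if`, so the `else if` branch can only ever see a strictly smaller value. A standalone sketch of the same search shape follows; `pidlist_lower_bound()` and its plain `int` array are hypothetical stand-ins for the kernel's pidlist, used here only to make the loop testable in userspace.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Search a sorted array without duplicates. Returns the index of pid if it
 * is present, otherwise the index of the first element greater than pid
 * (which is length when every element is smaller).
 */
size_t pidlist_lower_bound(const int *list, size_t length, int pid)
{
	size_t index = 0, end = length;

	assert(list != NULL || length == 0);

	while (index < end) {
		size_t mid = index + (end - index) / 2;

		if (list[mid] == pid) {
			index = mid;
			break;
		} else if (list[mid] < pid) {
			/* Equality was handled above, so '<' is sufficient. */
			index = mid + 1;
		} else {
			end = mid;
		}
	}
	return index;
}
```

For the array {2, 4, 6, 8}, searching for 6 and for 5 both yield index 2, and searching for 9 yields 4, the array length.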

kernel/cgroup/cgroup.c

Lines changed: 41 additions & 44 deletions
@@ -492,28 +492,6 @@ static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
 	return &cgrp->self;
 }

-/**
- * cgroup_tryget_css - try to get a cgroup's css for the specified subsystem
- * @cgrp: the cgroup of interest
- * @ss: the subsystem of interest
- *
- * Find and get @cgrp's css associated with @ss. If the css doesn't exist
- * or is offline, %NULL is returned.
- */
-static struct cgroup_subsys_state *cgroup_tryget_css(struct cgroup *cgrp,
-						     struct cgroup_subsys *ss)
-{
-	struct cgroup_subsys_state *css;
-
-	rcu_read_lock();
-	css = cgroup_css(cgrp, ss);
-	if (css && !css_tryget_online(css))
-		css = NULL;
-	rcu_read_unlock();
-
-	return css;
-}
-
 /**
  * cgroup_e_css_by_mask - obtain a cgroup's effective css for the specified ss
  * @cgrp: the cgroup of interest
@@ -679,7 +657,7 @@ EXPORT_SYMBOL_GPL(of_css);
  * @ssid: the index of the subsystem, CGROUP_SUBSYS_COUNT after reaching the end
  * @cgrp: the target cgroup to iterate css's of
  *
- * Should be called under cgroup_[tree_]mutex.
+ * Should be called under cgroup_mutex.
  */
 #define for_each_css(css, ssid, cgrp) \
 	for ((ssid) = 0; (ssid) < CGROUP_SUBSYS_COUNT; (ssid)++) \
@@ -929,7 +907,7 @@ static void css_set_move_task(struct task_struct *task,
 #define CSS_SET_HASH_BITS 7
 static DEFINE_HASHTABLE(css_set_table, CSS_SET_HASH_BITS);

-static unsigned long css_set_hash(struct cgroup_subsys_state *css[])
+static unsigned long css_set_hash(struct cgroup_subsys_state **css)
 {
 	unsigned long key = 0UL;
 	struct cgroup_subsys *ss;
@@ -1070,7 +1048,7 @@ static bool compare_css_sets(struct css_set *cset,
  */
 static struct css_set *find_existing_css_set(struct css_set *old_cset,
 					struct cgroup *cgrp,
-					struct cgroup_subsys_state *template[])
+					struct cgroup_subsys_state **template)
 {
 	struct cgroup_root *root = cgrp->root;
 	struct cgroup_subsys *ss;
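
The `css_set_hash()` and `find_existing_css_set()` hunks swap array notation (`*css[]`, `*template[]`) for pointer notation (`**css`, `**template`) in the parameter lists. In C a parameter declared as "array of T" is adjusted to "pointer to T", so both spellings name the same function type; judging by the shortlog entry "cgroup: Avoid -Wstringop-overflow warnings", the change is presumably there to quiet a compiler diagnostic rather than to change behavior. The sketch below illustrates the equivalence with hypothetical names (`struct state`, `count_present()`); it is not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup_subsys_state. */
struct state {
	int id;
};

/* Declared with array notation ... */
size_t count_present(struct state *css[], size_t n);

/*
 * ... and defined with pointer notation. The compiler accepts this pair
 * because the array parameter is adjusted to a pointer, so the declaration
 * and the definition have the same type.
 */
size_t count_present(struct state **css, size_t n)
{
	size_t i, count = 0;

	assert(css != NULL || n == 0);

	/* Count the non-NULL entries. */
	for (i = 0; i < n; i++)
		if (css[i])
			count++;
	return count;
}
```

Callers are unaffected by the spelling: an array such as `struct state *set[3]` decays to `struct state **` at the call site either way.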
@@ -1736,7 +1714,7 @@ static int css_populate_dir(struct cgroup_subsys_state *css)
 	struct cftype *cfts, *failed_cfts;
 	int ret;

-	if ((css->flags & CSS_VISIBLE) || !cgrp->kn)
+	if (css->flags & CSS_VISIBLE)
 		return 0;

 	if (!css->ss) {
@@ -2499,7 +2477,7 @@ struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset,

 			/*
 			 * This function may be called both before and
-			 * after cgroup_taskset_migrate(). The two cases
+			 * after cgroup_migrate_execute(). The two cases
 			 * can be distinguished by looking at whether @cset
 			 * has its ->mg_dst_cset set.
 			 */
@@ -3654,9 +3632,32 @@ static int cgroup_stat_show(struct seq_file *seq, void *v)
 	return 0;
 }

-static int __maybe_unused cgroup_extra_stat_show(struct seq_file *seq,
-						 struct cgroup *cgrp, int ssid)
+#ifdef CONFIG_CGROUP_SCHED
+/**
+ * cgroup_tryget_css - try to get a cgroup's css for the specified subsystem
+ * @cgrp: the cgroup of interest
+ * @ss: the subsystem of interest
+ *
+ * Find and get @cgrp's css associated with @ss. If the css doesn't exist
+ * or is offline, %NULL is returned.
+ */
+static struct cgroup_subsys_state *cgroup_tryget_css(struct cgroup *cgrp,
+						     struct cgroup_subsys *ss)
 {
+	struct cgroup_subsys_state *css;
+
+	rcu_read_lock();
+	css = cgroup_css(cgrp, ss);
+	if (css && !css_tryget_online(css))
+		css = NULL;
+	rcu_read_unlock();
+
+	return css;
+}
+
+static int cgroup_extra_stat_show(struct seq_file *seq, int ssid)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
 	struct cgroup_subsys *ss = cgroup_subsys[ssid];
 	struct cgroup_subsys_state *css;
 	int ret;
@@ -3672,15 +3673,15 @@ static int __maybe_unused cgroup_extra_stat_show(struct seq_file *seq,
 	css_put(css);
 	return ret;
 }
+#endif

 static int cpu_stat_show(struct seq_file *seq, void *v)
 {
-	struct cgroup __maybe_unused *cgrp = seq_css(seq)->cgroup;
 	int ret = 0;

 	cgroup_base_stat_cputime_show(seq);
 #ifdef CONFIG_CGROUP_SCHED
-	ret = cgroup_extra_stat_show(seq, cgrp, cpu_cgrp_id);
+	ret = cgroup_extra_stat_show(seq, cpu_cgrp_id);
 #endif
 	return ret;
 }
@@ -4350,14 +4351,13 @@ static int cgroup_init_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
 	return ret;
 }

-static int cgroup_rm_cftypes_locked(struct cftype *cfts)
+static void cgroup_rm_cftypes_locked(struct cftype *cfts)
 {
 	lockdep_assert_held(&cgroup_mutex);

 	list_del(&cfts->node);
 	cgroup_apply_cftypes(cfts, false);
 	cgroup_exit_cftypes(cfts);
-	return 0;
 }

 /**
@@ -4373,18 +4373,16 @@ static int cgroup_rm_cftypes_locked(struct cftype *cfts)
  */
 int cgroup_rm_cftypes(struct cftype *cfts)
 {
-	int ret;
-
 	if (!cfts || cfts[0].name[0] == '\0')
 		return 0;

 	if (!(cfts[0].flags & __CFTYPE_ADDED))
 		return -ENOENT;

 	cgroup_lock();
-	ret = cgroup_rm_cftypes_locked(cfts);
+	cgroup_rm_cftypes_locked(cfts);
 	cgroup_unlock();
-	return ret;
+	return 0;
 }

 /**
@@ -5337,7 +5335,7 @@ static struct cftype cgroup_psi_files[] = {
  * RCU callback.
  *
  * 4. After the grace period, the css can be freed. Implemented in
- * css_free_work_fn().
+ * css_free_rwork_fn().
  *
  * It is actually hairier because both step 2 and 4 require process context
  * and thus involve punting to css->destroy_work adding two additional
@@ -5581,8 +5579,7 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,

 /*
  * The returned cgroup is fully initialized including its control mask, but
- * it isn't associated with its kernfs_node and doesn't have the control
- * mask applied.
+ * it doesn't have the control mask applied.
  */
 static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
 				    umode_t mode)
@@ -5908,7 +5905,7 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	/*
 	 * Mark @cgrp and the associated csets dead. The former prevents
 	 * further task migration and child creation by disabling
-	 * cgroup_lock_live_group(). The latter makes the csets ignored by
+	 * cgroup_kn_lock_live(). The latter makes the csets ignored by
 	 * the migration path.
 	 */
 	cgrp->self.flags &= ~CSS_ONLINE;
@@ -5930,7 +5927,7 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 		parent->nr_threaded_children--;

 	spin_lock_irq(&css_set_lock);
-	for (tcgrp = cgroup_parent(cgrp); tcgrp; tcgrp = cgroup_parent(tcgrp)) {
+	for (tcgrp = parent; tcgrp; tcgrp = cgroup_parent(tcgrp)) {
 		tcgrp->nr_descendants--;
 		tcgrp->nr_dying_descendants++;
 		/*
@@ -6123,8 +6120,8 @@ int __init cgroup_init(void)
 			continue;

 		if (cgroup1_ssid_disabled(ssid))
-			printk(KERN_INFO "Disabling %s control group subsystem in v1 mounts\n",
-			       ss->name);
+			pr_info("Disabling %s control group subsystem in v1 mounts\n",
+				ss->name);

 		cgrp_dfl_root.subsys_mask |= 1 << ss->id;
