Commit 84db0e0

cgroup/cpuset: Fix race between newly created partition and dying one
jira NONE_AUTOMATION
Rebuild_History Non-Buildable kernel-5.14.0-570.17.1.el9_6
commit-author Waiman Long <[email protected]>
commit a22b3d5
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-5.14.0-570.17.1.el9_6/a22b3d54.failed

There is a possible race between removing a cgroup directory that is a partition root and the creation of a new partition. The partition to be removed can be dying but still online; it does not currently participate in checking for exclusive CPUs conflict, but its exclusive CPUs are still there in subpartitions_cpus and isolated_cpus. These two cpumasks are global states that affect the operation of cpuset partitions. The exclusive CPUs in dying cpusets will only be removed when the cpuset_css_offline() function is called after an RCU delay.

As a result, it is possible that a new partition can be created with exclusive CPUs that overlap with those of a dying one. When that dying partition is finally offlined, it removes those overlapping exclusive CPUs from subpartitions_cpus and maybe isolated_cpus, resulting in an incorrect CPU configuration.

This bug was found when a warning was triggered in remote_partition_disable() during testing because the subpartitions_cpus mask was empty.

One possible way to fix this is to iterate the dying cpusets as well and avoid using the exclusive CPUs in those dying cpusets. However, this can still cause random partition creation failures or other anomalies due to racing. A better way to fix this race is to reset the partition state at the moment when a cpuset is being killed.

Introduce a new css_killed() CSS function pointer and call it, if defined, before setting the CSS_DYING flag in kill_css(). Also update the css_is_dying() helper to use the CSS_DYING flag introduced by commit 33c35aa ("cgroup: Prevent kill_css() from being called more than once") for proper synchronization.

Add a new cpuset_css_killed() function to reset the partition state of a valid partition root if it is being killed.

Fixes: ee8dde0 ("cpuset: Add new v2 cpuset.sched.partition flag")
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
(cherry picked from commit a22b3d5)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	kernel/cgroup/cpuset.c
1 parent 5a26e0c commit 84db0e0
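
The cpuset_css_killed() callback described above is added in kernel/cgroup/cpuset.c, which is left unmerged in this commit, so it does not appear in the diff below. As a rough sketch only (not the applied patch), assuming the helpers visible in the conflicting hunk (is_partition_valid(), update_prstate(), cpuset_mutex, cpus_read_lock()) and the usual css_cs() accessor, the callback would look roughly like this; the exact backported form depends on how the conflict is eventually resolved:

/*
 * Sketch only: reset a valid partition root as soon as its cpuset is being
 * killed (from the new css_killed() hook), so its exclusive CPUs leave
 * subpartitions_cpus/isolated_cpus right away instead of waiting for
 * cpuset_css_offline() after the RCU delay.
 */
static void cpuset_css_killed(struct cgroup_subsys_state *css)
{
	struct cpuset *cs = css_cs(css);

	cpus_read_lock();
	mutex_lock(&cpuset_mutex);

	/* Reset a valid partition root back to the ordinary member state. */
	if (is_partition_valid(cs))
		update_prstate(cs, 0);

	mutex_unlock(&cpuset_mutex);
	cpus_read_unlock();
}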

File tree

1 file changed: +117, -0 lines changed

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
cgroup/cpuset: Fix race between newly created partition and dying one

jira NONE_AUTOMATION
Rebuild_History Non-Buildable kernel-5.14.0-570.17.1.el9_6
commit-author Waiman Long <[email protected]>
commit a22b3d54de94f82ca057cc2ebf9496fa91ebf698
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-5.14.0-570.17.1.el9_6/a22b3d54.failed

There is a possible race between removing a cgroup directory that is
a partition root and the creation of a new partition. The partition
to be removed can be dying but still online; it does not currently
participate in checking for exclusive CPUs conflict, but the exclusive
CPUs are still there in subpartitions_cpus and isolated_cpus. These
two cpumasks are global states that affect the operation of cpuset
partitions. The exclusive CPUs in dying cpusets will only be removed
when the cpuset_css_offline() function is called after an RCU delay.

As a result, it is possible that a new partition can be created with
exclusive CPUs that overlap with those of a dying one. When that dying
partition is finally offlined, it removes those overlapping exclusive
CPUs from subpartitions_cpus and maybe isolated_cpus, resulting in an
incorrect CPU configuration.

This bug was found when a warning was triggered in
remote_partition_disable() during testing because the subpartitions_cpus
mask was empty.

One possible way to fix this is to iterate the dying cpusets as well and
avoid using the exclusive CPUs in those dying cpusets. However, this
can still cause random partition creation failures or other anomalies
due to racing. A better way to fix this race is to reset the partition
state at the moment when a cpuset is being killed.

Introduce a new css_killed() CSS function pointer and call it, if
defined, before setting the CSS_DYING flag in kill_css(). Also update the
css_is_dying() helper to use the CSS_DYING flag introduced by commit
33c35aa48178 ("cgroup: Prevent kill_css() from being called more than
once") for proper synchronization.

Add a new cpuset_css_killed() function to reset the partition state of
a valid partition root if it is being killed.

Fixes: ee8dde0cd2ce ("cpuset: Add new v2 cpuset.sched.partition flag")
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
(cherry picked from commit a22b3d54de94f82ca057cc2ebf9496fa91ebf698)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	kernel/cgroup/cpuset.c
diff --cc kernel/cgroup/cpuset.c
index 77ce168d9431,306b60430091..000000000000
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@@ -4172,12 -3536,8 +4172,17 @@@ static void cpuset_css_offline(struct c
cpus_read_lock();
mutex_lock(&cpuset_mutex);

++<<<<<<< HEAD
+ if (is_partition_valid(cs))
+ update_prstate(cs, 0);
+
+ if (!cgroup_subsys_on_dfl(cpuset_cgrp_subsys) &&
+ is_sched_load_balance(cs))
+ update_flag(CS_SCHED_LOAD_BALANCE, cs, 0);
++=======
+ if (!cpuset_v2() && is_sched_load_balance(cs))
+ cpuset_update_flag(CS_SCHED_LOAD_BALANCE, cs, 0);
++>>>>>>> a22b3d54de94 (cgroup/cpuset: Fix race between newly created partition and dying one)

cpuset_dec();
clear_bit(CS_ONLINE, &cs->flags);
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 6523035f4d7e..5b61f4cd7791 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -708,6 +708,7 @@ struct cgroup_subsys {
void (*css_released)(struct cgroup_subsys_state *css);
void (*css_free)(struct cgroup_subsys_state *css);
void (*css_reset)(struct cgroup_subsys_state *css);
+ void (*css_killed)(struct cgroup_subsys_state *css);
void (*css_rstat_flush)(struct cgroup_subsys_state *css, int cpu);
int (*css_extra_stat_show)(struct seq_file *seq,
struct cgroup_subsys_state *css);
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index c60ba0ab1462..893786eab081 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -342,7 +342,7 @@ static inline u64 cgroup_id(const struct cgroup *cgrp)
*/
static inline bool css_is_dying(struct cgroup_subsys_state *css)
{
- return !(css->flags & CSS_NO_REF) && percpu_ref_is_dying(&css->refcnt);
+ return css->flags & CSS_DYING;
}

static inline void cgroup_get(struct cgroup *cgrp)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 244ec600b4d8..2537dfa11a7e 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5902,6 +5902,12 @@ static void kill_css(struct cgroup_subsys_state *css)
if (css->flags & CSS_DYING)
return;

+ /*
+ * Call css_killed(), if defined, before setting the CSS_DYING flag
+ */
+ if (css->ss->css_killed)
+ css->ss->css_killed(css);
+
css->flags |= CSS_DYING;

/*
* Unmerged path kernel/cgroup/cpuset.c
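
Because the kernel/cgroup/cpuset.c hunk failed to apply, the wiring of the new callback is not shown here either. Judging from the conflict above, where the a22b3d54 side no longer resets the partition in cpuset_css_offline(), the resolution would presumably move that update_prstate() call into cpuset_css_killed() (as sketched earlier) and register the hook in cpuset_cgrp_subsys. The fragment below is an assumed sketch of that registration, not part of this commit:

 	.css_offline	= cpuset_css_offline,
+	.css_killed	= cpuset_css_killed,	/* assumed: new hook beside the existing CSS callbacks */
 	.css_free	= cpuset_css_free,

The ordering in kill_css() shown above, callback first and CSS_DYING afterwards, is what lets the partition state be reset while the cpuset is still online.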
