Commit 36df6e3

shakeelb authored and htejun committed
cgroup: make css_rstat_updated nmi safe
To make css_rstat_updated() able to run safely in nmi context, let's move the rstat update tree creation to the flush side and use per-cpu lockless lists in struct cgroup_subsys to track the css whose stats are updated on that cpu.

The struct cgroup_subsys_state now has a per-cpu lnode which needs to be inserted into the corresponding per-cpu lhead of struct cgroup_subsys. Since we want the insertion to be nmi safe, there can be multiple inserters on the same cpu for the same lnode. Here the multiple inserters come from stacked contexts like softirq, hardirq and nmi.

The current llist does not provide a function to protect against the scenario where multiple inserters use the same lnode, so using llist_node() out of the box is not safe for this scenario. However, we can protect against multiple inserters by using the fact that an llist node points to itself when it is not on an llist: atomically reset that pointer and select the winner as the single inserter.

Signed-off-by: Shakeel Butt <[email protected]>
Tested-by: JP Kobryn <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
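To make the self-pointing-node trick concrete, below is a minimal userspace C model of the idea, not the kernel code: llnode, llhead, node_init, node_on_list, list_push, list_pop_init and record_update are hypothetical stand-ins for struct llist_node, struct llist_head, init_llist_node(), llist_on_list(), llist_add(), llist_del_first_init() and css_rstat_updated(), and plain C11 atomics stand in for this_cpu_cmpxchg().

/* Userspace sketch of the "node points to itself when off-list" guard. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct llnode {
        _Atomic(struct llnode *) next;
};

struct llhead {
        _Atomic(struct llnode *) first;
};

/* Off-list nodes point at themselves (mirrors init_llist_node()). */
static void node_init(struct llnode *n)
{
        atomic_store(&n->next, n);
}

static bool node_on_list(struct llnode *n)
{
        return atomic_load(&n->next) != n;
}

/* Lockless push (mirrors llist_add()). */
static void list_push(struct llnode *n, struct llhead *h)
{
        struct llnode *first = atomic_load(&h->first);

        do {
                atomic_store(&n->next, first);
        } while (!atomic_compare_exchange_weak(&h->first, &first, n));
}

/*
 * Pop-and-reinit (mirrors llist_del_first_init()): the popped node points
 * at itself again, so the next record_update() for it re-arms.
 */
static struct llnode *list_pop_init(struct llhead *h)
{
        struct llnode *first = atomic_load(&h->first);

        while (first &&
               !atomic_compare_exchange_weak(&h->first, &first,
                                             atomic_load(&first->next)))
                ;
        if (first)
                node_init(first);
        return first;
}

/*
 * Update-side guard: only the context that wins the next==self -> NULL
 * transition performs the push; re-entrant callers (softirq/hardirq/nmi
 * in the kernel) see the CAS fail and return early.
 */
static void record_update(struct llnode *n, struct llhead *h)
{
        struct llnode *expected = n;

        if (node_on_list(n))
                return;
        if (!atomic_compare_exchange_strong(&n->next, &expected, NULL))
                return; /* another context is inserting this node */
        list_push(n, h);
}

int main(void)
{
        struct llhead head = { .first = NULL };
        struct llnode node;

        node_init(&node);
        record_update(&node, &head);    /* wins the CAS and pushes */
        record_update(&node, &head);    /* already on the list, no-op */
        printf("on list after updates: %d\n", node_on_list(&node));
        list_pop_init(&head);           /* flush side drains and re-inits */
        printf("on list after flush:   %d\n", node_on_list(&node));
        return 0;
}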
1 parent 1257b87 commit 36df6e3

File tree

1 file changed: +53 -12 lines

kernel/cgroup/rstat.c

Lines changed: 53 additions & 12 deletions
@@ -126,13 +126,16 @@ void _css_rstat_cpu_unlock(struct cgroup_subsys_state *css, int cpu,
  * @css: target cgroup subsystem state
  * @cpu: cpu on which rstat_cpu was updated
  *
- * @css's rstat_cpu on @cpu was updated. Put it on the parent's matching
- * rstat_cpu->updated_children list. See the comment on top of
- * css_rstat_cpu definition for details.
+ * Atomically inserts the css in the ss's llist for the given cpu. This is
+ * reentrant safe i.e. safe against softirq, hardirq and nmi. The ss's llist
+ * will be processed at the flush time to create the update tree.
  */
 __bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 {
-        unsigned long flags;
+        struct llist_head *lhead;
+        struct css_rstat_cpu *rstatc;
+        struct css_rstat_cpu __percpu *rstatc_pcpu;
+        struct llist_node *self;
 
         /*
          * Since bpf programs can call this function, prevent access to
@@ -141,19 +144,44 @@ __bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
         if (!css_uses_rstat(css))
                 return;
 
+        lockdep_assert_preemption_disabled();
+
+        /*
+         * For archs without nmi safe cmpxchg or percpu ops support, ignore
+         * the requests from nmi context.
+         */
+        if ((!IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) ||
+             !IS_ENABLED(CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS)) && in_nmi())
+                return;
+
+        rstatc = css_rstat_cpu(css, cpu);
+        /* If already on list return. */
+        if (llist_on_list(&rstatc->lnode))
+                return;
+
         /*
-         * Speculative already-on-list test. This may race leading to
-         * temporary inaccuracies, which is fine.
+         * This function can be reentered by irqs and nmis for the same cgroup
+         * and may try to insert the same per-cpu lnode into the llist. Note
+         * that llist_add() does not protect against such scenarios.
          *
-         * Because @parent's updated_children is terminated with @parent
-         * instead of NULL, we can tell whether @css is on the list by
-         * testing the next pointer for NULL.
+         * To protect against such stacked contexts of irqs/nmis, we use the
+         * fact that lnode points to itself when not on a list and then use
+         * this_cpu_cmpxchg() to atomically set to NULL to select the winner
+         * which will call llist_add(). The losers can assume the insertion is
+         * successful and the winner will eventually add the per-cpu lnode to
+         * the llist.
          */
-        if (data_race(css_rstat_cpu(css, cpu)->updated_next))
+        self = &rstatc->lnode;
+        rstatc_pcpu = css->rstat_cpu;
+        if (this_cpu_cmpxchg(rstatc_pcpu->lnode.next, self, NULL) != self)
                 return;
 
-        flags = _css_rstat_cpu_lock(css, cpu, true);
+        lhead = ss_lhead_cpu(css->ss, cpu);
+        llist_add(&rstatc->lnode, lhead);
+}
 
+static void __css_process_update_tree(struct cgroup_subsys_state *css, int cpu)
+{
         /* put @css and all ancestors on the corresponding updated lists */
         while (true) {
                 struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
@@ -179,8 +207,19 @@ __bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 
                 css = parent;
         }
+}
+
+static void css_process_update_tree(struct cgroup_subsys *ss, int cpu)
+{
+        struct llist_head *lhead = ss_lhead_cpu(ss, cpu);
+        struct llist_node *lnode;
+
+        while ((lnode = llist_del_first_init(lhead))) {
+                struct css_rstat_cpu *rstatc;
 
-        _css_rstat_cpu_unlock(css, cpu, flags, true);
+                rstatc = container_of(lnode, struct css_rstat_cpu, lnode);
+                __css_process_update_tree(rstatc->owner, cpu);
+        }
 }
 
 /**
@@ -288,6 +327,8 @@ static struct cgroup_subsys_state *css_rstat_updated_list(
 
         flags = _css_rstat_cpu_lock(root, cpu, false);
 
+        css_process_update_tree(root->ss, cpu);
+
         /* Return NULL if this subtree is not on-list */
         if (!rstatc->updated_next)
                 goto unlock_ret;
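A note on the flush side shown above: per the llist helpers' documented behavior, llist_del_first_init() both removes the first entry from the per-cpu lhead and re-initializes it so the lnode points at itself again; that re-initialization is what re-arms the llist_on_list() check and the this_cpu_cmpxchg() in css_rstat_updated() for the next update on that cpu (the userspace model after the commit message sketches the same round trip).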
