
Commit 9d230c0

Waiman-Long authored and axboe committed
blk-cgroup: Properly propagate the iostat update up the hierarchy
During a cgroup_rstat_flush() call, the lowest level of nodes are flushed first before their parents. Since commit 3b8cc62 ("blk-cgroup: Optimize blkcg_rstat_flush()"), iostat propagation was still done to the parent. Grandparent, however, may not get the iostat update if the parent has no blkg_iostat_set queued in its lhead lockless list.

Fix this iostat propagation problem by queuing the parent's global blkg->iostat into one of its percpu lockless lists to make sure that the delta will always be propagated up to the grandparent and so on toward the root blkcg.

Note that successive calls to __blkcg_rstat_flush() are serialized by the cgroup_rstat_lock. So no special barrier is used in the reading and writing of blkg->iostat.lqueued.

Fixes: 3b8cc62 ("blk-cgroup: Optimize blkcg_rstat_flush()")
Reported-by: Dan Schatzberg <[email protected]>
Closes: https://lore.kernel.org/lkml/ZkO6l%2FODzadSgdhC@dschatzberg-fedora-PF3DHTBV/
Signed-off-by: Waiman Long <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
1 parent d0aac23 commit 9d230c0


1 file changed, 18 insertions(+), 1 deletion(-)

block/blk-cgroup.c

@@ -322,6 +322,7 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct gendisk *disk,
 	blkg->q = disk->queue;
 	INIT_LIST_HEAD(&blkg->q_node);
 	blkg->blkcg = blkcg;
+	blkg->iostat.blkg = blkg;
 #ifdef CONFIG_BLK_CGROUP_PUNT_BIO
 	spin_lock_init(&blkg->async_bio_lock);
 	bio_list_init(&blkg->async_bios);
@@ -1046,6 +1047,8 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
 		smp_mb();
 
 		WRITE_ONCE(bisc->lqueued, false);
+		if (bisc == &blkg->iostat)
+			goto propagate_up; /* propagate up to parent only */
 
 		/* fetch the current per-cpu values */
 		do {
@@ -1055,10 +1058,24 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
 
 		blkcg_iostat_update(blkg, &cur, &bisc->last);
 
+propagate_up:
 		/* propagate global delta to parent (unless that's root) */
-		if (parent && parent->parent)
+		if (parent && parent->parent) {
 			blkcg_iostat_update(parent, &blkg->iostat.cur,
 					    &blkg->iostat.last);
+			/*
+			 * Queue parent->iostat to its blkcg's lockless
+			 * list to propagate up to the grandparent if the
+			 * iostat hasn't been queued yet.
+			 */
+			if (!parent->iostat.lqueued) {
+				struct llist_head *plhead;
+
+				plhead = per_cpu_ptr(parent->blkcg->lhead, cpu);
+				llist_add(&parent->iostat.lnode, plhead);
+				parent->iostat.lqueued = true;
+			}
+		}
 	}
 	raw_spin_unlock_irqrestore(&blkg_stat_lock, flags);
 out:
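
The propagation gap described in the commit message can be illustrated with a small userspace model. This is only a sketch under simplifying assumptions, not kernel code: `struct node`, `flush_node()` and `simulate_flush()` are hypothetical names, a single `queued` flag stands in for `iostat.lqueued` and the percpu lockless list, and the special handling of the root blkcg is left out. It models a child, parent and grandparent flushed bottom-up, where a node is only processed if it was queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a blkg: one counter plus its propagated part. */
struct node {
	struct node *parent;
	long cur;    /* accumulated I/O count */
	long last;   /* portion of cur already pushed to the parent */
	bool queued; /* stand-in for iostat.lqueued / list membership */
};

/* Flush one node; nodes that were never queued are skipped entirely. */
static void flush_node(struct node *n, bool requeue_parent)
{
	if (!n->queued)
		return;
	n->queued = false;

	if (n->parent) {
		long delta = n->cur - n->last;

		/* propagate the delta one level up */
		n->parent->cur += delta;
		n->last = n->cur;

		/* the fix: make sure the parent is visited later as well */
		if (requeue_parent && !n->parent->queued)
			n->parent->queued = true;
	}
}

/*
 * Run one bottom-up flush over child -> parent -> grandparent and
 * return what the grandparent ends up seeing.
 */
long simulate_flush(bool requeue_parent, long child_io)
{
	struct node grandparent = { NULL, 0, 0, false };
	struct node parent = { &grandparent, 0, 0, false };
	struct node child = { &parent, 0, 0, false };

	assert(child_io >= 0);
	child.cur = child_io;
	child.queued = true; /* only the child saw I/O */

	flush_node(&child, requeue_parent);
	flush_node(&parent, requeue_parent);
	flush_node(&grandparent, requeue_parent);
	return grandparent.cur;
}
```

In this model, `simulate_flush(false, 100)` leaves the grandparent at 0 because the parent is never visited, while `simulate_flush(true, 100)` delivers all 100 units to the grandparent, which mirrors the behavior the patch aims for.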
