Commit 8b57b11

Dave Chinner authored and Darrick J. Wong committed
pcpcntrs: fix dying cpu summation race
In commit f689054 ("percpu_counter: add percpu_counter_sum_all interface") a race condition between a cpu dying and percpu_counter_sum() iterating online CPUs was identified. The solution was to iterate all possible CPUs for summation via percpu_counter_sum_all().

We recently had a percpu_counter_sum() call in XFS trip over this same race condition and it fired a debug assert because the filesystem was unmounting and the counter *should* be zero just before we destroy it. That was reported here:

https://lore.kernel.org/linux-kernel/[email protected]/

likely as a result of running generic/648, which exercises filesystems in the presence of CPU online/offline events.

The solution of using percpu_counter_sum_all() is an awful one. We use percpu counters and percpu_counter_sum() for accurate and reliable threshold detection for space management, so a summation race condition during these operations can result in overcommit of available space, and that may result in filesystem shutdowns.

As percpu_counter_sum_all() iterates all possible CPUs rather than just those online or even those present, the mask can include CPUs that aren't even installed in the machine, or, on machines that can hot-plug CPU-capable nodes, CPUs whose physical sockets aren't even present in the machine.

Fundamentally, this race condition is caused by the CPU being offlined being removed from the cpu_online_mask before the notifier that cleans up per-cpu state is run. Hence percpu_counter_sum() will not sum the count for a cpu currently being taken offline, regardless of whether the notifier has run or not. This is the root cause of the bug.

The percpu counter notifier iterates all the registered counters, locks each counter, and moves its percpu count into the global sum. This is serialised against other operations that move a percpu count into the global sum, as well as against percpu_counter_sum() operations that sum the percpu counts while holding the counter lock.
Hence the notifier is safe to run concurrently with sum operations, and the only thing we actually need to care about is that percpu_counter_sum() iterates dying CPUs. That's trivial to do, and when there are no CPUs dying it has no additional overhead beyond a cpumask_or() operation.

This change makes percpu_counter_sum() always do the right thing in the presence of CPU hot unplug events and makes percpu_counter_sum_all() unnecessary. This, in turn, means that filesystems like XFS, ext4, and btrfs don't have to work out when they should use percpu_counter_sum() vs percpu_counter_sum_all() in their space accounting algorithms.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
1 parent 1470afe commit 8b57b11

File tree

1 file changed

+12
-3
lines changed


lib/percpu_counter.c

Lines changed: 12 additions & 3 deletions
@@ -131,7 +131,7 @@ static s64 __percpu_counter_sum_mask(struct percpu_counter *fbc,

 	raw_spin_lock_irqsave(&fbc->lock, flags);
 	ret = fbc->count;
-	for_each_cpu(cpu, cpu_mask) {
+	for_each_cpu_or(cpu, cpu_online_mask, cpu_mask) {
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		ret += *pcount;
 	}
@@ -141,11 +141,20 @@ static s64 __percpu_counter_sum_mask(struct percpu_counter *fbc,

 /*
  * Add up all the per-cpu counts, return the result. This is a more accurate
- * but much slower version of percpu_counter_read_positive()
+ * but much slower version of percpu_counter_read_positive().
+ *
+ * We use the cpu mask of (cpu_online_mask | cpu_dying_mask) to capture sums
+ * from CPUs that are in the process of being taken offline. Dying cpus have
+ * been removed from the online mask, but may not have had the hotplug dead
+ * notifier called to fold the percpu count back into the global counter sum.
+ * By including dying CPUs in the iteration mask, we avoid this race condition
+ * so __percpu_counter_sum() just does the right thing when CPUs are being taken
+ * offline.
  */
 s64 __percpu_counter_sum(struct percpu_counter *fbc)
 {
-	return __percpu_counter_sum_mask(fbc, cpu_online_mask);
+	return __percpu_counter_sum_mask(fbc, cpu_dying_mask);
 }
 EXPORT_SYMBOL(__percpu_counter_sum);
151160
