Skip to content

Commit 298f342

Browse files
Dave Chinnerdchinner
authored andcommitted
xfs: lockless buffer lookup
Now that we have a standalone fast path for buffer lookup, we can easily convert it to use rcu lookups. When we continually hammer the buffer cache with trylock lookups, we end up with a huge amount of lock contention on the per-ag buffer hash locks: - 92.71% 0.05% [kernel] [k] xfs_inodegc_worker - 92.67% xfs_inodegc_worker - 92.13% xfs_inode_unlink - 91.52% xfs_inactive_ifree - 85.63% xfs_read_agi - 85.61% xfs_trans_read_buf_map - 85.59% xfs_buf_read_map - xfs_buf_get_map - 85.55% xfs_buf_find - 72.87% _raw_spin_lock - do_raw_spin_lock 71.86% __pv_queued_spin_lock_slowpath - 8.74% xfs_buf_rele - 7.88% _raw_spin_lock - 7.88% do_raw_spin_lock 7.63% __pv_queued_spin_lock_slowpath - 1.70% xfs_buf_trylock - 1.68% down_trylock - 1.41% _raw_spin_lock_irqsave - 1.39% do_raw_spin_lock __pv_queued_spin_lock_slowpath - 0.76% _raw_spin_unlock 0.75% do_raw_spin_unlock This is basically hammering the pag->pag_buf_lock from lots of CPUs doing trylocks at the same time. Most of the buffer trylock operations ultimately fail after we've done the lookup, so we're really hammering the buf hash lock whilst making no progress. We can also see significant spinlock traffic on the same lock just under normal operation when lots of tasks are accessing metadata from the same AG, so let's avoid all this by converting the lookup fast path to leverages the rhashtable's ability to do rcu protected lookups. We avoid races with the buffer release path by using atomic_inc_not_zero() on the buffer hold count. Any buffer that is in the LRU will have a non-zero count, thereby allowing the lockless fast path to be taken in most cache hit situations. If the buffer hold count is zero, then it is likely going through the release path so in that case we fall back to the existing lookup miss slow path. The slow path will then do an atomic lookup and insert under the buffer hash lock and hence serialise correctly against buffer release freeing the buffer. The use of rcu protected lookups means that buffer handles now need to be freed by RCU callbacks (same as inodes). We still free the buffer pages before the RCU callback - we won't be trying to access them at all on a buffer that has zero references - but we need the buffer handle itself to be present for the entire rcu protected read side to detect a zero hold count correctly. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]>
1 parent 32dd4f9 commit 298f342

File tree

2 files changed

+16
-7
lines changed

2 files changed

+16
-7
lines changed

fs/xfs/xfs_buf.c

Lines changed: 15 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -294,6 +294,16 @@ xfs_buf_free_pages(
294294
bp->b_flags &= ~_XBF_PAGES;
295295
}
296296

297+
static void
298+
xfs_buf_free_callback(
299+
struct callback_head *cb)
300+
{
301+
struct xfs_buf *bp = container_of(cb, struct xfs_buf, b_rcu);
302+
303+
xfs_buf_free_maps(bp);
304+
kmem_cache_free(xfs_buf_cache, bp);
305+
}
306+
297307
static void
298308
xfs_buf_free(
299309
struct xfs_buf *bp)
@@ -307,8 +317,7 @@ xfs_buf_free(
307317
else if (bp->b_flags & _XBF_KMEM)
308318
kmem_free(bp->b_addr);
309319

310-
xfs_buf_free_maps(bp);
311-
kmem_cache_free(xfs_buf_cache, bp);
320+
call_rcu(&bp->b_rcu, xfs_buf_free_callback);
312321
}
313322

314323
static int
@@ -567,14 +576,13 @@ xfs_buf_lookup(
567576
struct xfs_buf *bp;
568577
int error;
569578

570-
spin_lock(&pag->pag_buf_lock);
579+
rcu_read_lock();
571580
bp = rhashtable_lookup(&pag->pag_buf_hash, map, xfs_buf_hash_params);
572-
if (!bp) {
573-
spin_unlock(&pag->pag_buf_lock);
581+
if (!bp || !atomic_inc_not_zero(&bp->b_hold)) {
582+
rcu_read_unlock();
574583
return -ENOENT;
575584
}
576-
atomic_inc(&bp->b_hold);
577-
spin_unlock(&pag->pag_buf_lock);
585+
rcu_read_unlock();
578586

579587
error = xfs_buf_find_lock(bp, flags);
580588
if (error) {

fs/xfs/xfs_buf.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -196,6 +196,7 @@ struct xfs_buf {
196196
int b_last_error;
197197

198198
const struct xfs_buf_ops *b_ops;
199+
struct rcu_head b_rcu;
199200
};
200201

201202
/* Finding and Reading Buffers */

0 commit comments

Comments
 (0)