
Commit 94c78cf

Christoph Hellwig authored and cmaiolino committed
xfs: convert buffer cache to use high order folios
Now that we have the buffer cache using the folio API, we can extend the use of folios to allocate high order folios for multi-page buffers rather than an array of single pages that are then vmapped into a contiguous range.

This creates a new type of single folio buffer that can have arbitrary order, in addition to the existing multi-folio buffers made up of many single page folios that get vmapped. The single folio is for now stashed into the existing b_pages array, but that will go away entirely later in the series and remove the temporary page vs folio typing issues that only work because the two structures can currently be used largely interchangeably.

The code that allocates buffers will optimistically attempt a high order folio allocation as a fast path if the buffer size is a power of two and thus fits into a folio. If this high order allocation fails, we fall back to the existing multi-folio allocation code. This now forms the slow allocation path, and hopefully will be largely unused in normal conditions except for buffers whose size is not a power of two, such as larger remote xattrs.

This should improve performance of large buffer operations (e.g. large directory block sizes), as we now mostly avoid the expense of vmapping large buffers (and the vmap lock contention that can occur), as well as the runtime pressure that frequently accessing kernel vmapped pages puts on the TLBs.

Based on a patch from Dave Chinner <[email protected]>, but mutilated beyond recognition.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
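The allocation strategy the message describes amounts to a three-way classification by buffer size. The following is a minimal userspace sketch of that decision logic, not the kernel code itself (which appears in the diff below); the 4 KiB PAGE_SIZE and the classify_alloc() helper are assumptions made for this example.

#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE 4096UL	/* assuming 4 KiB pages for illustration */

/* Same test as the kernel's is_power_of_2() helper. */
static bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical classifier mirroring the commit's strategy: sub-page
 * power-of-2 sizes use kmalloc, anything else up to a page gets a
 * single page folio, larger power-of-2 sizes try a high order folio
 * first, and the rest go straight to the vmapped page array.
 */
static const char *classify_alloc(unsigned long size)
{
	if (size < PAGE_SIZE && is_power_of_2(size))
		return "kmalloc backing";
	if (size <= PAGE_SIZE)
		return "single page folio (__GFP_NOFAIL)";
	if (is_power_of_2(size))
		return "high order folio fast path (may fall back)";
	return "array of single page folios + vmap (slow path)";
}

int main(void)
{
	unsigned long sizes[] = { 512, 4096, 16384, 65536, 68096 };

	for (unsigned int i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		printf("%6lu bytes -> %s\n", sizes[i], classify_alloc(sizes[i]));
	return 0;
}

The last size, 68096 bytes, stands in for an arbitrarily sized buffer such as a large remote xattr, which skips the high order attempt entirely.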
1 parent 4ef3982 · commit 94c78cf


fs/xfs/xfs_buf.c

Lines changed: 50 additions & 6 deletions
@@ -203,9 +203,9 @@ xfs_buf_free_pages(

 	for (i = 0; i < bp->b_page_count; i++) {
 		if (bp->b_pages[i])
-			__free_page(bp->b_pages[i]);
+			folio_put(page_folio(bp->b_pages[i]));
 	}
-	mm_account_reclaimed_pages(bp->b_page_count);
+	mm_account_reclaimed_pages(howmany(BBTOB(bp->b_length), PAGE_SIZE));

 	if (bp->b_pages != bp->b_page_array)
 		kfree(bp->b_pages);
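The accounting change in this hunk is needed because bp->b_page_count is 1 for a single high order folio even though the folio spans several pages, so the number of reclaimed pages has to be derived from the buffer length instead. Below is a worked userspace rendering of that arithmetic; BBSHIFT, BBTOB(), and howmany() are written out here to mirror the kernel macros of the same names, and the 4 KiB PAGE_SIZE and 32-block buffer are assumptions for the example.

#include <stdio.h>

#define PAGE_SIZE	4096UL				/* assuming 4 KiB pages */
#define BBSHIFT		9				/* 512-byte basic blocks, as in the kernel */
#define BBTOB(bbs)	((unsigned long)(bbs) << BBSHIFT)
#define howmany(x, y)	(((x) + ((y) - 1)) / (y))	/* round-up division */

int main(void)
{
	unsigned long b_length = 32;	/* hypothetical buffer length in basic blocks */

	/*
	 * 32 basic blocks = 16384 bytes = 4 pages reclaimed, even though
	 * b_page_count would be 1 for a single order-2 folio.
	 */
	printf("%lu basic blocks -> %lu bytes -> %lu pages\n",
	       b_length, BBTOB(b_length),
	       howmany(BBTOB(b_length), PAGE_SIZE));
	return 0;
}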
@@ -277,12 +277,17 @@ xfs_buf_alloc_kmem(
  * For tmpfs-backed buffers used by in-memory btrees this directly maps the
  * tmpfs page cache folios.
  *
- * For real file system buffers there are two different kinds backing memory:
+ * For real file system buffers there are three different kinds backing memory:
  *
  * The first type backs the buffer by a kmalloc allocation. This is done for
  * less than PAGE_SIZE allocations to avoid wasting memory.
  *
- * The second type of buffer is the multi-page buffer. These are always made
+ * The second type is a single folio buffer - this may be a high order folio or
+ * just a single page sized folio, but either way they get treated the same way
+ * by the rest of the code - the buffer memory spans a single contiguous memory
+ * region that we don't have to map and unmap to access the data directly.
+ *
+ * The third type of buffer is the multi-page buffer. These are always made
  * up of single pages so that they can be fed to vm_map_ram() to return a
  * contiguous memory region we can access the data through, or mark it as
  * XBF_UNMAPPED and access the data directly through individual page_address()
@@ -295,6 +300,7 @@ xfs_buf_alloc_backing_mem(
 {
 	size_t size = BBTOB(bp->b_length);
 	gfp_t gfp_mask = GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOWARN;
+	struct folio *folio;
 	long filled = 0;

 	if (xfs_buftarg_is_mem(bp->b_target))
@@ -316,7 +322,45 @@ xfs_buf_alloc_backing_mem(
 	if (size < PAGE_SIZE && is_power_of_2(size))
 		return xfs_buf_alloc_kmem(bp, size, gfp_mask);

-	/* Make sure that we have a page list */
+	/*
+	 * Don't bother with the retry loop for single PAGE allocations: vmalloc
+	 * won't do any better.
+	 */
+	if (size <= PAGE_SIZE)
+		gfp_mask |= __GFP_NOFAIL;
+
+	/*
+	 * Optimistically attempt a single high order folio allocation for
+	 * larger than PAGE_SIZE buffers.
+	 *
+	 * Allocating a high order folio makes the assumption that buffers are a
+	 * power-of-2 size, matching the power-of-2 folios sizes available.
+	 *
+	 * The exception here are user xattr data buffers, which can be arbitrarily
+	 * sized up to 64kB plus structure metadata, skip straight to the vmalloc
+	 * path for them instead of wasting memory here.
+	 */
+	if (size > PAGE_SIZE) {
+		if (!is_power_of_2(size))
+			goto fallback;
+		gfp_mask &= ~__GFP_DIRECT_RECLAIM;
+		gfp_mask |= __GFP_NORETRY;
+	}
+	folio = folio_alloc(gfp_mask, get_order(size));
+	if (!folio) {
+		if (size <= PAGE_SIZE)
+			return -ENOMEM;
+		goto fallback;
+	}
+	bp->b_addr = folio_address(folio);
+	bp->b_page_array[0] = &folio->page;
+	bp->b_pages = bp->b_page_array;
+	bp->b_page_count = 1;
+	bp->b_flags |= _XBF_PAGES;
+	return 0;
+
+fallback:
+	/* Fall back to allocating an array of single page folios. */
 	bp->b_page_count = DIV_ROUND_UP(size, PAGE_SIZE);
 	if (bp->b_page_count <= XB_PAGES) {
 		bp->b_pages = bp->b_page_array;
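Two details of the fast path hunk above deserve a note. Clearing __GFP_DIRECT_RECLAIM and adding __GFP_NORETRY keep the high order attempt opportunistic: the allocator fails fast instead of compacting or retrying, because the multi-folio fallback is always available. And get_order(size) converts the byte count into a folio allocation order. The sketch below uses a simplified get_order() written for this example, valid for power-of-2 sizes of at least one page; the kernel's version also rounds up arbitrary sizes.

#include <stdio.h>

#define PAGE_SHIFT 12	/* assuming 4 KiB pages */

/* Simplified byte-count to allocation-order conversion. */
static int get_order(unsigned long size)
{
	int order = 0;

	size >>= PAGE_SHIFT;
	while (size > 1) {
		size >>= 1;
		order++;
	}
	return order;
}

int main(void)
{
	unsigned long sizes[] = { 4096, 8192, 16384, 65536 };

	for (unsigned int i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		printf("%6lu bytes -> order %d folio (%lu pages)\n",
		       sizes[i], get_order(sizes[i]), sizes[i] >> PAGE_SHIFT);
	return 0;
}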
@@ -1474,7 +1518,7 @@ xfs_buf_submit_bio(
 	bio->bi_private = bp;
 	bio->bi_end_io = xfs_buf_bio_end_io;

-	if (bp->b_flags & _XBF_KMEM) {
+	if (bp->b_page_count == 1) {
 		__bio_add_page(bio, virt_to_page(bp->b_addr), size,
 				offset_in_page(bp->b_addr));
 	} else {
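The last hunk relies on both backing types that keep b_page_count at 1 being physically contiguous: kmalloc memory (power-of-2 kmalloc allocations do not cross page boundaries) and the new single folio buffers. Either way the whole buffer can be added to the bio as one segment starting at offset_in_page(bp->b_addr). A small sketch of that offset arithmetic, with a hypothetical unaligned address standing in for a kmalloc'ed b_addr:

#include <stdio.h>

#define PAGE_SIZE 4096UL	/* assuming 4 KiB pages */

/* Mirrors offset_in_page(): the byte offset of an address within
 * its containing page, taken from the low address bits. */
static unsigned long offset_in_page(unsigned long addr)
{
	return addr & (PAGE_SIZE - 1);
}

int main(void)
{
	/* Hypothetical kmalloc'ed b_addr that is not page aligned;
	 * a folio-backed buffer would start at offset 0. */
	unsigned long b_addr = 0x12345a00UL;

	printf("bio segment starts %lu bytes into its page\n",
	       offset_in_page(b_addr));
	return 0;
}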
