@opsiff opsiff commented Mar 26, 2025

The core of this patchset is the new folio_end_read() call which
filesystems can use when finishing a page cache read instead of separate
calls to mark the folio uptodate and unlock it. As an illustration of
its use, I converted ext4, iomap & mpage; more can be converted.

I think that's useful by itself, but the interesting optimisation is
that we can implement that with a single XOR instruction that sets the
uptodate bit, clears the lock bit, tests the waiter bit and provides a
write memory barrier. That removes one memory barrier and one atomic
instruction from each page read, which seems worth doing. That's in
patch 15.
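
To make the single-XOR claim concrete, here is a minimal userspace sketch using C11 atomics; it is not the kernel implementation. The bit positions and all `demo_` names are assumptions made for this example. It compares the older two-operation shape with one XOR that flips both bits and returns the previous word, from which the waiter bit is read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag layout, for illustration only. */
#define DEMO_LOCKED   (1UL << 0)
#define DEMO_UPTODATE (1UL << 3)
#define DEMO_WAITERS  (1UL << 7)

/* Old shape: two atomic read-modify-write operations. */
static bool demo_finish_two_steps(atomic_ulong *flags)
{
	unsigned long old;

	atomic_fetch_or(flags, DEMO_UPTODATE);          /* mark uptodate */
	old = atomic_fetch_and(flags, ~DEMO_LOCKED);    /* unlock */
	return (old & DEMO_WAITERS) != 0;
}

/* New shape: one XOR flips both bits and hands back the old word. */
static bool demo_finish_one_xor(atomic_ulong *flags)
{
	unsigned long old = atomic_fetch_xor(flags, DEMO_LOCKED | DEMO_UPTODATE);

	/* XOR only acts as "clear lock, set uptodate" under these preconditions. */
	assert((old & DEMO_LOCKED) && !(old & DEMO_UPTODATE));
	return (old & DEMO_WAITERS) != 0;
}
```

Both functions leave the same flag word behind and report the same waiter state, as long as the lock bit was set and the uptodate bit was clear on entry.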

The last two patches could be a separate series, but basically we can do
the same thing with the writeback flag that we do with the unlock flag;
clear it and test the waiters bit at the same time.
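
The same idea applied to writeback, again as a hedged userspace sketch. `struct demo_folio`, `demo_end_writeback()` and the bit numbers are invented for this example, the release ordering is an assumption, and the wake-up is modelled as a simple counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_WRITEBACK (1UL << 1)  /* hypothetical bit position */
#define DEMO_WAITERS   (1UL << 7)  /* hypothetical bit position */

struct demo_folio {
	atomic_ulong flags;
	int wakeups;  /* stands in for waking sleepers on the writeback bit */
};

/* Clear the writeback bit and learn about waiters in one atomic operation. */
static void demo_end_writeback(struct demo_folio *f)
{
	unsigned long old;

	/* XOR only clears the bit if it is currently set. */
	assert(atomic_load(&f->flags) & DEMO_WRITEBACK);

	old = atomic_fetch_xor_explicit(&f->flags, DEMO_WRITEBACK,
					memory_order_release);
	if (old & DEMO_WAITERS)
		f->wakeups++;
}
```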

v2:

  • Update to 6.6-rc4
  • Simplify iomap's use of folio_end_read() as suggested by Linus
  • Fix weird Alpha assembly, as suggested by Linus
  • Implement xor_unlock_is_negative_byte for Coldfire
  • Add a likely() to folio_end_read() after studying the Coldfire assembly

Matthew Wilcox (Oracle) (17):
iomap: Hold state_lock over call to ifs_set_range_uptodate()
iomap: Protect read_bytes_pending with the state_lock
mm: Add folio_end_read()
ext4: Use folio_end_read()
buffer: Use folio_end_read()
iomap: Use folio_end_read()
bitops: Add xor_unlock_is_negative_byte()
alpha: Implement xor_unlock_is_negative_byte
m68k: Implement xor_unlock_is_negative_byte
mips: Implement xor_unlock_is_negative_byte
powerpc: Implement arch_xor_unlock_is_negative_byte on 32-bit
riscv: Implement xor_unlock_is_negative_byte
s390: Implement arch_xor_unlock_is_negative_byte
mm: Delete checks for xor_unlock_is_negative_byte()
mm: Add folio_xor_flags_has_waiters()
mm: Make __end_folio_writeback() return void
mm: Use folio_xor_flags_has_waiters() in folio_end_writeback()

Summary by Sourcery

Introduce folio_end_read() to simplify and optimize page cache read completion. This new function combines setting the uptodate bit, clearing the lock bit, and testing the waiter bit into a single operation, reducing memory barriers and atomic instructions. It also updates writeback flag handling.

New Features:

  • Introduce folio_end_read() to streamline page cache read completion.
  • Introduce folio_xor_flags_has_waiters() to change some folio flags and check for waiters.

sourcery-ai bot commented Mar 26, 2025

Reviewer's Guide by Sourcery

This pull request introduces folio_end_read() to simplify page cache read completion, optimizes folio_end_writeback(), updates iomap to use spinlocks, and adds xor_unlock_is_negative_byte() for atomic XOR operations.

Sequence diagram for folio_end_read()

sequenceDiagram
    participant FS as FileSystem
    participant FM as Filemap

    FS->>FM: folio_end_read(folio, success)
    alt success is true
        FM->>FM: mask |= PG_uptodate
    end
    FM->>FM: folio_xor_flags_has_waiters(folio, mask)
    alt waiters exist
        FM->>FM: folio_wake_bit(folio, PG_locked)
    end
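
The flow in the diagram above can be sketched in userspace C as follows. This is an illustrative model, not the code in mm/filemap.c: the bit positions and the `demo_` names are assumptions, and the wake-up is reduced to a boolean return value. The point it adds is the failure path, where the mask contains only the lock bit, so the folio is unlocked without being marked uptodate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit positions, for illustration only. */
#define DEMO_LOCKED   (1UL << 0)
#define DEMO_UPTODATE (1UL << 3)
#define DEMO_WAITERS  (1UL << 7)

/* Returns true if the caller would need to wake waiters on the lock bit. */
static bool demo_end_read(atomic_ulong *flags, bool success)
{
	unsigned long mask = DEMO_LOCKED;
	unsigned long old;

	if (success)
		mask |= DEMO_UPTODATE;  /* only publish the data on success */

	old = atomic_fetch_xor_explicit(flags, mask, memory_order_release);
	assert(old & DEMO_LOCKED);       /* must have been locked */
	assert(!(old & DEMO_UPTODATE));  /* must not already be uptodate */

	return (old & DEMO_WAITERS) != 0;
}
```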

Sequence diagram for iomap_finish_folio_read()

sequenceDiagram
    participant IOM as iomap
    participant FM as Filemap

    IOM->>IOM: spin_lock_irqsave(&ifs->state_lock, flags)
    alt !error
        IOM->>IOM: uptodate = ifs_set_range_uptodate(folio, ifs, off, len)
    end
    IOM->>IOM: ifs->read_bytes_pending -= len
    IOM->>IOM: finished = !ifs->read_bytes_pending
    IOM->>IOM: spin_unlock_irqrestore(&ifs->state_lock, flags)
    alt error
        IOM->>FM: folio_set_error(folio)
    end
    alt finished
        IOM->>FM: folio_end_read(folio, uptodate)
    end
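
A rough userspace model of the completion path in the diagram above. Everything here is an assumption made for illustration: the spinlock is imitated with a C11 `atomic_flag`, the uptodate bitmap is replaced by a byte counter, and the struct and function names are hypothetical. It shows how one lock acquisition covers both the uptodate bookkeeping and the pending-bytes decrement, so the counter can be a plain integer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the per-folio state; all names are hypothetical. */
struct demo_ifs {
	atomic_flag state_lock;           /* models the spinlock */
	unsigned int read_bytes_pending;  /* plain integer, guarded by the lock */
	unsigned int uptodate_bytes;      /* crude substitute for the bitmap */
	unsigned int folio_size;
};

struct demo_result {
	bool finished;  /* this was the last outstanding chunk */
	bool uptodate;  /* every byte of the folio has been read */
};

static void demo_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;  /* spin until the previous holder clears the flag */
}

static void demo_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Account for one completed chunk of @len bytes under a single lock. */
static struct demo_result demo_finish_read(struct demo_ifs *ifs,
					   unsigned int len, int error)
{
	struct demo_result r = { false, false };

	demo_spin_lock(&ifs->state_lock);
	assert(ifs->read_bytes_pending >= len);
	if (!error) {
		ifs->uptodate_bytes += len;
		r.uptodate = ifs->uptodate_bytes == ifs->folio_size;
	}
	ifs->read_bytes_pending -= len;
	r.finished = ifs->read_bytes_pending == 0;
	demo_spin_unlock(&ifs->state_lock);

	return r;
}
```

A caller would act on `finished` only after dropping the lock, passing `uptodate` on to the end-of-read helper.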

Updated class diagram for iomap_folio_state

classDiagram
    class iomap_folio_state {
        -spinlock_t state_lock
        -unsigned int read_bytes_pending
        -atomic_t write_bytes_pending
        -unsigned long state
    }
    note for iomap_folio_state "read_bytes_pending changed from atomic_t to unsigned int and is now protected by a spinlock"

File-Level Changes

Change Details Files
Introduces folio_end_read() to streamline page cache read completion, replacing separate folio_mark_uptodate() and folio_unlock() calls with a single function.
  • Adds folio_end_read() function.
  • Updates filesystems (ext4, iomap, buffer) to use folio_end_read().
  • Optimizes by using a single XOR instruction to set the uptodate bit, clear the lock bit, and test the waiter bit, reducing memory barriers and atomic instructions.
  • Adds folio_xor_flags_has_waiters() to check for waiters during flag manipulation.
mm/filemap.c
fs/iomap/buffered-io.c
mm/page-writeback.c
include/asm-generic/bitops/instrumented-lock.h
arch/mips/include/asm/bitops.h
arch/m68k/include/asm/bitops.h
arch/powerpc/include/asm/bitops.h
arch/alpha/include/asm/bitops.h
include/asm-generic/bitops/lock.h
include/linux/page-flags.h
fs/buffer.c
arch/mips/lib/bitops.c
fs/ext4/readpage.c
arch/riscv/include/asm/bitops.h
arch/x86/include/asm/bitops.h
arch/s390/include/asm/bitops.h
kernel/kcsan/kcsan_test.c
kernel/kcsan/selftest.c
mm/kasan/kasan_test.c
include/linux/pagemap.h
Optimizes folio_end_writeback() by using folio_xor_flags_has_waiters() to clear the writeback flag and test for waiters simultaneously.
  • Modifies folio_end_writeback() to use folio_xor_flags_has_waiters().
  • Removes folio_test_clear_writeback() call.
  • Makes __end_folio_writeback() return void; whether there are waiters is now reported by folio_xor_flags_has_waiters().
mm/filemap.c
mm/page-writeback.c
Updates iomap to use spinlocks to protect read_bytes_pending and state bitmap.
  • Replaces atomic_t for read_bytes_pending with an unsigned int protected by a spinlock.
  • Holds state_lock over call to ifs_set_range_uptodate().
  • Modifies ifs_set_range_uptodate to return a boolean indicating whether the folio is fully uptodate.
fs/iomap/buffered-io.c
Introduces xor_unlock_is_negative_byte() and architecture-specific implementations for atomic XOR and negative byte test.
  • Adds xor_unlock_is_negative_byte() function.
  • Implements xor_unlock_is_negative_byte for alpha, m68k, mips, powerpc, riscv, and s390 architectures.
  • Removes clear_bit_unlock_is_negative_byte.
bitops: Add xor_unlock_is_negative_byte()
alpha: Implement xor_unlock_is_negative_byte
m68k: Implement xor_unlock_is_negative_byte
mips: Implement xor_unlock_is_negative_byte
powerpc: Implement arch_xor_unlock_is_negative_byte on 32-bit
riscv: Implement xor_unlock_is_negative_byte
s390: Implement arch_xor_unlock_is_negative_byte
mm: Delete checks for xor_unlock_is_negative_byte()
include/asm-generic/bitops/instrumented-lock.h
arch/mips/include/asm/bitops.h
arch/m68k/include/asm/bitops.h
arch/powerpc/include/asm/bitops.h
arch/alpha/include/asm/bitops.h
include/asm-generic/bitops/lock.h
arch/riscv/include/asm/bitops.h
arch/x86/include/asm/bitops.h
arch/s390/include/asm/bitops.h
arch/mips/lib/bitops.c
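
The iomap entry in the table above says that ifs_set_range_uptodate() now reports whether the folio is fully uptodate. A minimal sketch of that contract follows, assuming a single-word bitmap with one bit per block; the struct, the 64-block limit and the function name are hypothetical and not taken from fs/iomap/buffered-io.c. In the real code the caller holds state_lock around this update.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_BLOCKS 64u  /* hypothetical: one bit per block in one word */

struct demo_state {
	unsigned int nr_blocks;   /* blocks per folio, at most DEMO_MAX_BLOCKS */
	unsigned long long bits;  /* bit i set means block i is uptodate */
};

/* Mark blocks [first, first + count) uptodate; return true if all are now set. */
static bool demo_set_range_uptodate(struct demo_state *s, unsigned int first,
				    unsigned int count)
{
	unsigned long long full, range;

	assert(s->nr_blocks >= 1 && s->nr_blocks <= DEMO_MAX_BLOCKS);
	assert(count >= 1 && first + count <= s->nr_blocks);

	/* Build the masks without ever shifting by the full word width. */
	full = ~0ULL >> (DEMO_MAX_BLOCKS - s->nr_blocks);
	range = (~0ULL >> (DEMO_MAX_BLOCKS - count)) << first;

	s->bits |= range;
	return s->bits == full;
}
```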


@deepin-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from opsiff. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@opsiff opsiff force-pushed the linux-6.6.y-2025-03-26-folio branch from d84ebb0 to 32935e7 on March 26, 2025 04:37

@sourcery-ai sourcery-ai bot left a comment

Hey @opsiff - I've reviewed your changes - here's some feedback:

Overall Comments:

  • The removal of folio_wake seems like it might be premature - is it really unused?
  • It looks like you're converting ifs->read_bytes_pending from an atomic to a regular integer, but still using a spinlock - is there a reason it can't be a regular integer without a spinlock?
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 2 issues found
  • 🟢 Documentation: all looks good

return res;
}

static inline bool xor_unlock_is_negative_byte(unsigned long mask,

issue (complexity): Consider factoring out the conditional and inline assembly duplication into a helper function to improve code reuse and readability.

Consider factoring out the conditional and inline assembly duplication into a helper that picks the “mask” at a single point. For example:

```c
/* XOR @mask into *p; return true if bit 7 of the old value was set. */
static inline bool xor_unlock_atomic(volatile unsigned long *p,
                                     unsigned long mask)
{
    unsigned long orig;

    /* Assumed to return the bit-7 test of the old value as a bool. */
    if (!kernel_uses_llsc)
        return __mips_xor_is_negative_byte(mask, p);

    orig = __test_bit_op(*p, "%0", "xor\t%1, %0, %3", "ir"(mask));
    return (orig & BIT(7)) != 0;
}

static inline bool xor_unlock_is_negative_byte(unsigned long mask,
                                               volatile unsigned long *p)
{
    bool res;

    /* Always XOR with the caller's mask; only the test looks at bit 7. */
    smp_mb__before_atomic();
    res = xor_unlock_atomic(p, mask);
    smp_llsc_mb();

    return res;
}
```

This consolidates the conditional logic and reduces duplicate inline assembly while retaining original functionality.

#define arch_test_bit generic_test_bit
#define arch_test_bit_acquire generic_test_bit_acquire

static inline bool xor_unlock_is_negative_byte(unsigned long mask,

issue (complexity): Consider using GNU atomic builtins to replace the inline assembly loop with a single atomic XOR operation, if the platform supports it, to reduce complexity and duplicate logic while preserving functionality.

Consider using the GNU atomic builtins to simplify the inline assembly loop. For example, if the platform supports __atomic builtins, you can replace the inline assembly with a single atomic XOR operation:

```c
static inline bool xor_unlock_is_negative_byte(unsigned long mask,
                                               volatile unsigned long *p)
{
    unsigned long old = __atomic_fetch_xor(p, mask, __ATOMIC_ACQ_REL);
    return (old & BIT(7)) != 0;
}
```

This preserves the functionality while reducing the low-level complexity and duplicate logic.

@deepin-ci-robot

deepin pr auto review

Code review comments:

  1. In folio_end_read(), the mask argument passed to folio_xor_flags_has_waiters() is set to 1 << PG_locked, but the value of PG_locked is 7, which makes the mask 128; this may affect the function's behavior. Suggest checking whether the value of PG_locked is correct.

Matthew Wilcox (Oracle) added 17 commits September 15, 2025 15:08
mainline inclusion
from mainline-v6.7-rc1
category: performance

Patch series "Add folio_end_read", v2.

The core of this patchset is the new folio_end_read() call which
filesystems can use when finishing a page cache read instead of separate
calls to mark the folio uptodate and unlock it.  As an illustration of its
use, I converted ext4, iomap & mpage; more can be converted.

I think that's useful by itself, but the interesting optimisation is that
we can implement that with a single XOR instruction that sets the uptodate
bit, clears the lock bit, tests the waiter bit and provides a write memory
barrier.  That removes one memory barrier and one atomic instruction from
each page read, which seems worth doing.  That's in patch 15.

The last two patches could be a separate series, but basically we can do
the same thing with the writeback flag that we do with the unlock flag;
clear it and test the waiters bit at the same time.

This patch (of 17):

This is really preparation for the next patch, but it lets us call
folio_mark_uptodate() in just one place instead of two.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 279d5fc)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Perform one atomic operation (acquiring the spinlock) instead of two
(spinlock & atomic_sub) per read completion.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit f45b494)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Provide a function for filesystems to call when they have finished reading
an entire folio.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 0b23704)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

folio_end_read() is the perfect fit for ext4.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit f8174a1)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

There are two places that we can use this new helper.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 6ba924d)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Combine the setting of the uptodate flag with the clearing of the locked
flag.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 7a4847e)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Replace clear_bit_and_unlock_is_negative_byte() with
xor_unlock_is_negative_byte().  We have a few places that like to lock a
folio, set a flag and unlock it again.  Allow for the possibility of
combining the latter two operations for efficiency.  We are guaranteed
that the caller holds the lock, so it is safe to unlock it with the xor.
The caller must guarantee that nobody else will set the flag without
holding the lock; it is not safe to do this with the PG_dirty flag, for
example.
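
The caveat about flags that can be set without the lock can be seen in a small userspace sketch. The names and bit positions are hypothetical. XOR toggles rather than sets, so if the flag was already set by someone who did not hold the lock, the combined operation clears it; OR followed by a separate unlock does not have that problem.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_LOCKED (1UL << 0)  /* hypothetical bit positions */
#define DEMO_FLAG   (1UL << 4)

/* Unlock and "set" DEMO_FLAG with one XOR; correct only if the flag was clear. */
static unsigned long demo_xor_unlock(atomic_ulong *flags)
{
	assert(atomic_load(flags) & DEMO_LOCKED);  /* caller holds the lock */
	atomic_fetch_xor(flags, DEMO_LOCKED | DEMO_FLAG);
	return atomic_load(flags);
}

/* Slower alternative: OR is idempotent, so an earlier setter is harmless. */
static unsigned long demo_or_then_unlock(atomic_ulong *flags)
{
	assert(atomic_load(flags) & DEMO_LOCKED);  /* caller holds the lock */
	atomic_fetch_or(flags, DEMO_FLAG);
	atomic_fetch_and(flags, ~DEMO_LOCKED);
	return atomic_load(flags);
}
```

Starting from `DEMO_LOCKED | DEMO_FLAG`, `demo_xor_unlock()` leaves the word at 0 (flag lost), while `demo_or_then_unlock()` leaves `DEMO_FLAG` set.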

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 247dbcd)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Inspired by the alpha clear_bit() and arch_atomic_add_return(), this will
surely be more efficient than the generic one defined in filemap.c.
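
Alpha builds operations like this from a load-locked/store-conditional retry loop. Portable C has no direct equivalent, so the sketch below uses a C11 compare-and-exchange loop as a rough userspace analogue. It is not the assembly from the patch; the function name is hypothetical and the memory ordering is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* XOR @mask into *p with a retry loop; true if bit 7 of the old value was set. */
static bool demo_xor_retry_loop(unsigned long mask, atomic_ulong *p)
{
	unsigned long old = atomic_load_explicit(p, memory_order_relaxed);

	assert((mask & 0x80UL) == 0);  /* the tested bit must not be flipped */

	/* On failure, @old is refreshed with the current value and we retry. */
	while (!atomic_compare_exchange_weak_explicit(p, &old, old ^ mask,
						      memory_order_release,
						      memory_order_relaxed))
		;

	return (old & 0x80UL) != 0;
}
```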

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit e28ff5d)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Using EOR to clear the guaranteed-to-be-set lock bit will test the
negative flag just like the x86 implementation.  This should be more
efficient than the generic implementation in filemap.c.  It would be
better if m68k had __GCC_ASM_FLAG_OUTPUTS__.

Coldfire doesn't have a byte-sized EOR, so we test bit 7 after the EOR,
which is a second memory access, but it's slightly better than the current
C code.
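
As a rough illustration of the operation described above, here is a portable
C11 sketch of an XOR-based unlock that also reports bit 7 of the low byte.
It is not the m68k assembly from this patch. The `demo_` names and the bit
positions are assumptions made for the example, and the sketch assumes the
mask never includes bit 7, so the pre-XOR value of that bit equals the
post-XOR value.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bit positions only; assumed for this sketch. */
#define DEMO_LOCKED_BIT   0   /* lock bit, known to be set on entry */
#define DEMO_WAITERS_BIT  7   /* sign bit of the low byte */

/*
 * XOR `mask` into *word with release ordering and report whether bit 7
 * of the low byte is set. Because `mask` is assumed not to contain bit 7,
 * testing the old value gives the same answer as testing the new one.
 */
static bool demo_xor_unlock_is_negative_byte(unsigned long mask,
                                             _Atomic unsigned long *word)
{
    unsigned long old =
        atomic_fetch_xor_explicit(word, mask, memory_order_release);

    return (old & (1UL << DEMO_WAITERS_BIT)) != 0;
}
```

Since the lock bit is guaranteed to be set, XOR with the lock mask clears it
in the same atomic operation that fetches the waiters bit.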

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit ea845e3)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Inspired by the mips test_and_change_bit(), this will surely be more
efficient than the generic one defined in filemap.c.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 8da36b2)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Simply remove the ifdef.  The assembly is identical to that in the
non-optimised case of test_and_clear_bits() on PPC32, and it's not clear
to me how the PPC32 optimisation works, nor whether it would work for
arch_xor_unlock_is_negative_byte().  If that optimisation would work,
someone can implement it later, but this is more efficient than the
implementation in filemap.c.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 51a752c)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Inspired by the riscv clear_bit_unlock(), this will surely be
more efficient than the generic one defined in filemap.c.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 2a66728)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Inspired by the s390 arch_test_and_clear_bit(), this will surely be more
efficient than the generic one defined in filemap.c.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 12010aa)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Architectures which don't define their own use the one in
asm-generic/bitops/lock.h.  Get rid of all the ifdefs around "maybe we
don't have it".

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit f12fb73)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Optimise folio_end_read() by setting the uptodate bit at the same time we
clear the lock bit.  This saves at least one memory barrier and one
write-after-write hazard.
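
A minimal userspace sketch of the idea, assuming the folio is locked and not
yet uptodate on entry. The `demo_` struct, flag positions and wakeup counter
are hypothetical stand-ins for the example, not the kernel's definitions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag layout for this sketch. */
#define DEMO_PG_LOCKED    (1UL << 0)
#define DEMO_PG_UPTODATE  (1UL << 3)
#define DEMO_PG_WAITERS   (1UL << 7)

struct demo_folio {
    _Atomic unsigned long flags;
    int wakeups;                /* stands in for waking sleeping tasks */
};

/*
 * Finish a read: one atomic XOR clears the lock bit (set on entry) and,
 * on success, sets the uptodate bit (clear on entry). The value returned
 * by the XOR tells us whether anyone is waiting.
 */
static void demo_folio_end_read(struct demo_folio *folio, bool success)
{
    unsigned long mask = DEMO_PG_LOCKED;
    unsigned long old;

    if (success)
        mask |= DEMO_PG_UPTODATE;

    old = atomic_fetch_xor_explicit(&folio->flags, mask,
                                    memory_order_release);
    if (old & DEMO_PG_WAITERS)
        folio->wakeups++;
}
```

The XOR only gives the intended result because both preconditions hold: a
bit that is known to be set is cleared, and a bit that is known to be clear
is set.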

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 0410cd8)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Rather than check the result of test-and-clear, just check that we have
the writeback bit set at the start.  This wouldn't catch every case, but
it's good enough (and enables the next patch).

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 7d0795d)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

Match how folio_unlock() works by combining the test for PG_waiters with
the clearing of PG_writeback.  This should have a small performance win,
and removes the last user of folio_wake().
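
To make the difference concrete, the sketch below contrasts a simplified
two-step form (clear the bit, then read the flags again) with the combined
form. Both functions, the `demo_` names and the bit positions are
assumptions for illustration and are not the kernel's code before or after
this patch. The writeback bit is assumed to be set on entry.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag layout for this sketch. */
#define DEMO_PG_WRITEBACK  (1UL << 1)
#define DEMO_PG_WAITERS    (1UL << 7)

/* Two-step form: clear writeback, then load the flags a second time. */
static bool demo_end_writeback_two_step(_Atomic unsigned long *flags)
{
    atomic_fetch_and_explicit(flags, ~DEMO_PG_WRITEBACK,
                              memory_order_release);
    return (atomic_load_explicit(flags, memory_order_acquire) &
            DEMO_PG_WAITERS) != 0;
}

/* Combined form: one atomic XOR clears writeback and yields the waiters bit. */
static bool demo_end_writeback_combined(_Atomic unsigned long *flags)
{
    unsigned long old =
        atomic_fetch_xor_explicit(flags, DEMO_PG_WRITEBACK,
                                  memory_order_release);

    return (old & DEMO_PG_WAITERS) != 0;
}
```

In this sketch the combined form touches the flags word once instead of
twice, which is where the small performance win would come from.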

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 2580d55)
Signed-off-by: Wentao Guan <[email protected]>
@opsiff opsiff force-pushed the linux-6.6.y-2025-03-26-folio branch from 32935e7 to b426260 Compare September 15, 2025 07:12
