-
Notifications
You must be signed in to change notification settings - Fork 103
[Deepin-Kernel-SIG] [linux 6.6-y] [Upstream] Add folio_end_read #690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: linux-6.6.y
Are you sure you want to change the base?
[Deepin-Kernel-SIG] [linux 6.6-y] [Upstream] Add folio_end_read #690
Conversation
Reviewer's Guide by SourceryThis pull request introduces Sequence diagram for folio_end_read()sequenceDiagram
participant FS as FileSystem
participant FM as Filemap
FS->>FM: folio_end_read(folio, success)
alt success is true
FM->>FM: mask |= PG_uptodate
end
FM->>FM: folio_xor_flags_has_waiters(folio, mask)
alt waiters exist
FM->>FM: folio_wake_bit(folio, PG_locked)
end
Sequence diagram for iomap_finish_folio_read()sequenceDiagram
participant IOM as iomap
participant FM as Filemap
IOM->>IOM: spin_lock_irqsave(&ifs->state_lock, flags)
alt !error
IOM->>IOM: uptodate = ifs_set_range_uptodate(folio, ifs, off, len)
end
IOM->>IOM: ifs->read_bytes_pending -= len
IOM->>IOM: finished = !ifs->read_bytes_pending
IOM->>IOM: spin_unlock_irqrestore(&ifs->state_lock, flags)
alt error
IOM->>FM: folio_set_error(folio)
end
alt finished
IOM->>FM: folio_end_read(folio, uptodate)
end
Updated class diagram for iomap_folio_stateclassDiagram
class iomap_folio_state {
-spinlock_t state_lock
-unsigned int read_bytes_pending
-atomic_t write_bytes_pending
-unsigned long state
}
note for iomap_folio_state "read_bytes_pending changed from atomic_t to unsigned int and is now protected by a spinlock"
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
d84ebb0 to
32935e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @opsiff - I've reviewed your changes - here's some feedback:
Overall Comments:
- The removal of
folio_wakeseems like it might be premature - is it really unused? - It looks like you're converting
ifs->read_bytes_pendingfrom an atomic to a regular integer, but still using a spinlock - is there a reason it can't be a regular integer without a spinlock?
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 2 issues found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| return res; | ||
| } | ||
|
|
||
| static inline bool xor_unlock_is_negative_byte(unsigned long mask, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (complexity): Consider factoring out the conditional and inline assembly duplication into a helper function to improve code reuse and readability.
Consider factoring out the conditional and inline assembly duplication into a helper that picks the “mask” at a single point. For example:
```c
static inline unsigned long xor_unlock_atomic(volatile unsigned long *p,
unsigned long effective_mask)
{
unsigned long orig;
if (!kernel_uses_llsc)
orig = __mips_xor_is_negative_byte(effective_mask, p);
else
orig = __test_bit_op(*p, "%0", "xor\t%1, %0, %3", "ir"(effective_mask));
return orig;
}
static inline bool xor_unlock_is_negative_byte(unsigned long mask,
volatile unsigned long *p)
{
unsigned long eff_mask = kernel_uses_llsc ? BIT(7) : mask;
unsigned long orig;
smp_mb__before_atomic();
orig = xor_unlock_atomic(p, eff_mask);
smp_llsc_mb();
return (orig & eff_mask) != 0;
}This consolidates the conditional logic and reduces duplicate inline assembly while retaining original functionality.
| #define arch_test_bit generic_test_bit | ||
| #define arch_test_bit_acquire generic_test_bit_acquire | ||
|
|
||
| static inline bool xor_unlock_is_negative_byte(unsigned long mask, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (complexity): Consider using GNU atomic builtins to replace the inline assembly loop with a single atomic XOR operation, if the platform supports it, to reduce complexity and duplicate logic while preserving functionality .
Consider using the GNU atomic builtins to simplify the inline assembly loop. For example, if the platform supports __atomic builtins, you can replace the inline assembly with a single atomic XOR operation:
static inline bool xor_unlock_is_negative_byte(unsigned long mask,
volatile unsigned long *p)
{
unsigned long old = __atomic_fetch_xor(p, mask, __ATOMIC_ACQ_REL);
return (old & BIT(7)) != 0;
}This preserves the functionality while reducing the low-level complexity and duplicate logic.
deepin pr auto review代码审查意见:
|
mainline inclusion from mainline-v6.7-rc1 category: performance Patch series "Add folio_end_read", v2. The core of this patchset is the new folio_end_read() call which filesystems can use when finishing a page cache read instead of separate calls to mark the folio uptodate and unlock it. As an illustration of its use, I converted ext4, iomap & mpage; more can be converted. I think that's useful by itself, but the interesting optimisation is that we can implement that with a single XOR instruction that sets the uptodate bit, clears the lock bit, tests the waiter bit and provides a write memory barrier. That removes one memory barrier and one atomic instruction from each page read, which seems worth doing. That's in patch 15. The last two patches could be a separate series, but basically we can do the same thing with the writeback flag that we do with the unlock flag; clear it and test the waiters bit at the same time. This patch (of 17): This is really preparation for the next patch, but it lets us call folio_mark_uptodate() in just one place instead of two. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit 279d5fc) Signed-off-by: Wentao Guan <[email protected]>
mainline inlcusion from mainline-v6.7-rc1 category: performance Perform one atomic operation (acquiring the spinlock) instead of two (spinlock & atomic_sub) per read completion. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit f45b494) Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion from mainline-v6.7-rc1 category: performance Provide a function for filesystems to call when they have finished reading an entire folio. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit 0b23704) Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion from mainline-v6.7-rc1 category: performance folio_end_read() is the perfect fit for ext4. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit f8174a1) Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion from mainline-v6.7-rc1 category: performance There are two places that we can use this new helper. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit 6ba924d) Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion from mainline-v6.7-rc1 category: performance Combine the setting of the uptodate flag with the clearing of the locked flag. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit 7a4847e) Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion from mainline-v6.7-rc1 category: performance Replace clear_bit_and_unlock_is_negative_byte() with xor_unlock_is_negative_byte(). We have a few places that like to lock a folio, set a flag and unlock it again. Allow for the possibility of combining the latter two operations for efficiency. We are guaranteed that the caller holds the lock, so it is safe to unlock it with the xor. The caller must guarantee that nobody else will set the flag without holding the lock; it is not safe to do this with the PG_dirty flag, for example. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit 247dbcd) Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion from mainline-v6.7-rc1 category: performance Inspired by the alpha clear_bit() and arch_atomic_add_return(), this will surely be more efficient than the generic one defined in filemap.c. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit e28ff5d) Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion from mainline-v6.7-rc1 category: performance Using EOR to clear the guaranteed-to-be-set lock bit will test the negative flag just like the x86 implementation. This should be more efficient than the generic implementation in filemap.c. It would be better if m68k had __GCC_ASM_FLAG_OUTPUTS__. Coldfire doesn't have a byte-sized EOR, so we test bit 7 after the EOR, which is a second memory access, but it's slightly better than the current C code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit ea845e3) Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion from mainline-v6.7-rc1 category: performance Inspired by the mips test_and_change_bit(), this will surely be more efficient than the generic one defined in filemap.c Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit 8da36b2) Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion from mainline-v6.7-rc1 category: performance Simply remove the ifdef. The assembly is identical to that in the non-optimised case of test_and_clear_bits() on PPC32, and it's not clear to me how the PPC32 optimisation works, nor whether it would work for arch_xor_unlock_is_negative_byte(). If that optimisation would work, someone can implement it later, but this is more efficient than the implementation in filemap.c. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit 51a752c) Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion from mainline-v6.7-rc1 category: performance Inspired by the riscv clear_bit_unlock(), this will surely be more efficient than the generic one defined in filemap.c. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit 2a66728) Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion from mainline-v6.7-rc1 category: performance Inspired by the s390 arch_test_and_clear_bit(), this will surely be more efficient than the generic one defined in filemap.c. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit 12010aa) Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion from mainline-v6.7-rc1 category: performance Architectures which don't define their own use the one in asm-generic/bitops/lock.h. Get rid of all the ifdefs around "maybe we don't have it". Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit f12fb73) Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion from mainline-v6.7-rc1 category: performance Optimise folio_end_read() by setting the uptodate bit at the same time we clear the unlock bit. This saves at least one memory barrier and one write-after-write hazard. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit 0410cd8) Signed-off-by: Wentao Guan <[email protected]>
mainline incusion from mainline-v6.7-rc1 category: performance Rather than check the result of test-and-clear, just check that we have the writeback bit set at the start. This wouldn't catch every case, but it's good enough (and enables the next patch). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit 7d0795d) Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion from mainline-v6.7-rc1 category: performance Match how folio_unlock() works by combining the test for PG_waiters with the clearing of PG_writeback. This should have a small performance win, and removes the last user of folio_wake(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit 2580d55) Signed-off-by: Wentao Guan <[email protected]>
32935e7 to
b426260
Compare
The core of this patchset is the new folio_end_read() call which
filesystems can use when finishing a page cache read instead of separate
calls to mark the folio uptodate and unlock it. As an illustration of
its use, I converted ext4, iomap & mpage; more can be converted.
I think that's useful by itself, but the interesting optimisation is
that we can implement that with a single XOR instruction that sets the
uptodate bit, clears the lock bit, tests the waiter bit and provides a
write memory barrier. That removes one memory barrier and one atomic
instruction from each page read, which seems worth doing. That's in
patch 15.
The last two patches could be a separate series, but basically we can do
the same thing with the writeback flag that we do with the unlock flag;
clear it and test the waiters bit at the same time.
v2:
Matthew Wilcox (Oracle) (17):
iomap: Hold state_lock over call to ifs_set_range_uptodate()
iomap: Protect read_bytes_pending with the state_lock
mm: Add folio_end_read()
ext4: Use folio_end_read()
buffer: Use folio_end_read()
iomap: Use folio_end_read()
bitops: Add xor_unlock_is_negative_byte()
alpha: Implement xor_unlock_is_negative_byte
m68k: Implement xor_unlock_is_negative_byte
mips: Implement xor_unlock_is_negative_byte
powerpc: Implement arch_xor_unlock_is_negative_byte on 32-bit
riscv: Implement xor_unlock_is_negative_byte
s390: Implement arch_xor_unlock_is_negative_byte
mm: Delete checks for xor_unlock_is_negative_byte()
mm: Add folio_xor_flags_has_waiters()
mm: Make __end_folio_writeback() return void
mm: Use folio_xor_flags_has_waiters() in folio_end_writeback()
Summary by Sourcery
Introduce folio_end_read() to simplify and optimize page cache read completion. This new function combines setting the uptodate bit, clearing the lock bit, and testing the waiter bit into a single operation, reducing memory barriers and atomic instructions. It also updates writeback flag handling.
New Features: