Commit ab73b29

davidhildenbrand authored and Claudio Imbrenda committed
s390/uv: Improve splitting of large folios that cannot be split while dirty

Currently, starting a PV VM on an iomap-based filesystem with large
folio support, such as XFS, will not work. We'll be stuck in
unpack_one()->gmap_make_secure(), because we can't seem to make progress
splitting the large folio.

The problem is that we require a writable PTE but a writable PTE under such
filesystems will imply a dirty folio.

So whenever we have a writable PTE, we'll have a dirty folio, and dirty
iomap folios cannot currently get split, because
split_folio()->split_huge_page_to_list_to_order()->filemap_release_folio()
will fail in iomap_release_folio().

So we will not make any progress splitting such large folios.

Until dirty folios can be split more reliably, let's manually trigger
writeback of the problematic folio using
filemap_write_and_wait_range(), and retry the split immediately
afterwards exactly once, before looking up the folio again.

Should this logic be part of split_folio()? Likely not; most split users
don't have to split so eagerly to make any progress.

For now, this seems to affect xfs, zonefs and erofs, and this patch
makes it work again (tested on xfs only).

While this could be considered a fix for commit 6795801 ("xfs: Support
large folios"), commit df2f970 ("zonefs: enable support for large folios")
and commit ce529cc ("erofs: enable large folios for iomap mode"), before
commit eef88fe ("s390/uv: Split large folios in gmap_make_secure()"), we
did not try splitting large folios at all. So it's all rather part of
making SE compatible with file systems that support large folios. But to
have some "Fixes:" tag, let's just use eef88fe.

Not CCing stable, because there are a lot of dependencies, and it simply
not working is not critical in stable kernels.

Reported-by: Sebastian Mitterle <[email protected]>
Closes: https://issues.redhat.com/browse/RHEL-58218
Fixes: eef88fe ("s390/uv: Split large folios in gmap_make_secure()")
Signed-off-by: David Hildenbrand <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Message-ID: <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: Claudio Imbrenda <[email protected]>
1 parent bd428b8 commit ab73b29

1 file changed: 60 additions and 6 deletions
arch/s390/kernel/uv.c

Lines changed: 60 additions & 6 deletions
@@ -15,6 +15,7 @@
 #include <linux/pagemap.h>
 #include <linux/swap.h>
 #include <linux/pagewalk.h>
+#include <linux/backing-dev.h>
 #include <asm/facility.h>
 #include <asm/sections.h>
 #include <asm/uv.h>
@@ -338,22 +339,75 @@ static int make_folio_secure(struct mm_struct *mm, struct folio *folio, struct u
  */
 static int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio)
 {
-	int rc;
+	int rc, tried_splits;
 
 	lockdep_assert_not_held(&mm->mmap_lock);
 	folio_wait_writeback(folio);
 	lru_add_drain_all();
 
-	if (folio_test_large(folio)) {
+	if (!folio_test_large(folio))
+		return 0;
+
+	for (tried_splits = 0; tried_splits < 2; tried_splits++) {
+		struct address_space *mapping;
+		loff_t lstart, lend;
+		struct inode *inode;
+
 		folio_lock(folio);
 		rc = split_folio(folio);
+		if (rc != -EBUSY) {
+			folio_unlock(folio);
+			return rc;
+		}
+
+		/*
+		 * Splitting with -EBUSY can fail for various reasons, but we
+		 * have to handle one case explicitly for now: some mappings
+		 * don't allow for splitting dirty folios; writeback will
+		 * mark them clean again, including marking all page table
+		 * entries mapping the folio read-only, to catch future write
+		 * attempts.
+		 *
+		 * While the system should be writing back dirty folios in the
+		 * background, we obtained this folio by looking up a writable
+		 * page table entry. On these problematic mappings, writable
+		 * page table entries imply dirty folios, preventing the
+		 * split in the first place.
+		 *
+		 * To prevent a livelock when trigger writeback manually and
+		 * letting the caller look up the folio again in the page
+		 * table (turning it dirty), immediately try to split again.
+		 *
+		 * This is only a problem for some mappings (e.g., XFS);
+		 * mappings that do not support writeback (e.g., shmem) do not
+		 * apply.
+		 */
+		if (!folio_test_dirty(folio) || folio_test_anon(folio) ||
+		    !folio->mapping || !mapping_can_writeback(folio->mapping)) {
+			folio_unlock(folio);
+			break;
+		}
+
+		/*
+		 * Ideally, we'd only trigger writeback on this exact folio. But
+		 * there is no easy way to do that, so we'll stabilize the
+		 * mapping while we still hold the folio lock, so we can drop
+		 * the folio lock to trigger writeback on the range currently
+		 * covered by the folio instead.
+		 */
+		mapping = folio->mapping;
+		lstart = folio_pos(folio);
+		lend = lstart + folio_size(folio) - 1;
+		inode = igrab(mapping->host);
 		folio_unlock(folio);
 
-		if (rc != -EBUSY)
-			return rc;
-		return -EAGAIN;
+		if (unlikely(!inode))
+			break;
+
+		filemap_write_and_wait_range(mapping, lstart, lend);
+		iput(mapping->host);
 	}
-	return 0;
+	return -EAGAIN;
 }
 
 int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header *uvcb)
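The writeback range in the hunk above is passed to filemap_write_and_wait_range() as an inclusive byte range, which is why `lend` is `lstart + folio_size(folio) - 1`. A small user-space sketch of that arithmetic follows. It assumes 4 KiB pages and describes a folio only by its page index and order; `MODEL_PAGE_SHIFT`, `struct byte_range` and `model_folio_range()` are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical helper type: an inclusive byte range within a file. */
struct byte_range {
	int64_t lstart; /* first byte covered by the folio */
	int64_t lend;   /* last byte covered by the folio (inclusive) */
};

/*
 * Mirrors "lstart = folio_pos(); lend = lstart + folio_size() - 1" for a
 * folio that starts at page 'index' and spans 2^order pages.
 */
struct byte_range model_folio_range(uint64_t index, unsigned int order)
{
	struct byte_range r;
	int64_t size;

	assert(order < 32);
	size = (int64_t)1 << (MODEL_PAGE_SHIFT + order);
	r.lstart = (int64_t)(index << MODEL_PAGE_SHIFT);
	r.lend = r.lstart + size - 1;
	return r;
}
```

For example, an order-4 folio (16 pages, 64 KiB) starting at page index 16 covers bytes 65536 through 131071 in this model.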
