Skip to content

Commit 3d6448e

Browse files
fdmananakdave
authored andcommitted
btrfs: fix race between page release and a fast fsync
When releasing an extent map, done through the page release callback, we can race with an ongoing fast fsync and cause the fsync to miss a new extent and not log it. The steps for this to happen are the following: 1) A page is dirtied for some inode I; 2) Writeback for that page is triggered by a path other than fsync, for example by the system due to memory pressure; 3) When the ordered extent for the extent (a single 4K page) finishes, we unpin the corresponding extent map and set its generation to N, the current transaction's generation; 4) The btrfs_releasepage() callback is invoked by the system due to memory pressure for that no longer dirty page of inode I; 5) At the same time, some task calls fsync on inode I, joins transaction N, and at btrfs_log_inode() it sees that the inode does not have the full sync flag set, so we proceed with a fast fsync. But before we get into btrfs_log_changed_extents() and lock the inode's extent map tree: 6) Through btrfs_releasepage() we end up at try_release_extent_mapping() and we remove the extent map for the new 4Kb extent, because it is neither pinned anymore nor locked. By calling remove_extent_mapping(), we remove the extent map from the list of modified extents, since the extent map does not have the logging flag set. We unlock the inode's extent map tree; 7) The task doing the fast fsync now enters btrfs_log_changed_extents(), locks the inode's extent map tree and iterates its list of modified extents, which no longer has the 4Kb extent in it, so it does not log the extent; 8) The fsync finishes; 9) Before transaction N is committed, a power failure happens. After replaying the log, the 4K extent of inode I will be missing, since it was not logged due to the race with try_release_extent_mapping(). So fix this by teaching try_release_extent_mapping() to not remove an extent map if it's still in the list of modified extents. Fixes: ff44c6e ("Btrfs: do not hold the write_lock on the extent tree while logging") CC: [email protected] # 5.4+ Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
1 parent 88c4703 commit 3d6448e

File tree

1 file changed

+13
-3
lines changed

1 file changed

+13
-3
lines changed

fs/btrfs/extent_io.c

Lines changed: 13 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4494,15 +4494,25 @@ int try_release_extent_mapping(struct page *page, gfp_t mask)
44944494
free_extent_map(em);
44954495
break;
44964496
}
4497-
if (!test_range_bit(tree, em->start,
4498-
extent_map_end(em) - 1,
4499-
EXTENT_LOCKED, 0, NULL)) {
4497+
if (test_range_bit(tree, em->start,
4498+
extent_map_end(em) - 1,
4499+
EXTENT_LOCKED, 0, NULL))
4500+
goto next;
4501+
/*
4502+
* If it's not in the list of modified extents, used
4503+
* by a fast fsync, we can remove it. If it's being
4504+
* logged we can safely remove it since fsync took an
4505+
* extra reference on the em.
4506+
*/
4507+
if (list_empty(&em->list) ||
4508+
test_bit(EXTENT_FLAG_LOGGING, &em->flags)) {
45004509
set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
45014510
&btrfs_inode->runtime_flags);
45024511
remove_extent_mapping(map, em);
45034512
/* once for the rb tree */
45044513
free_extent_map(em);
45054514
}
4515+
next:
45064516
start = extent_map_end(em);
45074517
write_unlock(&map->lock);
45084518

0 commit comments

Comments
 (0)