Commit db21370

fdmanana authored and kdave committed
btrfs: drop extent map range more efficiently
Currently, when dropping extent maps for a file range through btrfs_drop_extent_map_range(), we do the following non-optimal things:

1) We look up extent maps one by one, always starting the search from the
   root of the extent map tree. This is not efficient if we have multiple
   extent maps in the range;

2) We check on every iteration if we have the 'split' and 'split2' spare
   extent maps, in case we need to split an extent map that intersects our
   range but also crosses its boundaries (to the left, to the right, or
   both). If our target range is for example:

       [2M, 8M)

   And we have 3 extent maps in the range:

       [1M, 3M)  [3M, 6M)  [6M, 10M)

   Then on the first iteration we allocate two extent maps for 'split' and
   'split2', and use 'split' to split the first extent map, so after the
   split we set 'split' to 'split2' and then set 'split2' to NULL. On the
   second iteration, we don't need to split the second extent map, but
   because 'split2' is now NULL, we allocate a new extent map for 'split2'.
   On the third iteration we need to split the third extent map, so we use
   the extent map pointed to by 'split'. We thus end up allocating 3 extent
   maps for splitting when all we needed was 2. We never need to allocate
   more than 2, because the extent maps that need to be split are always
   the first one and the last one in the target range.

Improve on this by:

1) Using rb_next() to move on to the next extent map. This results in
   iterating over fewer nodes of the tree, and it does not require
   comparing the ranges of nodes to our start/end offset;

2) Allocating the 2 extent maps for splitting before entering the loop and
   never allocating more than 2. In practice it's very rare for both extent
   map allocations to fail, since we have a dedicated slab for extent maps,
   while also needing to split two extent maps.
This patch is part of a patchset comprised of the following patches:

  btrfs: fix missed extent on fsync after dropping extent maps
  btrfs: move btrfs_drop_extent_cache() to extent_map.c
  btrfs: use extent_map_end() at btrfs_drop_extent_map_range()
  btrfs: use cond_resched_rwlock_write() during inode eviction
  btrfs: move open coded extent map tree deletion out of inode eviction
  btrfs: add helper to replace extent map range with a new extent map
  btrfs: remove the refcount warning/check at free_extent_map()
  btrfs: remove unnecessary extent map initializations
  btrfs: assert tree is locked when clearing extent map from logging
  btrfs: remove unnecessary NULL pointer checks when searching extent maps
  btrfs: remove unnecessary next extent map search
  btrfs: avoid pointless extent map tree search when flushing delalloc
  btrfs: drop extent map range more efficiently

And the following fio test was done before and after applying the whole
patchset, on a non-debug kernel (Debian's default kernel config) on a
12 cores Intel box with 64G of ram:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/nvme0n1
  MNT=/mnt/nvme0n1
  MOUNT_OPTIONS="-o ssd"
  MKFS_OPTIONS="-R free-space-tree -O no-holes"

  cat <<EOF > /tmp/fio-job.ini
  [writers]
  rw=randwrite
  fsync=8
  fallocate=none
  group_reporting=1
  direct=0
  bssplit=4k/20:8k/20:16k/20:32k/10:64k/10:128k/5:256k/5:512k/5:1m/5
  ioengine=psync
  filesize=2G
  runtime=300
  time_based
  directory=$MNT
  numjobs=8
  thread
  EOF

  echo performance | \
      tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

  echo
  echo "Using config:"
  echo
  cat /tmp/fio-job.ini
  echo

  umount $MNT &> /dev/null
  mkfs.btrfs -f $MKFS_OPTIONS $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  fio /tmp/fio-job.ini
  umount $MNT

Result before applying the patchset:

  WRITE: bw=197MiB/s (206MB/s), 197MiB/s-197MiB/s (206MB/s-206MB/s),
         io=57.7GiB (61.9GB), run=300188-300188msec

Result after applying the patchset:

  WRITE: bw=203MiB/s (213MB/s), 203MiB/s-203MiB/s (213MB/s-213MB/s),
         io=59.5GiB (63.9GB), run=300019-300019msec

Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
1 parent b54bb86 commit db21370

File tree

1 file changed: 74 additions, 45 deletions


fs/btrfs/extent_map.c

Lines changed: 74 additions & 45 deletions
@@ -702,11 +702,11 @@ static void drop_all_extent_maps_fast(struct extent_map_tree *tree)
 void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
                                  bool skip_pinned)
 {
-        struct extent_map *split = NULL;
-        struct extent_map *split2 = NULL;
+        struct extent_map *split;
+        struct extent_map *split2;
+        struct extent_map *em;
         struct extent_map_tree *em_tree = &inode->extent_tree;
         u64 len = end - start + 1;
-        bool testend = true;
 
         WARN_ON(end < start);
         if (end == (u64)-1) {
@@ -715,57 +715,73 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
                         return;
                 }
                 len = (u64)-1;
-                testend = false;
+        } else {
+                /* Make end offset exclusive for use in the loop below. */
+                end++;
         }
-        while (1) {
-                struct extent_map *em;
-                u64 em_end;
+
+        /*
+         * It's ok if we fail to allocate the extent maps, see the comment near
+         * the bottom of the loop below. We only need two spare extent maps in
+         * the worst case, where the first extent map that intersects our range
+         * starts before the range and the last extent map that intersects our
+         * range ends after our range (and they might be the same extent map),
+         * because we need to split those two extent maps at the boundaries.
+         */
+        split = alloc_extent_map();
+        split2 = alloc_extent_map();
+
+        write_lock(&em_tree->lock);
+        em = lookup_extent_mapping(em_tree, start, len);
+
+        while (em) {
+                /* extent_map_end() returns exclusive value (last byte + 1). */
+                const u64 em_end = extent_map_end(em);
+                struct extent_map *next_em = NULL;
                 u64 gen;
                 unsigned long flags;
-                bool ends_after_range = false;
-                bool no_splits = false;
                 bool modified;
                 bool compressed;
 
-                if (!split)
-                        split = alloc_extent_map();
-                if (!split2)
-                        split2 = alloc_extent_map();
-                if (!split || !split2)
-                        no_splits = true;
-
-                write_lock(&em_tree->lock);
-                em = lookup_extent_mapping(em_tree, start, len);
-                if (!em) {
-                        write_unlock(&em_tree->lock);
-                        break;
+                if (em_end < end) {
+                        next_em = next_extent_map(em);
+                        if (next_em) {
+                                if (next_em->start < end)
+                                        refcount_inc(&next_em->refs);
+                                else
+                                        next_em = NULL;
+                        }
                 }
-                em_end = extent_map_end(em);
-                if (testend && em_end > start + len)
-                        ends_after_range = true;
+
                 if (skip_pinned && test_bit(EXTENT_FLAG_PINNED, &em->flags)) {
-                        if (ends_after_range) {
-                                free_extent_map(em);
-                                write_unlock(&em_tree->lock);
-                                break;
-                        }
                         start = em_end;
-                        if (testend)
+                        if (end != (u64)-1)
                                 len = start + len - em_end;
-                        free_extent_map(em);
-                        write_unlock(&em_tree->lock);
-                        continue;
+                        goto next;
                 }
-                flags = em->flags;
-                gen = em->generation;
-                compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
+
                 clear_bit(EXTENT_FLAG_PINNED, &em->flags);
                 clear_bit(EXTENT_FLAG_LOGGING, &flags);
                 modified = !list_empty(&em->list);
-                if (no_splits)
-                        goto next;
+
+                /*
+                 * The extent map does not cross our target range, so no need to
+                 * split it, we can remove it directly.
+                 */
+                if (em->start >= start && em_end <= end)
+                        goto remove_em;
+
+                flags = em->flags;
+                gen = em->generation;
+                compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
 
                 if (em->start < start) {
+                        if (!split) {
+                                split = split2;
+                                split2 = NULL;
+                                if (!split)
+                                        goto remove_em;
+                        }
                         split->start = em->start;
                         split->len = start - em->start;
 
@@ -796,7 +812,13 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
                         split = split2;
                         split2 = NULL;
                 }
-                if (ends_after_range) {
+                if (em_end > end) {
+                        if (!split) {
+                                split = split2;
+                                split2 = NULL;
+                                if (!split)
+                                        goto remove_em;
+                        }
                         split->start = start + len;
                         split->len = em_end - (start + len);
                         split->block_start = em->block_start;
@@ -842,7 +864,7 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
                         free_extent_map(split);
                         split = NULL;
                 }
-next:
+remove_em:
                 if (extent_map_in_tree(em)) {
                         /*
                          * If the extent map is still in the tree it means that
@@ -864,20 +886,27 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
                          * full fsync, otherwise a fast fsync will miss this
                          * extent if it's new and needs to be logged.
                          */
-                        if ((em->start < start || ends_after_range) && modified) {
-                                ASSERT(no_splits);
+                        if ((em->start < start || em_end > end) && modified) {
+                                ASSERT(!split);
                                 btrfs_set_inode_full_sync(inode);
                         }
                         remove_extent_mapping(em_tree, em);
                 }
-                write_unlock(&em_tree->lock);
 
-                /* Once for us. */
+                /*
+                 * Once for the tree reference (we replaced or removed the
+                 * extent map from the tree).
+                 */
                 free_extent_map(em);
-                /* And once for the tree. */
+next:
+                /* Once for us (for our lookup reference). */
                 free_extent_map(em);
+
+                em = next_em;
         }
 
+        write_unlock(&em_tree->lock);
+
         free_extent_map(split);
         free_extent_map(split2);
 }
