Skip to content

Commit c416a30

Browse files
josefbacikkdave
authored andcommitted
btrfs: rip out may_commit_transaction
may_commit_transaction was introduced before the ticketing infrastructure existed. There was a problem where we'd legitimately be out of space, but every reservation would trigger a transaction commit and then fail. Thus if you had 1000 things trying to make a reservation, they'd all do the flushing loop and thus commit the transaction 1000 times before they'd get their ENOSPC. This helper was introduced to short circuit this, if there wasn't space that could be reclaimed by committing the transaction then simply ENOSPC out. This made true ENOSPC tests much faster as we didn't waste a bunch of time. However many of our bugs over the years have been from cases where we didn't account for some space that would be reclaimed by committing a transaction. The delayed refs rsv space, delayed rsv, many pinned bytes miscalculations, etc. And in the meantime the original problem has been solved with ticketing. We no longer will commit the transaction 1000 times. Instead we'll get 1000 waiters, we will go through the flushing mechanisms, and if there's no progress after 2 loops we ENOSPC everybody out. The ticketing infrastructure gives us a deterministic way to see if we're making progress or not, thus we avoid a lot of extra work. So simplify this step by simply unconditionally committing the transaction. This removes what is arguably our most common source of early ENOSPC bugs and will allow us to drastically simplify many of the things we track because we simply won't need them with this stuff gone. Reviewed-by: Nikolay Borisov <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
1 parent 35b22c1 commit c416a30

File tree

3 files changed

+12
-128
lines changed

3 files changed

+12
-128
lines changed

fs/btrfs/ctree.h

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2787,7 +2787,6 @@ enum btrfs_flush_state {
27872787
ALLOC_CHUNK_FORCE = 8,
27882788
RUN_DELAYED_IPUTS = 9,
27892789
COMMIT_TRANS = 10,
2790-
FORCE_COMMIT_TRANS = 11,
27912790
};
27922791

27932792
int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,

fs/btrfs/space-info.c

Lines changed: 11 additions & 125 deletions
Original file line numberDiff line numberDiff line change
@@ -133,18 +133,13 @@
133133
* operations, however they won't be usable until the transaction commits.
134134
*
135135
* COMMIT_TRANS
136-
* may_commit_transaction() is the ultimate arbiter on whether we commit the
137-
* transaction or not. In order to avoid constantly churning we do all the
138-
* above flushing first and then commit the transaction as the last resort.
139-
* However we need to take into account things like pinned space that would
140-
* be freed, plus any delayed work we may not have gotten rid of in the case
141-
* of metadata.
142-
*
143-
* FORCE_COMMIT_TRANS
144-
* For use by the preemptive flusher. We use this to bypass the ticketing
145-
* checks in may_commit_transaction, as we have more information about the
146-
* overall state of the system and may want to commit the transaction ahead
147-
* of actual ENOSPC conditions.
136+
* This will commit the transaction. Historically we had a lot of logic
137+
* surrounding whether or not we'd commit the transaction, but this waits born
138+
* out of a pre-tickets era where we could end up committing the transaction
139+
* thousands of times in a row without making progress. Now thanks to our
140+
* ticketing system we know if we're not making progress and can error
141+
* everybody out after a few commits rather than burning the disk hoping for
142+
* a different answer.
148143
*
149144
* OVERCOMMIT
150145
*
@@ -575,109 +570,6 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
575570
}
576571
}
577572

578-
/**
579-
* Possibly commit the transaction if its ok to
580-
*
581-
* @fs_info: the filesystem
582-
* @space_info: space_info we are checking for commit, either data or metadata
583-
*
584-
* This will check to make sure that committing the transaction will actually
585-
* get us somewhere and then commit the transaction if it does. Otherwise it
586-
* will return -ENOSPC.
587-
*/
588-
static int may_commit_transaction(struct btrfs_fs_info *fs_info,
589-
struct btrfs_space_info *space_info)
590-
{
591-
struct reserve_ticket *ticket = NULL;
592-
struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_block_rsv;
593-
struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
594-
struct btrfs_block_rsv *trans_rsv = &fs_info->trans_block_rsv;
595-
struct btrfs_trans_handle *trans;
596-
u64 reclaim_bytes = 0;
597-
u64 bytes_needed = 0;
598-
u64 cur_free_bytes = 0;
599-
600-
trans = (struct btrfs_trans_handle *)current->journal_info;
601-
if (trans)
602-
return -EAGAIN;
603-
604-
spin_lock(&space_info->lock);
605-
cur_free_bytes = btrfs_space_info_used(space_info, true);
606-
if (cur_free_bytes < space_info->total_bytes)
607-
cur_free_bytes = space_info->total_bytes - cur_free_bytes;
608-
else
609-
cur_free_bytes = 0;
610-
611-
if (!list_empty(&space_info->priority_tickets))
612-
ticket = list_first_entry(&space_info->priority_tickets,
613-
struct reserve_ticket, list);
614-
else if (!list_empty(&space_info->tickets))
615-
ticket = list_first_entry(&space_info->tickets,
616-
struct reserve_ticket, list);
617-
if (ticket)
618-
bytes_needed = ticket->bytes;
619-
620-
if (bytes_needed > cur_free_bytes)
621-
bytes_needed -= cur_free_bytes;
622-
else
623-
bytes_needed = 0;
624-
spin_unlock(&space_info->lock);
625-
626-
if (!bytes_needed)
627-
return 0;
628-
629-
trans = btrfs_join_transaction(fs_info->extent_root);
630-
if (IS_ERR(trans))
631-
return PTR_ERR(trans);
632-
633-
/*
634-
* See if there is enough pinned space to make this reservation, or if
635-
* we have block groups that are going to be freed, allowing us to
636-
* possibly do a chunk allocation the next loop through.
637-
*/
638-
if (test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags) ||
639-
__percpu_counter_compare(&space_info->total_bytes_pinned,
640-
bytes_needed,
641-
BTRFS_TOTAL_BYTES_PINNED_BATCH) >= 0)
642-
goto commit;
643-
644-
/*
645-
* See if there is some space in the delayed insertion reserve for this
646-
* reservation. If the space_info's don't match (like for DATA or
647-
* SYSTEM) then just go enospc, reclaiming this space won't recover any
648-
* space to satisfy those reservations.
649-
*/
650-
if (space_info != delayed_rsv->space_info)
651-
goto enospc;
652-
653-
spin_lock(&delayed_rsv->lock);
654-
reclaim_bytes += delayed_rsv->reserved;
655-
spin_unlock(&delayed_rsv->lock);
656-
657-
spin_lock(&delayed_refs_rsv->lock);
658-
reclaim_bytes += delayed_refs_rsv->reserved;
659-
spin_unlock(&delayed_refs_rsv->lock);
660-
661-
spin_lock(&trans_rsv->lock);
662-
reclaim_bytes += trans_rsv->reserved;
663-
spin_unlock(&trans_rsv->lock);
664-
665-
if (reclaim_bytes >= bytes_needed)
666-
goto commit;
667-
bytes_needed -= reclaim_bytes;
668-
669-
if (__percpu_counter_compare(&space_info->total_bytes_pinned,
670-
bytes_needed,
671-
BTRFS_TOTAL_BYTES_PINNED_BATCH) < 0)
672-
goto enospc;
673-
674-
commit:
675-
return btrfs_commit_transaction(trans);
676-
enospc:
677-
btrfs_end_transaction(trans);
678-
return -ENOSPC;
679-
}
680-
681573
/*
682574
* Try to flush some data based on policy set by @state. This is only advisory
683575
* and may fail for various reasons. The caller is supposed to examine the
@@ -752,9 +644,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
752644
btrfs_wait_on_delayed_iputs(fs_info);
753645
break;
754646
case COMMIT_TRANS:
755-
ret = may_commit_transaction(fs_info, space_info);
756-
break;
757-
case FORCE_COMMIT_TRANS:
647+
ASSERT(current->journal_info == NULL);
758648
trans = btrfs_join_transaction(root);
759649
if (IS_ERR(trans)) {
760650
ret = PTR_ERR(trans);
@@ -1136,7 +1026,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
11361026
(delayed_block_rsv->reserved +
11371027
delayed_refs_rsv->reserved)) {
11381028
to_reclaim = space_info->bytes_pinned;
1139-
flush = FORCE_COMMIT_TRANS;
1029+
flush = COMMIT_TRANS;
11401030
} else if (delayed_block_rsv->reserved >
11411031
delayed_refs_rsv->reserved) {
11421032
to_reclaim = delayed_block_rsv->reserved;
@@ -1206,12 +1096,8 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
12061096
* the information it needs to make the right decision.
12071097
*
12081098
* COMMIT_TRANS
1209-
* This is where we reclaim all of the pinned space generated by the previous
1210-
* two stages. We will not commit the transaction if we don't think we're
1211-
* likely to satisfy our request, which means if our current free space +
1212-
* total_bytes_pinned < reservation we will not commit. This is why the
1213-
* previous states are actually important, to make sure we know for sure
1214-
* whether committing the transaction will allow us to make progress.
1099+
* This is where we reclaim all of the pinned space generated by running the
1100+
* iputs
12151101
*
12161102
* ALLOC_CHUNK_FORCE
12171103
* For data we start with alloc chunk force, however we could have been full

include/trace/events/btrfs.h

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -99,8 +99,7 @@ struct btrfs_space_info;
9999
EM( ALLOC_CHUNK, "ALLOC_CHUNK") \
100100
EM( ALLOC_CHUNK_FORCE, "ALLOC_CHUNK_FORCE") \
101101
EM( RUN_DELAYED_IPUTS, "RUN_DELAYED_IPUTS") \
102-
EM( COMMIT_TRANS, "COMMIT_TRANS") \
103-
EMe(FORCE_COMMIT_TRANS, "FORCE_COMMIT_TRANS")
102+
EMe(COMMIT_TRANS, "COMMIT_TRANS")
104103

105104
/*
106105
* First define the enums in the above macros to be exported to userspace via

0 commit comments

Comments
 (0)