Skip to content

Commit 8c80fb3

Browse files
brookxu-txtytso
authored andcommitted
ext4: fix a possible ABBA deadlock due to busy PA
We found on older kernel (3.10) that in the scenario of insufficient disk space, system may trigger an ABBA deadlock problem, it seems that this problem still exists in latest kernel, try to fix it here. The main process triggered by this problem is that task A occupies the PA and waits for the jbd2 transaction finish, the jbd2 transaction waits for the completion of task B's IO (plug_list), but task B waits for the release of PA by task A to finish discard, which indirectly forms an ABBA deadlock. The related calltrace is as follows: Task A vfs_write ext4_mb_new_blocks() ext4_mb_mark_diskspace_used() JBD2 jbd2_journal_get_write_access() -> jbd2_journal_commit_transaction() ->schedule() filemap_fdatawait() | | | Task B | | do_unlinkat() | | ext4_evict_inode() | | jbd2_journal_begin_ordered_truncate() | | filemap_fdatawrite_range() | | ext4_mb_new_blocks() | -ext4_mb_discard_group_preallocations() <----- Here, try to cancel ext4_mb_discard_group_preallocations() internal retry due to PA busy, and do a limited number of retries inside ext4_mb_discard_preallocations(), which can circumvent the above problems, but also has some advantages: 1. Since the PA is in a busy state, if other groups have free PAs, keeping the current PA may help to reduce fragmentation. 2. Continue to traverse forward instead of waiting for the current group PA to be released. In most scenarios, the PA discard time can be reduced. However, in the case of smaller free space, if only a few groups have space, then due to multiple traversals of the group, it may increase CPU overhead. But in contrast, I feel that the overall benefit is better than the cost. Signed-off-by: Chunguang Xu <[email protected]> Reported-by: kernel test robot <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]> Cc: [email protected]
1 parent dfac1a1 commit 8c80fb3

File tree

1 file changed

+18
-22
lines changed

1 file changed

+18
-22
lines changed

fs/ext4/mballoc.c

Lines changed: 18 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -4814,16 +4814,15 @@ ext4_mb_release_group_pa(struct ext4_buddy *e4b,
48144814
*/
48154815
static noinline_for_stack int
48164816
ext4_mb_discard_group_preallocations(struct super_block *sb,
4817-
ext4_group_t group, int needed)
4817+
ext4_group_t group, int *busy)
48184818
{
48194819
struct ext4_group_info *grp = ext4_get_group_info(sb, group);
48204820
struct buffer_head *bitmap_bh = NULL;
48214821
struct ext4_prealloc_space *pa, *tmp;
48224822
struct list_head list;
48234823
struct ext4_buddy e4b;
48244824
int err;
4825-
int busy = 0;
4826-
int free, free_total = 0;
4825+
int free = 0;
48274826

48284827
mb_debug(sb, "discard preallocation for group %u\n", group);
48294828
if (list_empty(&grp->bb_prealloc_list))
@@ -4846,19 +4845,14 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
48464845
goto out_dbg;
48474846
}
48484847

4849-
if (needed == 0)
4850-
needed = EXT4_CLUSTERS_PER_GROUP(sb) + 1;
4851-
48524848
INIT_LIST_HEAD(&list);
4853-
repeat:
4854-
free = 0;
48554849
ext4_lock_group(sb, group);
48564850
list_for_each_entry_safe(pa, tmp,
48574851
&grp->bb_prealloc_list, pa_group_list) {
48584852
spin_lock(&pa->pa_lock);
48594853
if (atomic_read(&pa->pa_count)) {
48604854
spin_unlock(&pa->pa_lock);
4861-
busy = 1;
4855+
*busy = 1;
48624856
continue;
48634857
}
48644858
if (pa->pa_deleted) {
@@ -4898,22 +4892,13 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
48984892
call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
48994893
}
49004894

4901-
free_total += free;
4902-
4903-
/* if we still need more blocks and some PAs were used, try again */
4904-
if (free_total < needed && busy) {
4905-
ext4_unlock_group(sb, group);
4906-
cond_resched();
4907-
busy = 0;
4908-
goto repeat;
4909-
}
49104895
ext4_unlock_group(sb, group);
49114896
ext4_mb_unload_buddy(&e4b);
49124897
put_bh(bitmap_bh);
49134898
out_dbg:
49144899
mb_debug(sb, "discarded (%d) blocks preallocated for group %u bb_free (%d)\n",
4915-
free_total, group, grp->bb_free);
4916-
return free_total;
4900+
free, group, grp->bb_free);
4901+
return free;
49174902
}
49184903

49194904
/*
@@ -5455,13 +5440,24 @@ static int ext4_mb_discard_preallocations(struct super_block *sb, int needed)
54555440
{
54565441
ext4_group_t i, ngroups = ext4_get_groups_count(sb);
54575442
int ret;
5458-
int freed = 0;
5443+
int freed = 0, busy = 0;
5444+
int retry = 0;
54595445

54605446
trace_ext4_mb_discard_preallocations(sb, needed);
5447+
5448+
if (needed == 0)
5449+
needed = EXT4_CLUSTERS_PER_GROUP(sb) + 1;
5450+
repeat:
54615451
for (i = 0; i < ngroups && needed > 0; i++) {
5462-
ret = ext4_mb_discard_group_preallocations(sb, i, needed);
5452+
ret = ext4_mb_discard_group_preallocations(sb, i, &busy);
54635453
freed += ret;
54645454
needed -= ret;
5455+
cond_resched();
5456+
}
5457+
5458+
if (needed > 0 && busy && ++retry < 3) {
5459+
busy = 0;
5460+
goto repeat;
54655461
}
54665462

54675463
return freed;

0 commit comments

Comments
 (0)