Commit bb9464e

yangerkun authored and tytso committed
ext4: flush s_error_work before journal destroy in ext4_fill_super
The error path in ext4_fill_super forgets to flush s_error_work before
journal destroy, and it may trigger the following bug since
flush_stashed_error_work can run concurrently with journal destroy
without any protection for sbi->s_journal.

    [32031.740193] EXT4-fs (loop66): get root inode failed
    [32031.740484] EXT4-fs (loop66): mount failed
    [32031.759805] ------------[ cut here ]------------
    [32031.759807] kernel BUG at fs/jbd2/transaction.c:373!
    [32031.760075] invalid opcode: 0000 [#1] SMP PTI
    [32031.760336] CPU: 5 PID: 1029268 Comm: kworker/5:1 Kdump: loaded 4.18.0
    [32031.765112] Call Trace:
    [32031.765375]  ? __switch_to_asm+0x35/0x70
    [32031.765635]  ? __switch_to_asm+0x41/0x70
    [32031.765893]  ? __switch_to_asm+0x35/0x70
    [32031.766148]  ? __switch_to_asm+0x41/0x70
    [32031.766405]  ? _cond_resched+0x15/0x40
    [32031.766665]  jbd2__journal_start+0xf1/0x1f0 [jbd2]
    [32031.766934]  jbd2_journal_start+0x19/0x20 [jbd2]
    [32031.767218]  flush_stashed_error_work+0x30/0x90 [ext4]
    [32031.767487]  process_one_work+0x195/0x390
    [32031.767747]  worker_thread+0x30/0x390
    [32031.768007]  ? process_one_work+0x390/0x390
    [32031.768265]  kthread+0x10d/0x130
    [32031.768521]  ? kthread_flush_work_fn+0x10/0x10
    [32031.768778]  ret_from_fork+0x35/0x40

    static int start_this_handle(...)
        BUG_ON(journal->j_flags & JBD2_UNMOUNT); <---- Trigger this

Besides, after we enable fast commit, ext4_fc_replay can add work to
s_error_work but return success, so the later journal destroy in
ext4_load_journal can trigger this problem too.

Fix this problem with two steps:
1. Call ext4_commit_super directly in ext4_handle_error for the case
   where it is called from ext4_fc_replay
2. Since it's hard to pair the init and flush for s_error_work, we'd
   better add an extra flush_work before journal destroy in
   ext4_fill_super

Besides, this patch will call ext4_commit_super in ext4_handle_error for
any nojournal case too. But it seems safe since the reason we call
schedule_work was that we should save error info to sb through the
journal if available. Conversely, for the nojournal case, it seems
useless to delay committing the superblock to s_error_work.

Fixes: c92dc85 ("ext4: defer saving error info from atomic context")
Fixes: 2d01ddc ("ext4: save error info to sb through journal if available")
Cc: [email protected]
Signed-off-by: yangerkun <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
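Step 1 of the fix comes down to a change in one condition. The sketch below is a userspace model of that decision only, not kernel code: the enum, the function names, and the `has_journal` parameter are hypothetical stand-ins, and it assumes the `journal` tested in the patched condition is non-NULL exactly when a journal is available to write the error info through.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; these are not kernel identifiers. */
enum sb_commit_path {
	SB_COMMIT_DEFERRED,	/* models schedule_work(&sbi->s_error_work) */
	SB_COMMIT_DIRECT	/* models ext4_commit_super(sb) */
};

/* Before the patch: every error that lets the fs continue is deferred. */
enum sb_commit_path commit_path_before(bool continue_fs, bool has_journal)
{
	(void)has_journal;	/* the old condition did not look at the journal */
	return continue_fs ? SB_COMMIT_DEFERRED : SB_COMMIT_DIRECT;
}

/* After the patch: defer only when there is a journal to go through. */
enum sb_commit_path commit_path_after(bool continue_fs, bool has_journal)
{
	return (continue_fs && has_journal) ? SB_COMMIT_DEFERRED
					    : SB_COMMIT_DIRECT;
}
```

In this model the only input whose result changes is `continue_fs` true with no journal: it moves from the deferred path to the direct one, which is the nojournal behaviour change the commit message calls out.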
1 parent 75ca6ad commit bb9464e

1 file changed, +4 -1 lines changed

fs/ext4/super.c

Lines changed: 4 additions & 1 deletion
@@ -660,7 +660,7 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error,
 		 * constraints, it may not be safe to do it right here so we
 		 * defer superblock flushing to a workqueue.
 		 */
-		if (continue_fs)
+		if (continue_fs && journal)
 			schedule_work(&EXT4_SB(sb)->s_error_work);
 		else
 			ext4_commit_super(sb);
@@ -5050,12 +5050,15 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	sbi->s_ea_block_cache = NULL;
 
 	if (sbi->s_journal) {
+		/* flush s_error_work before journal destroy. */
+		flush_work(&sbi->s_error_work);
 		jbd2_journal_destroy(sbi->s_journal);
 		sbi->s_journal = NULL;
 	}
 failed_mount3a:
 	ext4_es_unregister_shrinker(sbi);
 failed_mount3:
+	/* flush s_error_work before sbi destroy */
 	flush_work(&sbi->s_error_work);
 	del_timer_sync(&sbi->s_err_report);
 	ext4_stop_mmpd(sbi);
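The ordering change in the ext4_fill_super error path can be illustrated with a small single-threaded model. This is a sketch under stated assumptions, not the kernel implementation: every `model_*` name is hypothetical, the work item is a plain flag rather than a real workqueue entry, and the "before" path serializes the worst-case interleaving (deferred work running once journal destroy has begun) that in the kernel arises from true concurrency.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_sb {
	bool journal_unmounting;	/* models JBD2_UNMOUNT being set */
	bool work_pending;		/* models s_error_work being queued */
	bool hit_bug;			/* set where the kernel would BUG_ON() */
};

/* Models flush_stashed_error_work() starting a journal handle. */
void model_error_work(struct model_sb *sb)
{
	if (sb->journal_unmounting)
		sb->hit_bug = true;	/* start_this_handle() would BUG here */
}

/* Models schedule_work(): the work is queued and runs later. */
void model_schedule_work(struct model_sb *sb)
{
	sb->work_pending = true;
}

/* Models flush_work(): run queued work to completion, no-op if idle. */
void model_flush_work(struct model_sb *sb)
{
	if (sb->work_pending) {
		sb->work_pending = false;
		model_error_work(sb);
	}
}

/* Models jbd2_journal_destroy() marking the journal as going away. */
void model_journal_destroy(struct model_sb *sb)
{
	sb->journal_unmounting = true;
}

/* Error path before the patch: the only flush comes after the destroy. */
void model_failed_mount_before(struct model_sb *sb)
{
	model_journal_destroy(sb);
	model_flush_work(sb);
}

/* Error path after the patch: flush, destroy, then flush again. */
void model_failed_mount_after(struct model_sb *sb)
{
	model_flush_work(sb);
	model_journal_destroy(sb);
	model_flush_work(sb);	/* nothing queued in this model, so a no-op */
}
```

With work queued via `model_schedule_work()`, the "before" path ends with `hit_bug` set while the "after" path does not, because the queued work has already finished by the time the journal is marked as unmounting.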
