Commit a6ecc2a

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "In addition to bug fixes and cleanups, there are two new features for
  ext4 in 5.14:

   - Allow applications to poll on changes to
     /sys/fs/ext4/*/errors_count

   - Add the ioctl EXT4_IOC_CHECKPOINT which allows the journal to be
     checkpointed, truncated and discarded or zero'ed"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits)
  jbd2: export jbd2_journal_[un]register_shrinker()
  ext4: notify sysfs on errors_count value change
  fs: remove bdev_try_to_free_page callback
  ext4: remove bdev_try_to_free_page() callback
  jbd2: simplify journal_clean_one_cp_list()
  jbd2,ext4: add a shrinker to release checkpointed buffers
  jbd2: remove redundant buffer io error checks
  jbd2: don't abort the journal when freeing buffers
  jbd2: ensure abort the journal if detect IO error when writing original buffer back
  jbd2: remove the out label in __jbd2_journal_remove_checkpoint()
  ext4: no need to verify new add extent block
  jbd2: clean up misleading comments for jbd2_fc_release_bufs
  ext4: add check to prevent attempting to resize an fs with sparse_super2
  ext4: consolidate checks for resize of bigalloc into ext4_resize_begin
  ext4: remove duplicate definition of ext4_xattr_ibody_inline_set()
  ext4: fsmap: fix the block/inode bitmap comment
  ext4: fix comment for s_hash_unsigned
  ext4: use local variable ei instead of EXT4_I() macro
  ext4: fix avefreec in find_group_orlov
  ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
  ...
2 parents 2cfa582 + 16aa4c9 commit a6ecc2a

25 files changed

+720
-215
lines changed

Documentation/filesystems/ext4/journal.rst

Lines changed: 31 additions & 8 deletions
@@ -4,14 +4,14 @@ Journal (jbd2)
 --------------

 Introduced in ext3, the ext4 filesystem employs a journal to protect the
-filesystem against corruption in the case of a system crash. A small
-continuous region of disk (default 128MiB) is reserved inside the
-filesystem as a place to land “important” data writes on-disk as quickly
-as possible. Once the important data transaction is fully written to the
-disk and flushed from the disk write cache, a record of the data being
-committed is also written to the journal. At some later point in time,
-the journal code writes the transactions to their final locations on
-disk (this could involve a lot of seeking or a lot of small
+filesystem against metadata inconsistencies in the case of a system crash. Up
+to 10,240,000 file system blocks (see man mke2fs(8) for more details on journal
+size limits) can be reserved inside the filesystem as a place to land
+important data writes on-disk as quickly as possible. Once the important
+data transaction is fully written to the disk and flushed from the disk write
+cache, a record of the data being committed is also written to the journal. At
+some later point in time, the journal code writes the transactions to their
+final locations on disk (this could involve a lot of seeking or a lot of small
 read-write-erases) before erasing the commit record. Should the system
 crash during the second slow write, the journal can be replayed all the
 way to the latest commit record, guaranteeing the atomicity of whatever
@@ -731,3 +731,26 @@ point, the refcount for inode 11 is not reliable, but that gets fixed by the
 replay of last inode 11 tag. Thus, by converting a non-idempotent procedure
 into a series of idempotent outcomes, fast commits ensured idempotence during
 the replay.
+
+Journal Checkpoint
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Checkpointing the journal ensures all transactions and their associated buffers
+are submitted to the disk. In-progress transactions are waited upon and included
+in the checkpoint. Checkpointing is used internally during critical updates to
+the filesystem including journal recovery, filesystem resizing, and freeing of
+the journal_t structure.
+
+A journal checkpoint can be triggered from userspace via the ioctl
+EXT4_IOC_CHECKPOINT. This ioctl takes a single, u64 argument for flags.
+Currently, three flags are supported. First, EXT4_IOC_CHECKPOINT_FLAG_DRY_RUN
+can be used to verify input to the ioctl. It returns error if there is any
+invalid input, otherwise it returns success without performing
+any checkpointing. This can be used to check whether the ioctl exists on a
+system and to verify there are no issues with arguments or flags. The
+other two flags are EXT4_IOC_CHECKPOINT_FLAG_DISCARD and
+EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT. These flags cause the journal blocks to be
+discarded or zero-filled, respectively, after the journal checkpoint is
+complete. EXT4_IOC_CHECKPOINT_FLAG_DISCARD and EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT
+cannot both be set. The ioctl may be useful when snapshotting a system or for
+complying with content deletion SLOs.
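
As a rough illustration of the ioctl described in the documentation hunk above, a userspace caller might look like the following sketch. It is not part of this commit. The flag values and the `_IOW('f', 43, __u32)` encoding are copied from the fs/ext4/ext4.h hunk in this diff, so the flags are passed as a pointer to a `__u32` even though the documentation text says u64. The helper names `checkpoint_flags_ok` and `ext4_checkpoint` are hypothetical, and the local fallback macros assume the installed headers may not export these definitions.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/types.h>

/* Fallback definitions copied from the fs/ext4/ext4.h hunk in this commit. */
#ifndef EXT4_IOC_CHECKPOINT
#define EXT4_IOC_CHECKPOINT _IOW('f', 43, __u32)
#endif
#ifndef EXT4_IOC_CHECKPOINT_FLAG_DISCARD
#define EXT4_IOC_CHECKPOINT_FLAG_DISCARD 0x1
#define EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT 0x2
#define EXT4_IOC_CHECKPOINT_FLAG_DRY_RUN 0x4
#endif

/* Hypothetical local mask, equivalent to EXT4_IOC_CHECKPOINT_FLAG_VALID. */
#define CHECKPOINT_FLAGS_ALL (EXT4_IOC_CHECKPOINT_FLAG_DISCARD | \
			      EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT | \
			      EXT4_IOC_CHECKPOINT_FLAG_DRY_RUN)

/*
 * Hypothetical helper mirroring the documented rules: only the three
 * known flags are accepted, and DISCARD and ZEROOUT are mutually
 * exclusive. Returns 1 if the flags look acceptable, 0 otherwise.
 */
int checkpoint_flags_ok(__u32 flags)
{
	if (flags & ~CHECKPOINT_FLAGS_ALL)
		return 0;
	if ((flags & EXT4_IOC_CHECKPOINT_FLAG_DISCARD) &&
	    (flags & EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT))
		return 0;
	return 1;
}

/*
 * Hypothetical wrapper: opens a file or directory on the target ext4
 * filesystem and issues the ioctl. Returns 0 on success, -1 on failure
 * with errno set.
 */
int ext4_checkpoint(const char *path, __u32 flags)
{
	int fd, ret, saved_errno;

	assert(path != NULL);

	/* Reject bad flag combinations locally before touching the kernel. */
	if (!checkpoint_flags_ok(flags)) {
		errno = EINVAL;
		return -1;
	}

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, EXT4_IOC_CHECKPOINT, &flags);
	saved_errno = errno;
	close(fd);		/* close() may clobber errno, so restore it */
	errno = saved_errno;
	return ret;
}
```

Calling `ext4_checkpoint("/mnt/ext4", EXT4_IOC_CHECKPOINT_FLAG_DRY_RUN)` (the mount point is a placeholder) would be one way to probe whether the running kernel supports the ioctl without performing a checkpoint. The call is expected to need elevated privileges, so an EPERM result should not be read as "unsupported".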

fs/block_dev.c

Lines changed: 0 additions & 15 deletions
@@ -1673,20 +1673,6 @@ ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
 }
 EXPORT_SYMBOL_GPL(blkdev_read_iter);

-/*
- * Try to release a page associated with block device when the system
- * is under memory pressure.
- */
-static int blkdev_releasepage(struct page *page, gfp_t wait)
-{
-	struct super_block *super = BDEV_I(page->mapping->host)->bdev.bd_super;
-
-	if (super && super->s_op->bdev_try_to_free_page)
-		return super->s_op->bdev_try_to_free_page(super, page, wait);
-
-	return try_to_free_buffers(page);
-}
-
 static int blkdev_writepages(struct address_space *mapping,
 			     struct writeback_control *wbc)
 {
@@ -1701,7 +1687,6 @@ static const struct address_space_operations def_blk_aops = {
 	.write_begin = blkdev_write_begin,
 	.write_end = blkdev_write_end,
 	.writepages = blkdev_writepages,
-	.releasepage = blkdev_releasepage,
 	.direct_IO = blkdev_direct_IO,
 	.migratepage = buffer_migrate_page_norefs,
 	.is_dirty_writeback = buffer_check_dirty_writeback,

fs/ext4/ext4.h

Lines changed: 16 additions & 2 deletions
@@ -720,6 +720,7 @@ enum {
 #define EXT4_IOC_CLEAR_ES_CACHE _IO('f', 40)
 #define EXT4_IOC_GETSTATE _IOW('f', 41, __u32)
 #define EXT4_IOC_GET_ES_CACHE _IOWR('f', 42, struct fiemap)
+#define EXT4_IOC_CHECKPOINT _IOW('f', 43, __u32)

 #define EXT4_IOC_SHUTDOWN _IOR ('X', 125, __u32)

@@ -741,6 +742,14 @@ enum {
 #define EXT4_STATE_FLAG_NEWENTRY 0x00000004
 #define EXT4_STATE_FLAG_DA_ALLOC_CLOSE 0x00000008

+/* flags for ioctl EXT4_IOC_CHECKPOINT */
+#define EXT4_IOC_CHECKPOINT_FLAG_DISCARD 0x1
+#define EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT 0x2
+#define EXT4_IOC_CHECKPOINT_FLAG_DRY_RUN 0x4
+#define EXT4_IOC_CHECKPOINT_FLAG_VALID (EXT4_IOC_CHECKPOINT_FLAG_DISCARD | \
+					EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT | \
+					EXT4_IOC_CHECKPOINT_FLAG_DRY_RUN)
+
 #if defined(__KERNEL__) && defined(CONFIG_COMPAT)
 /*
  * ioctl commands in 32 bit emulation
@@ -1477,7 +1486,7 @@ struct ext4_sb_info {
 	unsigned int s_inode_goal;
 	u32 s_hash_seed[4];
 	int s_def_hash_version;
-	int s_hash_unsigned; /* 3 if hash should be signed, 0 if not */
+	int s_hash_unsigned; /* 3 if hash should be unsigned, 0 if not */
 	struct percpu_counter s_freeclusters_counter;
 	struct percpu_counter s_freeinodes_counter;
 	struct percpu_counter s_dirs_counter;
@@ -1488,6 +1497,7 @@ struct ext4_sb_info {
 	struct kobject s_kobj;
 	struct completion s_kobj_unregister;
 	struct super_block *s_sb;
+	struct buffer_head *s_mmp_bh;

 	/* Journaling */
 	struct journal_s *s_journal;
@@ -3614,6 +3624,7 @@ extern const struct inode_operations ext4_symlink_inode_operations;
 extern const struct inode_operations ext4_fast_symlink_inode_operations;

 /* sysfs.c */
+extern void ext4_notify_error_sysfs(struct ext4_sb_info *sbi);
 extern int ext4_register_sysfs(struct super_block *sb);
 extern void ext4_unregister_sysfs(struct super_block *sb);
 extern int __init ext4_init_sysfs(void);
@@ -3720,6 +3731,9 @@ extern struct ext4_io_end_vec *ext4_last_io_end_vec(ext4_io_end_t *io_end);
 /* mmp.c */
 extern int ext4_multi_mount_protect(struct super_block *, ext4_fsblk_t);

+/* mmp.c */
+extern void ext4_stop_mmpd(struct ext4_sb_info *sbi);
+
 /* verity.c */
 extern const struct fsverity_operations ext4_verityops;

@@ -3784,7 +3798,7 @@ static inline int ext4_buffer_uptodate(struct buffer_head *bh)
 	 * have to read the block because we may read the old data
 	 * successfully.
 	 */
-	if (!buffer_uptodate(bh) && buffer_write_io_error(bh))
+	if (buffer_write_io_error(bh))
 		set_buffer_uptodate(bh);
 	return buffer_uptodate(bh);
 }

fs/ext4/extents.c

Lines changed: 4 additions & 0 deletions
@@ -825,6 +825,7 @@ void ext4_ext_tree_init(handle_t *handle, struct inode *inode)
 	eh->eh_entries = 0;
 	eh->eh_magic = EXT4_EXT_MAGIC;
 	eh->eh_max = cpu_to_le16(ext4_ext_space_root(inode, 0));
+	eh->eh_generation = 0;
 	ext4_mark_inode_dirty(handle, inode);
 }

@@ -1090,6 +1091,7 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
 	neh->eh_max = cpu_to_le16(ext4_ext_space_block(inode, 0));
 	neh->eh_magic = EXT4_EXT_MAGIC;
 	neh->eh_depth = 0;
+	neh->eh_generation = 0;

 	/* move remainder of path[depth] to the new leaf */
 	if (unlikely(path[depth].p_hdr->eh_entries !=
@@ -1167,6 +1169,7 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
 		neh->eh_magic = EXT4_EXT_MAGIC;
 		neh->eh_max = cpu_to_le16(ext4_ext_space_block_idx(inode, 0));
 		neh->eh_depth = cpu_to_le16(depth - i);
+		neh->eh_generation = 0;
 		fidx = EXT_FIRST_INDEX(neh);
 		fidx->ei_block = border;
 		ext4_idx_store_pblock(fidx, oldblock);
@@ -1306,6 +1309,7 @@ static int ext4_ext_grow_indepth(handle_t *handle, struct inode *inode,
 	neh->eh_magic = EXT4_EXT_MAGIC;
 	ext4_extent_block_csum_set(inode, neh);
 	set_buffer_uptodate(bh);
+	set_buffer_verified(bh);
 	unlock_buffer(bh);

 	err = ext4_handle_dirty_metadata(handle, inode, bh);

fs/ext4/extents_status.c

Lines changed: 1 addition & 3 deletions
@@ -1574,11 +1574,9 @@ static unsigned long ext4_es_scan(struct shrinker *shrink,
 	ret = percpu_counter_read_positive(&sbi->s_es_stats.es_stats_shk_cnt);
 	trace_ext4_es_shrink_scan_enter(sbi->s_sb, nr_to_scan, ret);

-	if (!nr_to_scan)
-		return ret;
-
 	nr_shrunk = __es_shrink(sbi, nr_to_scan, NULL);

+	ret = percpu_counter_read_positive(&sbi->s_es_stats.es_stats_shk_cnt);
 	trace_ext4_es_shrink_scan_exit(sbi->s_sb, nr_shrunk, ret);
 	return nr_shrunk;
 }

fs/ext4/fsmap.h

Lines changed: 2 additions & 2 deletions
@@ -50,7 +50,7 @@ int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
 #define EXT4_FMR_OWN_INODES FMR_OWNER('X', 5) /* inodes */
 #define EXT4_FMR_OWN_GDT FMR_OWNER('f', 1) /* group descriptors */
 #define EXT4_FMR_OWN_RESV_GDT FMR_OWNER('f', 2) /* reserved gdt blocks */
-#define EXT4_FMR_OWN_BLKBM FMR_OWNER('f', 3) /* inode bitmap */
-#define EXT4_FMR_OWN_INOBM FMR_OWNER('f', 4) /* block bitmap */
+#define EXT4_FMR_OWN_BLKBM FMR_OWNER('f', 3) /* block bitmap */
+#define EXT4_FMR_OWN_INOBM FMR_OWNER('f', 4) /* inode bitmap */

 #endif /* __EXT4_FSMAP_H__ */

fs/ext4/ialloc.c

Lines changed: 5 additions & 6 deletions
@@ -402,7 +402,7 @@ static void get_orlov_stats(struct super_block *sb, ext4_group_t g,
  *
  * We always try to spread first-level directories.
  *
- * If there are blockgroups with both free inodes and free blocks counts
+ * If there are blockgroups with both free inodes and free clusters counts
  * not worse than average we return one with smallest directory count.
  * Otherwise we simply return a random group.
  *
@@ -411,7 +411,7 @@ static void get_orlov_stats(struct super_block *sb, ext4_group_t g,
  * It's OK to put directory into a group unless
  * it has too many directories already (max_dirs) or
  * it has too few free inodes left (min_inodes) or
- * it has too few free blocks left (min_blocks) or
+ * it has too few free clusters left (min_clusters) or
  * Parent's group is preferred, if it doesn't satisfy these
  * conditions we search cyclically through the rest. If none
  * of the groups look good we just look for a group with more
@@ -427,7 +427,7 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
 	ext4_group_t real_ngroups = ext4_get_groups_count(sb);
 	int inodes_per_group = EXT4_INODES_PER_GROUP(sb);
 	unsigned int freei, avefreei, grp_free;
-	ext4_fsblk_t freeb, avefreec;
+	ext4_fsblk_t freec, avefreec;
 	unsigned int ndirs;
 	int max_dirs, min_inodes;
 	ext4_grpblk_t min_clusters;
@@ -446,9 +446,8 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,

 	freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
 	avefreei = freei / ngroups;
-	freeb = EXT4_C2B(sbi,
-		percpu_counter_read_positive(&sbi->s_freeclusters_counter));
-	avefreec = freeb;
+	freec = percpu_counter_read_positive(&sbi->s_freeclusters_counter);
+	avefreec = freec;
 	do_div(avefreec, ngroups);
 	ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
454453

fs/ext4/inline.c

Lines changed: 5 additions & 6 deletions
@@ -204,7 +204,7 @@ static int ext4_read_inline_data(struct inode *inode, void *buffer,
 /*
  * write the buffer to the inline inode.
  * If 'create' is set, we don't need to do the extra copy in the xattr
- * value since it is already handled by ext4_xattr_ibody_inline_set.
+ * value since it is already handled by ext4_xattr_ibody_set.
  * That saves us one memcpy.
  */
 static void ext4_write_inline_data(struct inode *inode, struct ext4_iloc *iloc,
@@ -286,7 +286,7 @@ static int ext4_create_inline_data(handle_t *handle,

 	BUG_ON(!is.s.not_found);

-	error = ext4_xattr_ibody_inline_set(handle, inode, &i, &is);
+	error = ext4_xattr_ibody_set(handle, inode, &i, &is);
 	if (error) {
 		if (error == -ENOSPC)
 			ext4_clear_inode_state(inode,
@@ -358,7 +358,7 @@ static int ext4_update_inline_data(handle_t *handle, struct inode *inode,
 	i.value = value;
 	i.value_len = len;

-	error = ext4_xattr_ibody_inline_set(handle, inode, &i, &is);
+	error = ext4_xattr_ibody_set(handle, inode, &i, &is);
 	if (error)
 		goto out;

@@ -431,7 +431,7 @@ static int ext4_destroy_inline_data_nolock(handle_t *handle,
 	if (error)
 		goto out;

-	error = ext4_xattr_ibody_inline_set(handle, inode, &i, &is);
+	error = ext4_xattr_ibody_set(handle, inode, &i, &is);
 	if (error)
 		goto out;

@@ -1925,8 +1925,7 @@ int ext4_inline_data_truncate(struct inode *inode, int *has_inline)
 			i.value = value;
 			i.value_len = i_size > EXT4_MIN_INLINE_DATA_SIZE ?
 					i_size - EXT4_MIN_INLINE_DATA_SIZE : 0;
-			err = ext4_xattr_ibody_inline_set(handle, inode,
-							  &i, &is);
+			err = ext4_xattr_ibody_set(handle, inode, &i, &is);
 			if (err)
 				goto out_error;
 		}

fs/ext4/inode.c

Lines changed: 4 additions & 4 deletions
@@ -374,7 +374,7 @@ void ext4_da_update_reserve_space(struct inode *inode,
 	ei->i_reserved_data_blocks -= used;
 	percpu_counter_sub(&sbi->s_dirtyclusters_counter, used);

-	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
+	spin_unlock(&ei->i_block_reservation_lock);

 	/* Update quota subsystem for data blocks */
 	if (quota_claim)
@@ -3223,7 +3223,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
 		ext4_clear_inode_state(inode, EXT4_STATE_JDATA);
 		journal = EXT4_JOURNAL(inode);
 		jbd2_journal_lock_updates(journal);
-		err = jbd2_journal_flush(journal);
+		err = jbd2_journal_flush(journal, 0);
 		jbd2_journal_unlock_updates(journal);

 		if (err)
@@ -3418,7 +3418,7 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
 	 * i_disksize out to i_size. This could be beyond where direct I/O is
 	 * happening and thus expose allocated blocks to direct I/O reads.
 	 */
-	else if ((map->m_lblk * (1 << blkbits)) >= i_size_read(inode))
+	else if (((loff_t)map->m_lblk << blkbits) >= i_size_read(inode))
 		m_flags = EXT4_GET_BLOCKS_CREATE;
 	else if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
 		m_flags = EXT4_GET_BLOCKS_IO_CREATE_EXT;
@@ -6005,7 +6005,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 	if (val)
 		ext4_set_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
 	else {
-		err = jbd2_journal_flush(journal);
+		err = jbd2_journal_flush(journal, 0);
 		if (err < 0) {
 			jbd2_journal_unlock_updates(journal);
 			percpu_up_write(&sbi->s_writepages_rwsem);
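
The ext4_iomap_alloc() hunk above replaces a 32-bit multiplication with a 64-bit shift. The following standalone sketch shows why the order of widening matters. It assumes m_lblk is a 32-bit unsigned logical block number and that int is 32 bits wide; the function names `offset_old` and `offset_new` are hypothetical and only mirror the shape of the two expressions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shape of the old expression: a 32-bit unsigned value times an int is
 * evaluated in 32-bit unsigned arithmetic, so the product wraps modulo
 * 2^32 before it is widened for the comparison.
 */
int64_t offset_old(uint32_t lblk, unsigned int blkbits)
{
	assert(blkbits < 31);	/* keep the shift of an int well defined */
	return lblk * (1 << blkbits);
}

/*
 * Shape of the new expression: widen to 64 bits first, then shift, so
 * the byte offset is computed without wrapping.
 */
int64_t offset_new(uint32_t lblk, unsigned int blkbits)
{
	assert(blkbits < 31);
	return (int64_t)lblk << blkbits;
}
```

With 4 KiB blocks (blkbits = 12), logical block 1048576 sits at byte offset 4 GiB. The old form wraps to 0 for that block while the new form returns 4294967296, so a comparison against the file size could take the wrong branch for offsets at or beyond 4 GiB. For small block numbers the two forms agree.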
