Commit ef51068

Merge tag 'f2fs-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:

"In this cycle, f2fs has some performance improvements for Android workloads, such as using read-unfair rwsems and adding some sysfs entries to control GCs and discard commands in more detail. In addition, it has some tunings to improve the recovery speed after a sudden power cut.

Enhancement:
- add reader-unfair rwsems with F2FS_UNFAIR_RWSEM: will replace with generic API support
- adjust to make the readahead/recovery flow more efficient
- sysfs entries to control issue speeds of GCs and discard commands
- enable idmapped mounts

Bug fix:
- correct wrong error handling routines
- fix missing conditions in quota
- fix a potential deadlock between writeback and block plug routines
- fix a deadlock between freezefs and evict_inode

We've added some boundary checks to avoid kernel panics on corrupted images, and several minor code clean-ups"

* tag 'f2fs-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (27 commits)
  f2fs: fix to do sanity check on .cp_pack_total_block_count
  f2fs: make gc_urgent and gc_segment_mode sysfs node readable
  f2fs: use aggressive GC policy during f2fs_disable_checkpoint()
  f2fs: fix compressed file start atomic write may cause data corruption
  f2fs: initialize sbi->gc_mode explicitly
  f2fs: introduce gc_urgent_mid mode
  f2fs: compress: fix to print raw data size in error path of lz4 decompression
  f2fs: remove redundant parameter judgment
  f2fs: use spin_lock to avoid hang
  f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs
  f2fs: remove unnecessary read for F2FS_FITS_IN_INODE
  f2fs: introduce F2FS_UNFAIR_RWSEM to support unfair rwsem
  f2fs: avoid an infinite loop in f2fs_sync_dirty_inodes
  f2fs: fix to do sanity check on curseg->alloc_type
  f2fs: fix to avoid potential deadlock
  f2fs: quota: fix loop condition at f2fs_quota_sync()
  f2fs: Restore rwsem lockdep support
  f2fs: fix missing free nid in f2fs_handle_failed_inode
  f2fs: support idmapped mounts
  f2fs: add a way to limit roll forward recovery time
  ...
2 parents aab4ed5 + 5b5b4f8 commit ef51068

23 files changed

+699
-391
lines changed


Documentation/ABI/testing/sysfs-fs-f2fs

Lines changed: 47 additions & 7 deletions
@@ -55,8 +55,9 @@ Description: Controls the in-place-update policy.
5555
0x04 F2FS_IPU_UTIL
5656
0x08 F2FS_IPU_SSR_UTIL
5757
0x10 F2FS_IPU_FSYNC
58-
0x20 F2FS_IPU_ASYNC,
58+
0x20 F2FS_IPU_ASYNC
5959
0x40 F2FS_IPU_NOCACHE
60+
0x80 F2FS_IPU_HONOR_OPU_WRITE
6061
==== =================
6162

6263
Refer segment.h for details.
@@ -98,6 +99,33 @@ Description: Controls the issue rate of discard commands that consist of small
9899
checkpoint is triggered, and issued during the checkpoint.
99100
By default, it is disabled with 0.
100101

102+
What: /sys/fs/f2fs/<disk>/max_discard_request
103+
Date: December 2021
104+
Contact: "Konstantin Vyshetsky" <[email protected]>
105+
Description: Controls the number of discards a thread will issue at a time.
106+
Higher number will allow the discard thread to finish its work
107+
faster, at the cost of higher latency for incoming I/O.
108+
109+
What: /sys/fs/f2fs/<disk>/min_discard_issue_time
110+
Date: December 2021
111+
Contact: "Konstantin Vyshetsky" <[email protected]>
112+
Description: Controls the interval the discard thread will wait between
113+
issuing discard requests when there are discards to be issued and
114+
no I/O aware interruptions occur.
115+
116+
What: /sys/fs/f2fs/<disk>/mid_discard_issue_time
117+
Date: December 2021
118+
Contact: "Konstantin Vyshetsky" <[email protected]>
119+
Description: Controls the interval the discard thread will wait between
120+
issuing discard requests when there are discards to be issued and
121+
an I/O aware interruption occurs.
122+
123+
What: /sys/fs/f2fs/<disk>/max_discard_issue_time
124+
Date: December 2021
125+
Contact: "Konstantin Vyshetsky" <[email protected]>
126+
Description: Controls the interval the discard thread will wait when there are
127+
no discard operations to be issued.
128+
101129
What: /sys/fs/f2fs/<disk>/discard_granularity
102130
Date: July 2017
103131
Contact: "Chao Yu" <[email protected]>
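
The three issue-time knobs documented in the hunk above split the discard thread's wait into three cases. The following is a minimal userspace sketch of that selection, not the kernel's implementation. The names `struct discard_times` and `pick_discard_wait` are hypothetical, and the units are left unspecified because the ABI text does not state them.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three sysfs knobs (units not stated in the ABI text). */
struct discard_times {
	unsigned int min_issue_time; /* discards pending, no I/O-aware interruption */
	unsigned int mid_issue_time; /* discards pending, an I/O-aware interruption occurred */
	unsigned int max_issue_time; /* no discard operations to be issued */
};

/* Pick the wait interval according to the three documented cases. */
static unsigned int pick_discard_wait(const struct discard_times *t,
				      bool discards_pending,
				      bool io_interrupted)
{
	if (!discards_pending)
		return t->max_issue_time;
	return io_interrupted ? t->mid_issue_time : t->min_issue_time;
}
```

Calling `pick_discard_wait(&t, true, false)` returns the `min_issue_time` value, which is the case where the thread drains pending discards fastest.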
@@ -269,11 +297,16 @@ Description: Shows current reserved blocks in system, it may be temporarily
269297
What: /sys/fs/f2fs/<disk>/gc_urgent
270298
Date: August 2017
271299
Contact: "Jaegeuk Kim" <[email protected]>
272-
Description: Do background GC aggressively when set. When gc_urgent = 1,
273-
background thread starts to do GC by given gc_urgent_sleep_time
274-
interval. When gc_urgent = 2, F2FS will lower the bar of
275-
checking idle in order to process outstanding discard commands
276-
and GC a little bit aggressively. It is set to 0 by default.
300+
Description: Do background GC aggressively when set. Set to 0 by default.
301+
gc urgent high(1): does GC forcibly in a period of given
302+
gc_urgent_sleep_time and ignores I/O idling check. uses greedy
303+
GC approach and turns SSR mode on.
304+
gc urgent low(2): lowers the bar of checking I/O idling in
305+
order to process outstanding discard commands and GC a
306+
little bit aggressively. uses cost benefit GC approach.
307+
gc urgent mid(3): does GC forcibly in a period of given
308+
gc_urgent_sleep_time and executes a mid level of I/O idling check.
309+
uses cost benefit GC approach.
277310

278311
What: /sys/fs/f2fs/<disk>/gc_urgent_sleep_time
279312
Date: August 2017
@@ -430,6 +463,7 @@ Description: Show status of f2fs superblock in real time.
430463
0x800 SBI_QUOTA_SKIP_FLUSH skip flushing quota in current CP
431464
0x1000 SBI_QUOTA_NEED_REPAIR quota file may be corrupted
432465
0x2000 SBI_IS_RESIZEFS resizefs is in process
466+
0x4000 SBI_IS_FREEZING freezefs is in process
433467
====== ===================== =================================
434468

435469
What: /sys/fs/f2fs/<disk>/ckpt_thread_ioprio
@@ -503,7 +537,7 @@ Date: July 2021
503537
Contact: "Daeho Jeong" <[email protected]>
504538
Description: Show how many segments have been reclaimed by GC during a specific
505539
GC mode (0: GC normal, 1: GC idle CB, 2: GC idle greedy,
506-
3: GC idle AT, 4: GC urgent high, 5: GC urgent low)
540+
3: GC idle AT, 4: GC urgent high, 5: GC urgent low, 6: GC urgent mid)
507541
You can re-initialize this value to "0".
508542

509543
What: /sys/fs/f2fs/<disk>/gc_segment_mode
@@ -540,3 +574,9 @@ Contact: "Daeho Jeong" <[email protected]>
540574
Description: You can set the trial count limit for GC urgent high mode with this value.
541575
If GC thread gets to the limit, the mode will turn back to GC normal mode.
542576
By default, the value is zero, which means there is no limit like before.
577+
578+
What: /sys/fs/f2fs/<disk>/max_roll_forward_node_blocks
579+
Date: January 2022
580+
Contact: "Jaegeuk Kim" <[email protected]>
581+
Description: Controls max # of node block writes to be used for roll forward
582+
recovery. This can limit the roll forward recovery time.
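
The updated gc_urgent description above defines three non-zero values. As a reading aid, the sketch below encodes only what that text states for each value in a lookup table. All type and function names here (`struct gc_urgent_mode`, `gc_urgent_lookup`, the enums) are hypothetical and do not come from the f2fs sources. SSR is left out because the text mentions it only for the high mode.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enums summarising the documented behaviour. */
enum gc_victim_policy { GC_GREEDY, GC_COST_BENEFIT };
enum gc_idle_check { IDLE_CHECK_IGNORED, IDLE_CHECK_LOWERED, IDLE_CHECK_MID };

struct gc_urgent_mode {
	const char *name;
	bool forced_by_sleep_time;     /* "does GC forcibly in a period of gc_urgent_sleep_time" */
	enum gc_idle_check idle_check; /* how the I/O idling check is treated */
	enum gc_victim_policy policy;  /* greedy vs cost-benefit */
};

/* Index 0 corresponds to sysfs value 1, and so on. */
static const struct gc_urgent_mode gc_urgent_modes[] = {
	{ "high", true,  IDLE_CHECK_IGNORED, GC_GREEDY },
	/* The text does not describe "low" as forced, so it is marked false here. */
	{ "low",  false, IDLE_CHECK_LOWERED, GC_COST_BENEFIT },
	{ "mid",  true,  IDLE_CHECK_MID,     GC_COST_BENEFIT },
};

/* Return the description for a gc_urgent value, or NULL for 0 (off) and unknown values. */
static const struct gc_urgent_mode *gc_urgent_lookup(unsigned int value)
{
	size_t n = sizeof(gc_urgent_modes) / sizeof(gc_urgent_modes[0]);

	if (value < 1 || value > n)
		return NULL;
	return &gc_urgent_modes[value - 1];
}
```

Note that the numeric order is high(1), low(2), mid(3), so the new mid mode does not sit between the other two numerically.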

fs/f2fs/Kconfig

Lines changed: 7 additions & 0 deletions
@@ -143,3 +143,10 @@ config F2FS_IOSTAT
143143
Support getting IO statistics through sysfs and printing out periodic
144144
IO statistics tracepoint events. You have to turn on "iostat_enable"
145145
sysfs node to enable this feature.
146+
147+
config F2FS_UNFAIR_RWSEM
148+
bool "F2FS unfair rw_semaphore"
149+
depends on F2FS_FS && BLK_CGROUP
150+
help
151+
Use unfair rw_semaphore, if system configured IO priority by block
152+
cgroup.

fs/f2fs/acl.c

Lines changed: 12 additions & 9 deletions
@@ -204,8 +204,9 @@ struct posix_acl *f2fs_get_acl(struct inode *inode, int type, bool rcu)
204204
return __f2fs_get_acl(inode, type, NULL);
205205
}
206206

207-
static int f2fs_acl_update_mode(struct inode *inode, umode_t *mode_p,
208-
struct posix_acl **acl)
207+
static int f2fs_acl_update_mode(struct user_namespace *mnt_userns,
208+
struct inode *inode, umode_t *mode_p,
209+
struct posix_acl **acl)
209210
{
210211
umode_t mode = inode->i_mode;
211212
int error;
@@ -218,14 +219,15 @@ static int f2fs_acl_update_mode(struct inode *inode, umode_t *mode_p,
218219
return error;
219220
if (error == 0)
220221
*acl = NULL;
221-
if (!in_group_p(i_gid_into_mnt(&init_user_ns, inode)) &&
222-
!capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_FSETID))
222+
if (!in_group_p(i_gid_into_mnt(mnt_userns, inode)) &&
223+
!capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
223224
mode &= ~S_ISGID;
224225
*mode_p = mode;
225226
return 0;
226227
}
227228

228-
static int __f2fs_set_acl(struct inode *inode, int type,
229+
static int __f2fs_set_acl(struct user_namespace *mnt_userns,
230+
struct inode *inode, int type,
229231
struct posix_acl *acl, struct page *ipage)
230232
{
231233
int name_index;
@@ -238,7 +240,8 @@ static int __f2fs_set_acl(struct inode *inode, int type,
238240
case ACL_TYPE_ACCESS:
239241
name_index = F2FS_XATTR_INDEX_POSIX_ACL_ACCESS;
240242
if (acl && !ipage) {
241-
error = f2fs_acl_update_mode(inode, &mode, &acl);
243+
error = f2fs_acl_update_mode(mnt_userns, inode,
244+
&mode, &acl);
242245
if (error)
243246
return error;
244247
set_acl_inode(inode, mode);
@@ -279,7 +282,7 @@ int f2fs_set_acl(struct user_namespace *mnt_userns, struct inode *inode,
279282
if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
280283
return -EIO;
281284

282-
return __f2fs_set_acl(inode, type, acl, NULL);
285+
return __f2fs_set_acl(mnt_userns, inode, type, acl, NULL);
283286
}
284287

285288
/*
@@ -419,15 +422,15 @@ int f2fs_init_acl(struct inode *inode, struct inode *dir, struct page *ipage,
419422
f2fs_mark_inode_dirty_sync(inode, true);
420423

421424
if (default_acl) {
422-
error = __f2fs_set_acl(inode, ACL_TYPE_DEFAULT, default_acl,
425+
error = __f2fs_set_acl(NULL, inode, ACL_TYPE_DEFAULT, default_acl,
423426
ipage);
424427
posix_acl_release(default_acl);
425428
} else {
426429
inode->i_default_acl = NULL;
427430
}
428431
if (acl) {
429432
if (!error)
430-
error = __f2fs_set_acl(inode, ACL_TYPE_ACCESS, acl,
433+
error = __f2fs_set_acl(NULL, inode, ACL_TYPE_ACCESS, acl,
431434
ipage);
432435
posix_acl_release(acl);
433436
} else {
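
In the acl.c hunks above, f2fs_acl_update_mode() now evaluates the group-membership and CAP_FSETID checks against the mount's user namespace instead of `init_user_ns`. The decision it makes about the setgid bit can be modelled in userspace as below. This is a simplified sketch: the function name `acl_update_mode_sketch` is hypothetical, and the two boolean inputs stand in for the kernel's `in_group_p()` and `capable_wrt_inode_uidgid()` results after idmapping has been applied.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical model: clear S_ISGID unless the caller is in the inode's
 * (mount-mapped) group or holds CAP_FSETID with respect to the inode.
 */
static mode_t acl_update_mode_sketch(mode_t mode,
				     bool caller_in_mapped_group,
				     bool caller_has_fsetid)
{
	if (!caller_in_mapped_group && !caller_has_fsetid)
		mode &= ~(mode_t)S_ISGID;
	return mode;
}
```

The point of the change is that on an idmapped mount the two inputs can differ from what a check against `init_user_ns` would yield, so the setgid bit is kept or stripped according to the IDs as seen through the mount.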

fs/f2fs/checkpoint.c

Lines changed: 36 additions & 22 deletions
@@ -98,6 +98,13 @@ static struct page *__get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index,
9898
}
9999

100100
if (unlikely(!PageUptodate(page))) {
101+
if (page->index == sbi->metapage_eio_ofs &&
102+
sbi->metapage_eio_cnt++ == MAX_RETRY_META_PAGE_EIO) {
103+
set_ckpt_flags(sbi, CP_ERROR_FLAG);
104+
} else {
105+
sbi->metapage_eio_ofs = page->index;
106+
sbi->metapage_eio_cnt = 0;
107+
}
101108
f2fs_put_page(page, 1);
102109
return ERR_PTR(-EIO);
103110
}
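
The __get_meta_page() hunk above tracks repeated read failures at one meta page offset and sets CP_ERROR_FLAG once a retry limit is hit. The sketch below models that bookkeeping in userspace with hypothetical names (`struct eio_tracker`, `eio_should_stop`) and a caller-supplied limit in place of MAX_RETRY_META_PAGE_EIO, whose value is not shown in this diff. One labelled assumption: the sketch tests the offset and the counter in two nested steps. The hunk as written combines them in a single `&&` condition, so a repeated failure at the same offset that has not yet reached the limit also takes the reset branch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for sbi->metapage_eio_ofs / sbi->metapage_eio_cnt. */
struct eio_tracker {
	unsigned long last_ofs;
	unsigned int count;
};

/*
 * Record a failed read at `ofs`. Returns true when the same offset has
 * failed often enough that the caller should flag a checkpoint error.
 * Assumption: offset match and counter test are separated (see lead-in).
 */
static bool eio_should_stop(struct eio_tracker *t, unsigned long ofs,
			    unsigned int max_retry)
{
	if (ofs == t->last_ofs) {
		if (t->count++ == max_retry)
			return true;
	} else {
		/* A failure at a different offset restarts the count. */
		t->last_ofs = ofs;
		t->count = 0;
	}
	return false;
}
```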
@@ -282,18 +289,22 @@ int f2fs_ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages,
282289
return blkno - start;
283290
}
284291

285-
void f2fs_ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index)
292+
void f2fs_ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index,
293+
unsigned int ra_blocks)
286294
{
287295
struct page *page;
288296
bool readahead = false;
289297

298+
if (ra_blocks == RECOVERY_MIN_RA_BLOCKS)
299+
return;
300+
290301
page = find_get_page(META_MAPPING(sbi), index);
291302
if (!page || !PageUptodate(page))
292303
readahead = true;
293304
f2fs_put_page(page, 0);
294305

295306
if (readahead)
296-
f2fs_ra_meta_pages(sbi, index, BIO_MAX_VECS, META_POR, true);
307+
f2fs_ra_meta_pages(sbi, index, ra_blocks, META_POR, true);
297308
}
298309

299310
static int __f2fs_write_meta_page(struct page *page,
@@ -351,13 +362,13 @@ static int f2fs_write_meta_pages(struct address_space *mapping,
351362
goto skip_write;
352363

353364
/* if locked failed, cp will flush dirty pages instead */
354-
if (!down_write_trylock(&sbi->cp_global_sem))
365+
if (!f2fs_down_write_trylock(&sbi->cp_global_sem))
355366
goto skip_write;
356367

357368
trace_f2fs_writepages(mapping->host, wbc, META);
358369
diff = nr_pages_to_write(sbi, META, wbc);
359370
written = f2fs_sync_meta_pages(sbi, META, wbc->nr_to_write, FS_META_IO);
360-
up_write(&sbi->cp_global_sem);
371+
f2fs_up_write(&sbi->cp_global_sem);
361372
wbc->nr_to_write = max((long)0, wbc->nr_to_write - written - diff);
362373
return 0;
363374

@@ -864,22 +875,24 @@ static struct page *validate_checkpoint(struct f2fs_sb_info *sbi,
864875
struct page *cp_page_1 = NULL, *cp_page_2 = NULL;
865876
struct f2fs_checkpoint *cp_block = NULL;
866877
unsigned long long cur_version = 0, pre_version = 0;
878+
unsigned int cp_blocks;
867879
int err;
868880

869881
err = get_checkpoint_version(sbi, cp_addr, &cp_block,
870882
&cp_page_1, version);
871883
if (err)
872884
return NULL;
873885

874-
if (le32_to_cpu(cp_block->cp_pack_total_block_count) >
875-
sbi->blocks_per_seg) {
886+
cp_blocks = le32_to_cpu(cp_block->cp_pack_total_block_count);
887+
888+
if (cp_blocks > sbi->blocks_per_seg || cp_blocks <= F2FS_CP_PACKS) {
876889
f2fs_warn(sbi, "invalid cp_pack_total_block_count:%u",
877890
le32_to_cpu(cp_block->cp_pack_total_block_count));
878891
goto invalid_cp;
879892
}
880893
pre_version = *version;
881894

882-
cp_addr += le32_to_cpu(cp_block->cp_pack_total_block_count) - 1;
895+
cp_addr += cp_blocks - 1;
883896
err = get_checkpoint_version(sbi, cp_addr, &cp_block,
884897
&cp_page_2, version);
885898
if (err)
@@ -1159,7 +1172,7 @@ static bool __need_flush_quota(struct f2fs_sb_info *sbi)
11591172
if (!is_journalled_quota(sbi))
11601173
return false;
11611174

1162-
if (!down_write_trylock(&sbi->quota_sem))
1175+
if (!f2fs_down_write_trylock(&sbi->quota_sem))
11631176
return true;
11641177
if (is_sbi_flag_set(sbi, SBI_QUOTA_SKIP_FLUSH)) {
11651178
ret = false;
@@ -1171,7 +1184,7 @@ static bool __need_flush_quota(struct f2fs_sb_info *sbi)
11711184
} else if (get_pages(sbi, F2FS_DIRTY_QDATA)) {
11721185
ret = true;
11731186
}
1174-
up_write(&sbi->quota_sem);
1187+
f2fs_up_write(&sbi->quota_sem);
11751188
return ret;
11761189
}
11771190

@@ -1228,10 +1241,10 @@ static int block_operations(struct f2fs_sb_info *sbi)
12281241
* POR: we should ensure that there are no dirty node pages
12291242
* until finishing nat/sit flush. inode->i_blocks can be updated.
12301243
*/
1231-
down_write(&sbi->node_change);
1244+
f2fs_down_write(&sbi->node_change);
12321245

12331246
if (get_pages(sbi, F2FS_DIRTY_IMETA)) {
1234-
up_write(&sbi->node_change);
1247+
f2fs_up_write(&sbi->node_change);
12351248
f2fs_unlock_all(sbi);
12361249
err = f2fs_sync_inode_meta(sbi);
12371250
if (err)
@@ -1241,15 +1254,15 @@ static int block_operations(struct f2fs_sb_info *sbi)
12411254
}
12421255

12431256
retry_flush_nodes:
1244-
down_write(&sbi->node_write);
1257+
f2fs_down_write(&sbi->node_write);
12451258

12461259
if (get_pages(sbi, F2FS_DIRTY_NODES)) {
1247-
up_write(&sbi->node_write);
1260+
f2fs_up_write(&sbi->node_write);
12481261
atomic_inc(&sbi->wb_sync_req[NODE]);
12491262
err = f2fs_sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO);
12501263
atomic_dec(&sbi->wb_sync_req[NODE]);
12511264
if (err) {
1252-
up_write(&sbi->node_change);
1265+
f2fs_up_write(&sbi->node_change);
12531266
f2fs_unlock_all(sbi);
12541267
return err;
12551268
}
@@ -1262,13 +1275,13 @@ static int block_operations(struct f2fs_sb_info *sbi)
12621275
* dirty node blocks and some checkpoint values by block allocation.
12631276
*/
12641277
__prepare_cp_block(sbi);
1265-
up_write(&sbi->node_change);
1278+
f2fs_up_write(&sbi->node_change);
12661279
return err;
12671280
}
12681281

12691282
static void unblock_operations(struct f2fs_sb_info *sbi)
12701283
{
1271-
up_write(&sbi->node_write);
1284+
f2fs_up_write(&sbi->node_write);
12721285
f2fs_unlock_all(sbi);
12731286
}
12741287

@@ -1543,6 +1556,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
15431556
/* update user_block_counts */
15441557
sbi->last_valid_block_count = sbi->total_valid_block_count;
15451558
percpu_counter_set(&sbi->alloc_valid_block_count, 0);
1559+
percpu_counter_set(&sbi->rf_node_block_count, 0);
15461560

15471561
/* Here, we have one bio having CP pack except cp pack 2 page */
15481562
f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);
@@ -1612,7 +1626,7 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
16121626
f2fs_warn(sbi, "Start checkpoint disabled!");
16131627
}
16141628
if (cpc->reason != CP_RESIZE)
1615-
down_write(&sbi->cp_global_sem);
1629+
f2fs_down_write(&sbi->cp_global_sem);
16161630

16171631
if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
16181632
((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
@@ -1693,7 +1707,7 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
16931707
trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
16941708
out:
16951709
if (cpc->reason != CP_RESIZE)
1696-
up_write(&sbi->cp_global_sem);
1710+
f2fs_up_write(&sbi->cp_global_sem);
16971711
return err;
16981712
}
16991713

@@ -1741,9 +1755,9 @@ static int __write_checkpoint_sync(struct f2fs_sb_info *sbi)
17411755
struct cp_control cpc = { .reason = CP_SYNC, };
17421756
int err;
17431757

1744-
down_write(&sbi->gc_lock);
1758+
f2fs_down_write(&sbi->gc_lock);
17451759
err = f2fs_write_checkpoint(sbi, &cpc);
1746-
up_write(&sbi->gc_lock);
1760+
f2fs_up_write(&sbi->gc_lock);
17471761

17481762
return err;
17491763
}
@@ -1831,9 +1845,9 @@ int f2fs_issue_checkpoint(struct f2fs_sb_info *sbi)
18311845
if (!test_opt(sbi, MERGE_CHECKPOINT) || cpc.reason != CP_SYNC) {
18321846
int ret;
18331847

1834-
down_write(&sbi->gc_lock);
1848+
f2fs_down_write(&sbi->gc_lock);
18351849
ret = f2fs_write_checkpoint(sbi, &cpc);
1836-
up_write(&sbi->gc_lock);
1850+
f2fs_up_write(&sbi->gc_lock);
18371851

18381852
return ret;
18391853
}
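
The validate_checkpoint() hunk above adds a lower bound to the cp_pack_total_block_count check. That matters because the value is later used as `cp_addr += cp_blocks - 1`, and a zero count would make the unsigned subtraction wrap around. A minimal userspace sketch of the range check follows. The name `cp_blocks_valid` is hypothetical, and `SKETCH_CP_PACKS` is an assumed stand-in for F2FS_CP_PACKS, whose definition is not part of this diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for F2FS_CP_PACKS, which is defined outside this diff. */
#define SKETCH_CP_PACKS 2u

/*
 * Mirror of the new condition: reject counts larger than a segment and
 * counts that are not strictly greater than the number of CP packs.
 */
static bool cp_blocks_valid(uint32_t cp_blocks, uint32_t blocks_per_seg)
{
	if (cp_blocks > blocks_per_seg || cp_blocks <= SKETCH_CP_PACKS)
		return false;
	return true;
}
```

With the old check, only the upper bound was enforced, so a corrupted image carrying a count of 0 passed validation and reached the address arithmetic.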
