Commit 6fda0bb
Merge tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
 "28 hotfixes. 23 are cc:stable and the other five address issues which
  were introduced during this merge cycle. 20 are for MM and the
  remainder are for other subsystems"

* tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
  maple_tree: fix a potential concurrency bug in RCU mode
  maple_tree: fix get wrong data_end in mtree_lookup_walk()
  mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
  nilfs2: fix sysfs interface lifetime
  mm: take a page reference when removing device exclusive entries
  mm: vmalloc: avoid warn_alloc noise caused by fatal signal
  nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field
  nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
  zsmalloc: document freeable stats
  zsmalloc: document new fullness grouping
  fsdax: force clear dirty mark if CoW
  mm/hugetlb: fix uffd wr-protection for CoW optimization path
  mm: enable maple tree RCU mode by default
  maple_tree: add RCU lock checking to rcu callback functions
  maple_tree: add smp_rmb() to dead node detection
  maple_tree: fix write memory barrier of nodes once dead for RCU mode
  maple_tree: remove extra smp_wmb() from mas_dead_leaves()
  maple_tree: fix freeing of nodes in rcu mode
  maple_tree: detect dead nodes in mas_start()
  maple_tree: be more cautious about dead nodes
  ...
2 parents aa318c4 + c45ea31 commit 6fda0bb

File tree

19 files changed: +402 -195 lines changed

.mailmap

Lines changed: 2 additions & 0 deletions

@@ -265,7 +265,9 @@ Krzysztof Kozlowski <[email protected]> <[email protected]>
 Krzysztof Kozlowski <[email protected]> <[email protected]>
 Kuninori Morimoto <[email protected]>
+Leonard Crestez <[email protected]> Leonard Crestez <[email protected]>
+Leonard Göhrs <[email protected]>
 Leonid I Ananiev <[email protected]>

Documentation/mm/zsmalloc.rst

Lines changed: 76 additions & 59 deletions

@@ -39,13 +39,12 @@ With CONFIG_ZSMALLOC_STAT, we could see zsmalloc internal information via

     # cat /sys/kernel/debug/zsmalloc/zram0/classes

-    class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage
+    class size 10% 20% 30% 40% 50% 60% 70% 80% 90% 99% 100% obj_allocated obj_used pages_used pages_per_zspage freeable
     ...
     ...
-    9 176 0 1 186 129 8 4
-    10 192 1 0 2880 2872 135 3
-    11 208 0 1 819 795 42 2
-    12 224 0 1 219 159 12 4
+    30 512 0 12 4 1 0 1 0 0 1 0 414 3464 3346 433 1 14
+    31 528 2 7 2 2 1 0 1 0 0 2 117 4154 3793 536 4 44
+    32 544 6 3 4 1 2 1 0 0 0 1 260 4170 3965 556 2 26
     ...
     ...

@@ -54,10 +53,28 @@ class
 index
 size
 	object size zspage stores
-almost_empty
-	the number of ZS_ALMOST_EMPTY zspages(see below)
-almost_full
-	the number of ZS_ALMOST_FULL zspages(see below)
+10%
+	the number of zspages with usage ratio less than 10% (see below)
+20%
+	the number of zspages with usage ratio between 10% and 20%
+30%
+	the number of zspages with usage ratio between 20% and 30%
+40%
+	the number of zspages with usage ratio between 30% and 40%
+50%
+	the number of zspages with usage ratio between 40% and 50%
+60%
+	the number of zspages with usage ratio between 50% and 60%
+70%
+	the number of zspages with usage ratio between 60% and 70%
+80%
+	the number of zspages with usage ratio between 70% and 80%
+90%
+	the number of zspages with usage ratio between 80% and 90%
+99%
+	the number of zspages with usage ratio between 90% and 99%
+100%
+	the number of zspages with usage ratio 100%
 obj_allocated
 	the number of objects allocated
 obj_used

@@ -66,19 +83,14 @@ pages_used
 	the number of pages allocated for the class
 pages_per_zspage
 	the number of 0-order pages to make a zspage
+freeable
+	the approximate number of pages class compaction can free

-We assign a zspage to ZS_ALMOST_EMPTY fullness group when n <= N / f, where
-
-* n = number of allocated objects
-* N = total number of objects zspage can store
-* f = fullness_threshold_frac(ie, 4 at the moment)
-
-Similarly, we assign zspage to:
-
-* ZS_ALMOST_FULL when n > N / f
-* ZS_EMPTY when n == 0
-* ZS_FULL when n == N
-
+Each zspage maintains inuse counter which keeps track of the number of
+objects stored in the zspage. The inuse counter determines the zspage's
+"fullness group" which is calculated as the ratio of the "inuse" objects to
+the total number of objects the zspage can hold (objs_per_zspage). The
+closer the inuse counter is to objs_per_zspage, the better.

 Internals
 =========

@@ -94,10 +106,10 @@ of objects that each zspage can store.

 For instance, consider the following size classes:::

-    class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+    class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
     ...
-    94 1536 0 0 0 0 0 3 0
-    100 1632 0 0 0 0 0 2 0
+    94 1536 0 .... 0 0 0 0 3 0
+    100 1632 0 .... 0 0 0 0 2 0
     ...

@@ -134,10 +146,11 @@ reduces memory wastage.

 Let's take a closer look at the bottom of `/sys/kernel/debug/zsmalloc/zramX/classes`:::

-    class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+    class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
+
     ...
-    202 3264 0 0 0 0 0 4 0
-    254 4096 0 0 0 0 0 1 0
+    202 3264 0 .. 0 0 0 0 4 0
+    254 4096 0 .. 0 0 0 0 1 0
     ...

 Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages

@@ -151,40 +164,42 @@ efficient storage of large objects.

 For zspage chain size of 8, huge class watermark becomes 3632 bytes:::

-    class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+    class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
+
     ...
-    202 3264 0 0 0 0 0 4 0
-    211 3408 0 0 0 0 0 5 0
-    217 3504 0 0 0 0 0 6 0
-    222 3584 0 0 0 0 0 7 0
-    225 3632 0 0 0 0 0 8 0
-    254 4096 0 0 0 0 0 1 0
+    202 3264 0 .. 0 0 0 0 4 0
+    211 3408 0 .. 0 0 0 0 5 0
+    217 3504 0 .. 0 0 0 0 6 0
+    222 3584 0 .. 0 0 0 0 7 0
+    225 3632 0 .. 0 0 0 0 8 0
+    254 4096 0 .. 0 0 0 0 1 0
     ...

 For zspage chain size of 16, huge class watermark becomes 3840 bytes:::

-    class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+    class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
+
     ...
-    202 3264 0 0 0 0 0 4 0
-    206 3328 0 0 0 0 0 13 0
-    207 3344 0 0 0 0 0 9 0
-    208 3360 0 0 0 0 0 14 0
-    211 3408 0 0 0 0 0 5 0
-    212 3424 0 0 0 0 0 16 0
-    214 3456 0 0 0 0 0 11 0
-    217 3504 0 0 0 0 0 6 0
-    219 3536 0 0 0 0 0 13 0
-    222 3584 0 0 0 0 0 7 0
-    223 3600 0 0 0 0 0 15 0
-    225 3632 0 0 0 0 0 8 0
-    228 3680 0 0 0 0 0 9 0
-    230 3712 0 0 0 0 0 10 0
-    232 3744 0 0 0 0 0 11 0
-    234 3776 0 0 0 0 0 12 0
-    235 3792 0 0 0 0 0 13 0
-    236 3808 0 0 0 0 0 14 0
-    238 3840 0 0 0 0 0 15 0
-    254 4096 0 0 0 0 0 1 0
+    202 3264 0 .. 0 0 0 0 4 0
+    206 3328 0 .. 0 0 0 0 13 0
+    207 3344 0 .. 0 0 0 0 9 0
+    208 3360 0 .. 0 0 0 0 14 0
+    211 3408 0 .. 0 0 0 0 5 0
+    212 3424 0 .. 0 0 0 0 16 0
+    214 3456 0 .. 0 0 0 0 11 0
+    217 3504 0 .. 0 0 0 0 6 0
+    219 3536 0 .. 0 0 0 0 13 0
+    222 3584 0 .. 0 0 0 0 7 0
+    223 3600 0 .. 0 0 0 0 15 0
+    225 3632 0 .. 0 0 0 0 8 0
+    228 3680 0 .. 0 0 0 0 9 0
+    230 3712 0 .. 0 0 0 0 10 0
+    232 3744 0 .. 0 0 0 0 11 0
+    234 3776 0 .. 0 0 0 0 12 0
+    235 3792 0 .. 0 0 0 0 13 0
+    236 3808 0 .. 0 0 0 0 14 0
+    238 3840 0 .. 0 0 0 0 15 0
+    254 4096 0 .. 0 0 0 0 1 0
     ...

 Overall the combined zspage chain size effect on zsmalloc pool configuration:::

@@ -214,9 +229,10 @@ zram as a build artifacts storage (Linux kernel compilation).

 zsmalloc classes stats:::

-    class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+    class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
+
     ...
-    Total 13 51 413836 412973 159955 3
+    Total 13 .. 51 413836 412973 159955 3

 zram mm_stat:::

@@ -227,9 +243,10 @@ zram as a build artifacts storage (Linux kernel compilation).

 zsmalloc classes stats:::

-    class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+    class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
+
     ...
-    Total 18 87 414852 412978 156666 0
+    Total 18 .. 87 414852 412978 156666 0

 zram mm_stat:::
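The fullness grouping described in the updated documentation maps each zspage's usage ratio (inuse / objs_per_zspage) into the 10% .. 100% columns. Below is a minimal userspace sketch of that bucketing; the function name and the rounding at bucket edges are illustrative only, since the kernel's actual boundary handling lives in mm/zsmalloc.c:

```c
#include <assert.h>

/* Map a zspage's usage ratio (inuse / objs_per_zspage) to a stats
 * column: 10, 20, ..., 90, 99, or 100. Illustrative sketch only;
 * zsmalloc itself uses an internal enum of fullness groups. */
static int fullness_bucket(int inuse, int objs_per_zspage)
{
	int ratio = inuse * 100 / objs_per_zspage;

	if (inuse == objs_per_zspage)
		return 100;		/* completely full zspage */
	if (ratio >= 90)
		return 99;		/* 90%..99% usage */
	/* round up to the next 10% boundary: 0..9% -> "10%" column, etc. */
	return (ratio / 10 + 1) * 10;
}
```

The "closer inuse is to objs_per_zspage, the better" remark in the docs corresponds to higher buckets here: such zspages waste fewer pages and are less interesting to compaction.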

fs/dax.c

Lines changed: 47 additions & 5 deletions

@@ -781,6 +781,33 @@ static int __dax_invalidate_entry(struct address_space *mapping,
 	return ret;
 }

+static int __dax_clear_dirty_range(struct address_space *mapping,
+		pgoff_t start, pgoff_t end)
+{
+	XA_STATE(xas, &mapping->i_pages, start);
+	unsigned int scanned = 0;
+	void *entry;
+
+	xas_lock_irq(&xas);
+	xas_for_each(&xas, entry, end) {
+		entry = get_unlocked_entry(&xas, 0);
+		xas_clear_mark(&xas, PAGECACHE_TAG_DIRTY);
+		xas_clear_mark(&xas, PAGECACHE_TAG_TOWRITE);
+		put_unlocked_entry(&xas, entry, WAKE_NEXT);
+
+		if (++scanned % XA_CHECK_SCHED)
+			continue;
+
+		xas_pause(&xas);
+		xas_unlock_irq(&xas);
+		cond_resched();
+		xas_lock_irq(&xas);
+	}
+	xas_unlock_irq(&xas);
+
+	return 0;
+}
+
 /*
  * Delete DAX entry at @index from @mapping. Wait for it
  * to be unlocked before deleting it.

@@ -1258,15 +1285,20 @@ static s64 dax_unshare_iter(struct iomap_iter *iter)
 	/* don't bother with blocks that are not shared to start with */
 	if (!(iomap->flags & IOMAP_F_SHARED))
 		return length;
-	/* don't bother with holes or unwritten extents */
-	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
-		return length;

 	id = dax_read_lock();
 	ret = dax_iomap_direct_access(iomap, pos, length, &daddr, NULL);
 	if (ret < 0)
 		goto out_unlock;

+	/* zero the distance if srcmap is HOLE or UNWRITTEN */
+	if (srcmap->flags & IOMAP_F_SHARED || srcmap->type == IOMAP_UNWRITTEN) {
+		memset(daddr, 0, length);
+		dax_flush(iomap->dax_dev, daddr, length);
+		ret = length;
+		goto out_unlock;
+	}
+
 	ret = dax_iomap_direct_access(srcmap, pos, length, &saddr, NULL);
 	if (ret < 0)
 		goto out_unlock;

@@ -1435,6 +1467,16 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
 	 * written by write(2) is visible in mmap.
 	 */
 	if (iomap->flags & IOMAP_F_NEW || cow) {
+		/*
+		 * Filesystem allows CoW on non-shared extents. The src extents
+		 * may have been mmapped with dirty mark before. To be able to
+		 * invalidate its dax entries, we need to clear the dirty mark
+		 * in advance.
+		 */
+		if (cow)
+			__dax_clear_dirty_range(iomi->inode->i_mapping,
+						pos >> PAGE_SHIFT,
+						(end - 1) >> PAGE_SHIFT);
 		invalidate_inode_pages2_range(iomi->inode->i_mapping,
 					      pos >> PAGE_SHIFT,
 					      (end - 1) >> PAGE_SHIFT);

@@ -2022,8 +2064,8 @@ int dax_dedupe_file_range_compare(struct inode *src, loff_t srcoff,

 	while ((ret = iomap_iter(&src_iter, ops)) > 0 &&
 	       (ret = iomap_iter(&dst_iter, ops)) > 0) {
-		compared = dax_range_compare_iter(&src_iter, &dst_iter, len,
-						  same);
+		compared = dax_range_compare_iter(&src_iter, &dst_iter,
+				min(src_iter.len, dst_iter.len), same);
 		if (compared < 0)
 			return ret;
 		src_iter.processed = dst_iter.processed = compared;
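The last fs/dax.c hunk passes min(src_iter.len, dst_iter.len) to the compare helper because the two iomap iterators may map extents of different lengths in one step, so only the overlap of the two mappings is valid. A hedged userspace sketch of that chunked comparison, with fixed per-side chunk sizes standing in for the extent mappings (names are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compare two byte ranges in steps, where each side independently
 * limits how much it can map per step (a_chunk / b_chunk stand in
 * for the lengths the two iomap iterators report). */
static int ranges_same(const char *a, size_t a_chunk,
		       const char *b, size_t b_chunk, size_t len)
{
	size_t done = 0;

	while (done < len) {
		/* Only the overlap of the two mappings is valid,
		 * hence the min() the fix introduces. */
		size_t n = a_chunk < b_chunk ? a_chunk : b_chunk;

		if (n > len - done)
			n = len - done;
		if (memcmp(a + done, b + done, n))
			return 0;	/* ranges differ */
		done += n;
	}
	return 1;			/* all chunks matched */
}
```

Using the full requested length instead of the per-step minimum would read past what the shorter mapping covers, which is the bug the hunk fixes.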

fs/nilfs2/btree.c

Lines changed: 1 addition & 0 deletions

@@ -2219,6 +2219,7 @@ static int nilfs_btree_assign_p(struct nilfs_bmap *btree,
 	/* on-disk format */
 	binfo->bi_dat.bi_blkoff = cpu_to_le64(key);
 	binfo->bi_dat.bi_level = level;
+	memset(binfo->bi_dat.bi_pad, 0, sizeof(binfo->bi_dat.bi_pad));

 	return 0;
 }

fs/nilfs2/direct.c

Lines changed: 1 addition & 0 deletions

@@ -314,6 +314,7 @@ static int nilfs_direct_assign_p(struct nilfs_bmap *direct,

 	binfo->bi_dat.bi_blkoff = cpu_to_le64(key);
 	binfo->bi_dat.bi_level = 0;
+	memset(binfo->bi_dat.bi_pad, 0, sizeof(binfo->bi_dat.bi_pad));

 	return 0;
 }

fs/nilfs2/segment.c

Lines changed: 1 addition & 2 deletions

@@ -2609,11 +2609,10 @@ static int nilfs_segctor_thread(void *arg)
 	goto loop;

 end_thread:
-	spin_unlock(&sci->sc_state_lock);
-
 	/* end sync. */
 	sci->sc_task = NULL;
 	wake_up(&sci->sc_wait_task); /* for nilfs_segctor_kill_thread() */
+	spin_unlock(&sci->sc_state_lock);
 	return 0;
 }

fs/nilfs2/super.c

Lines changed: 2 additions & 0 deletions

@@ -482,6 +482,7 @@ static void nilfs_put_super(struct super_block *sb)
 		up_write(&nilfs->ns_sem);
 	}

+	nilfs_sysfs_delete_device_group(nilfs);
 	iput(nilfs->ns_sufile);
 	iput(nilfs->ns_cpfile);
 	iput(nilfs->ns_dat);

@@ -1105,6 +1106,7 @@ nilfs_fill_super(struct super_block *sb, void *data, int silent)
 	nilfs_put_root(fsroot);

 failed_unload:
+	nilfs_sysfs_delete_device_group(nilfs);
 	iput(nilfs->ns_sufile);
 	iput(nilfs->ns_cpfile);
 	iput(nilfs->ns_dat);

fs/nilfs2/the_nilfs.c

Lines changed: 7 additions & 5 deletions

@@ -87,7 +87,6 @@ void destroy_nilfs(struct the_nilfs *nilfs)
 {
 	might_sleep();
 	if (nilfs_init(nilfs)) {
-		nilfs_sysfs_delete_device_group(nilfs);
 		brelse(nilfs->ns_sbh[0]);
 		brelse(nilfs->ns_sbh[1]);
 	}

@@ -305,6 +304,10 @@ int load_nilfs(struct the_nilfs *nilfs, struct super_block *sb)
 		goto failed;
 	}

+	err = nilfs_sysfs_create_device_group(sb);
+	if (unlikely(err))
+		goto sysfs_error;
+
 	if (valid_fs)
 		goto skip_recovery;

@@ -366,6 +369,9 @@ int load_nilfs(struct the_nilfs *nilfs, struct super_block *sb)
 		goto failed;

 failed_unload:
+	nilfs_sysfs_delete_device_group(nilfs);
+
+sysfs_error:
 	iput(nilfs->ns_cpfile);
 	iput(nilfs->ns_sufile);
 	iput(nilfs->ns_dat);

@@ -697,10 +703,6 @@ int init_nilfs(struct the_nilfs *nilfs, struct super_block *sb, char *data)
 	if (err)
 		goto failed_sbh;

-	err = nilfs_sysfs_create_device_group(sb);
-	if (err)
-		goto failed_sbh;
-
 	set_nilfs_init(nilfs);
 	err = 0;
 out:
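The load_nilfs() change follows the kernel's goto-unwind idiom: cleanup labels run in reverse order of setup, so a failure after sysfs registration jumps to failed_unload and tears the device group down, while a failure during registration itself jumps past that teardown to sysfs_error. A small userspace sketch of the pattern, with hypothetical names standing in for the nilfs helpers:

```c
#include <assert.h>
#include <stdbool.h>

static bool group_created;

/* Hypothetical stand-ins for nilfs_sysfs_create/delete_device_group(). */
static int create_group(void)  { group_created = true;  return 0; }
static void delete_group(void) { group_created = false; }

/* Sketch of the load path: on failure, unwind only what was set up. */
static int load(bool fail_after_group)
{
	int err;

	err = create_group();
	if (err)
		goto sysfs_error;	/* group never made: skip teardown */

	if (fail_after_group) {
		err = -1;
		goto failed_unload;	/* group exists: must tear it down */
	}
	return 0;			/* success: group stays until put_super */

failed_unload:
	delete_group();			/* reverse order of setup */
sysfs_error:
	return err;
}
```

Moving the create call from init_nilfs() into load_nilfs(), paired with deletion in nilfs_put_super(), ties the sysfs group's lifetime to the mount rather than to the the_nilfs object, which is the lifetime fix this commit makes.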

include/linux/mm_types.h

Lines changed: 2 additions & 1 deletion

@@ -774,7 +774,8 @@ struct mm_struct {
 	unsigned long cpu_bitmap[];
 };

-#define MM_MT_FLAGS	(MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN)
+#define MM_MT_FLAGS	(MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN | \
+			 MT_FLAGS_USE_RCU)
 extern struct mm_struct init_mm;

 /* Pointer magic because the dynamic array size confuses some compilers. */
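The MM_MT_FLAGS change simply ORs one more mode bit into the maple tree flags, enabling RCU mode for the mm_struct's VMA tree by default. A sketch with hypothetical bit values (the real MT_FLAGS_* constants are defined in include/linux/maple_tree.h and are not these):

```c
#include <assert.h>

/* Hypothetical stand-ins for the maple tree flag bits. */
#define MT_FLAGS_ALLOC_RANGE	0x01
#define MT_FLAGS_LOCK_EXTERN	0x02
#define MT_FLAGS_USE_RCU	0x04

/* Before the fix: allocation ranges + external locking only. */
#define MM_MT_FLAGS_OLD	(MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN)
/* After: RCU mode enabled by default for the mm's maple tree. */
#define MM_MT_FLAGS	(MM_MT_FLAGS_OLD | MT_FLAGS_USE_RCU)
```

Because flags compose by OR, readers of the tree can now take the RCU read path while writers keep using the externally supplied mmap lock, which is why the rest of the pull is dominated by maple tree RCU-safety fixes.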
