Commit e7ac4da

Barry Song authored and akpm00 committed
mm: count zeromap read and set for swapout and swapin
When the proportion of folios from the zeromap is small, missing their accounting may not significantly impact profiling. However, it's easy to construct a scenario where this becomes an issue: for example, allocating 1 GB of memory, writing zeros from userspace, followed by MADV_PAGEOUT, and then swapping it back in. In this case, the swap-out and swap-in counts seem to vanish into a black hole, potentially causing semantic ambiguity.

On the other hand, Usama reported that zero-filled pages can exceed 10% in workloads utilizing zswap, while Hailong noted that some apps on Android have more than 6% zero-filled pages.

Before commit 0ca0c24 ("mm: store zero pages to be swapped out in a bitmap"), both zswap and zRAM implemented similar optimizations, leading to these optimized-out pages being counted in either zswap or zRAM counters (with pswpin/pswpout also increasing for zRAM). With zeromap functioning prior to both zswap and zRAM, userspace will no longer detect these swap-out and swap-in actions.

We have three ways to address this:

1. Introduce a dedicated counter specifically for the zeromap.

2. Use pswpin/pswpout accounting, treating the zeromap as a standard backend. This approach aligns with zRAM's current handling of same-page fills at the device level. However, it would mean losing the optimized-out page counters previously available in zRAM and would not align with systems using zswap. Additionally, as noted by Nhat Pham, pswpin/pswpout counters apply only to I/O done directly to the backend device.

3. Count zeromap pages under zswap, aligning with system behavior when zswap is enabled. However, this would not be consistent with zRAM, nor would it align with systems lacking both zswap and zRAM.

Given the complications with options 2 and 3, this patch selects option 1.

We can find these counters in /proc/vmstat (counters for the whole system) and in memcg's memory.stat (counters for the memcg of interest).

For example:

$ grep -E 'swpin_zero|swpout_zero' /proc/vmstat
swpin_zero 1648
swpout_zero 33536

$ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat
swpin_zero 3905
swpout_zero 3985

This patch does not address any specific zeromap bug, but the missing swpout and swpin counts for zero-filled pages can be highly confusing and may mislead user-space agents that rely on changes in these counters as indicators. Therefore, we add a Fixes tag to encourage the inclusion of this counter in any kernel versions with zeromap.

Many thanks to Kanchana for the contribution of changing count_objcg_event() to count_objcg_events() to support large folios[1], which has now been incorporated into this patch.

[1] https://lkml.kernel.org/r/[email protected]

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0ca0c24 ("mm: store zero pages to be swapped out in a bitmap")
Co-developed-by: Kanchana P Sridhar <[email protected]>
Signed-off-by: Barry Song <[email protected]>
Reviewed-by: Nhat Pham <[email protected]>
Reviewed-by: Chengming Zhou <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Usama Arif <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Cc: Hailong Liu <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Chris Li <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Ryan Roberts <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
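A user-space agent that tracks these counters has to read them from /proc/vmstat or from a cgroup's memory.stat. The following is a minimal sketch in C, assuming the flat "name value" line layout shown in the grep output above. The helper names read_stat_counter() and read_stat_file() are hypothetical and exist only for this illustration; they are not part of the kernel or of libc.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan "name value" lines and return the value stored for `name`.
 * Returns -1 if the counter is absent, for example on a kernel that
 * does not provide swpin_zero/swpout_zero.
 */
static long long read_stat_counter(FILE *fp, const char *name)
{
	char key[64];
	long long value;

	rewind(fp);
	while (fscanf(fp, "%63s %lld", key, &value) == 2) {
		if (strcmp(key, name) == 0)
			return value;
	}
	return -1;
}

/* Convenience wrapper (hypothetical): open `path`, look up `name`. */
static long long read_stat_file(const char *path, const char *name)
{
	FILE *fp = fopen(path, "r");
	long long value;

	if (!fp)
		return -1;
	value = read_stat_counter(fp, name);
	fclose(fp);
	return value;
}
```

A caller could sample read_stat_file("/proc/vmstat", "swpout_zero") before and after a workload and report the difference; the same helper should work on a memory.stat file, since it uses the same line layout.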
1 parent c289f4d commit e7ac4da

7 files changed, +43 -8 lines changed

Documentation/admin-guide/cgroup-v2.rst

Lines changed: 9 additions & 0 deletions
@@ -1599,6 +1599,15 @@ The following nested keys are defined.
 	  pglazyfreed (npn)
 		Amount of reclaimed lazyfree pages

+	  swpin_zero
+		Number of pages swapped into memory and filled with zero, where I/O
+		was optimized out because the page content was detected to be zero
+		during swapout.
+
+	  swpout_zero
+		Number of zero-filled pages swapped out with I/O skipped due to the
+		content being detected as zero.
+
 	  zswpin
 		Number of pages moved in to memory from zswap.

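To watch swpout_zero and swpin_zero move, the scenario from the commit message (zero-filled anonymous memory, MADV_PAGEOUT, then touching the memory again) can be approximated from userspace. This is only a sketch under stated assumptions: the function name zero_pageout_roundtrip() is hypothetical, MADV_PAGEOUT needs Linux 5.4 or later, and whether the pages actually reach swap depends on the system having swap configured. The fallback value 21 for MADV_PAGEOUT is taken from the Linux uapi headers and is only used if the libc headers do not define it.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21	/* Linux uapi value; assumed when libc lacks it */
#endif

/*
 * Map `len` bytes, write zeros, ask the kernel to page them out, then
 * read every page back in.
 * Returns 0 on success, -1 if mmap()/madvise() failed (for example
 * MADV_PAGEOUT unsupported), 1 if any byte read back was non-zero.
 */
static int zero_pageout_roundtrip(size_t len)
{
	long page_size = sysconf(_SC_PAGESIZE);
	volatile unsigned char *p;
	unsigned char *buf;
	unsigned long sum = 0;
	size_t i;

	if (page_size <= 0)
		return -1;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Writing zeros forces real pages to be allocated. */
	memset(buf, 0, len);

	/* Request reclaim; with swap enabled this is the swap-out step. */
	if (madvise(buf, len, MADV_PAGEOUT) != 0) {
		munmap(buf, len);
		return -1;
	}

	/* Touch one byte per page; swapped-out pages are swapped back in. */
	p = buf;
	for (i = 0; i < len; i += (size_t)page_size)
		sum += p[i];

	munmap(buf, len);
	return sum == 0 ? 0 : 1;
}
```

Calling zero_pageout_roundtrip(1UL << 30) mirrors the 1 GB case; comparing the two counters in /proc/vmstat before and after the call would show whether the zeromap path was taken.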

include/linux/memcontrol.h

Lines changed: 7 additions & 5 deletions
@@ -1760,8 +1760,9 @@ static inline int memcg_kmem_id(struct mem_cgroup *memcg)

 struct mem_cgroup *mem_cgroup_from_slab_obj(void *p);

-static inline void count_objcg_event(struct obj_cgroup *objcg,
-				     enum vm_event_item idx)
+static inline void count_objcg_events(struct obj_cgroup *objcg,
+				      enum vm_event_item idx,
+				      unsigned long count)
 {
 	struct mem_cgroup *memcg;

@@ -1770,7 +1771,7 @@ static inline void count_objcg_event(struct obj_cgroup *objcg,

 	rcu_read_lock();
 	memcg = obj_cgroup_memcg(objcg);
-	count_memcg_events(memcg, idx, 1);
+	count_memcg_events(memcg, idx, count);
 	rcu_read_unlock();
 }

@@ -1825,8 +1826,9 @@ static inline struct mem_cgroup *mem_cgroup_from_slab_obj(void *p)
 	return NULL;
 }

-static inline void count_objcg_event(struct obj_cgroup *objcg,
-				     enum vm_event_item idx)
+static inline void count_objcg_events(struct obj_cgroup *objcg,
+				      enum vm_event_item idx,
+				      unsigned long count)
 {
 }


include/linux/vm_event_item.h

Lines changed: 2 additions & 0 deletions
@@ -134,6 +134,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_SWAP
 		SWAP_RA,
 		SWAP_RA_HIT,
+		SWPIN_ZERO,
+		SWPOUT_ZERO,
 #ifdef CONFIG_KSM
 		KSM_SWPIN_COPY,
 #endif

mm/memcontrol.c

Lines changed: 4 additions & 0 deletions
@@ -431,6 +431,10 @@ static const unsigned int memcg_vm_event_stat[] = {
 	PGDEACTIVATE,
 	PGLAZYFREE,
 	PGLAZYFREED,
+#ifdef CONFIG_SWAP
+	SWPIN_ZERO,
+	SWPOUT_ZERO,
+#endif
 #ifdef CONFIG_ZSWAP
 	ZSWPIN,
 	ZSWPOUT,

mm/page_io.c

Lines changed: 16 additions & 0 deletions
@@ -204,14 +204,22 @@ static bool is_folio_zero_filled(struct folio *folio)

 static void swap_zeromap_folio_set(struct folio *folio)
 {
+	struct obj_cgroup *objcg = get_obj_cgroup_from_folio(folio);
 	struct swap_info_struct *sis = swp_swap_info(folio->swap);
+	int nr_pages = folio_nr_pages(folio);
 	swp_entry_t entry;
 	unsigned int i;

 	for (i = 0; i < folio_nr_pages(folio); i++) {
 		entry = page_swap_entry(folio_page(folio, i));
 		set_bit(swp_offset(entry), sis->zeromap);
 	}
+
+	count_vm_events(SWPOUT_ZERO, nr_pages);
+	if (objcg) {
+		count_objcg_events(objcg, SWPOUT_ZERO, nr_pages);
+		obj_cgroup_put(objcg);
+	}
 }

 static void swap_zeromap_folio_clear(struct folio *folio)
@@ -503,6 +511,7 @@ static void sio_read_complete(struct kiocb *iocb, long ret)
 static bool swap_read_folio_zeromap(struct folio *folio)
 {
 	int nr_pages = folio_nr_pages(folio);
+	struct obj_cgroup *objcg;
 	bool is_zeromap;

 	/*
@@ -517,6 +526,13 @@ static bool swap_read_folio_zeromap(struct folio *folio)
 	if (!is_zeromap)
 		return false;

+	objcg = get_obj_cgroup_from_folio(folio);
+	count_vm_events(SWPIN_ZERO, nr_pages);
+	if (objcg) {
+		count_objcg_events(objcg, SWPIN_ZERO, nr_pages);
+		obj_cgroup_put(objcg);
+	}
+
 	folio_zero_range(folio, 0, folio_size(folio));
 	folio_mark_uptodate(folio);
 	return true;
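The accounting added to mm/page_io.c can be illustrated with a small userspace model. This is not kernel code: struct zeromap_model, MODEL_PAGE_SIZE, MODEL_SLOTS, and the model_* functions are hypothetical names invented for this sketch, and clearing the bits when a non-zero buffer is swapped out is a modeling assumption. The point it tries to show is that a folio spanning nr_pages pages sets nr_pages bits and bumps the counter by nr_pages, which is why the patch needs count_objcg_events() with a count argument rather than a single-event helper.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size for the model */
#define MODEL_SLOTS 1024UL	/* assumed number of swap slots */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct zeromap_model {
	unsigned long bitmap[MODEL_SLOTS / BITS_PER_WORD];
	unsigned long swpout_zero;	/* models the SWPOUT_ZERO event count */
	unsigned long swpin_zero;	/* models the SWPIN_ZERO event count */
};

static bool buffer_is_zero(const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (buf[i])
			return false;
	return true;
}

/* Returns true if the "folio" was all zeros and I/O would be skipped. */
static bool model_swap_out(struct zeromap_model *m, const unsigned char *buf,
			   unsigned long first_slot, unsigned long nr_pages)
{
	bool zero;
	unsigned long i, slot;

	if (first_slot + nr_pages > MODEL_SLOTS)
		return false;

	zero = buffer_is_zero(buf, nr_pages * MODEL_PAGE_SIZE);
	for (i = 0; i < nr_pages; i++) {
		slot = first_slot + i;
		if (zero)
			m->bitmap[slot / BITS_PER_WORD] |= 1UL << (slot % BITS_PER_WORD);
		else
			m->bitmap[slot / BITS_PER_WORD] &= ~(1UL << (slot % BITS_PER_WORD));
	}
	if (zero)
		m->swpout_zero += nr_pages;	/* one event per page, not per folio */
	return zero;
}

/* Returns true if every slot was in the zeromap and `buf` was zero-filled. */
static bool model_swap_in(struct zeromap_model *m, unsigned char *buf,
			  unsigned long first_slot, unsigned long nr_pages)
{
	unsigned long i, slot;

	if (first_slot + nr_pages > MODEL_SLOTS)
		return false;

	for (i = 0; i < nr_pages; i++) {
		slot = first_slot + i;
		if (!(m->bitmap[slot / BITS_PER_WORD] & (1UL << (slot % BITS_PER_WORD))))
			return false;
	}
	memset(buf, 0, nr_pages * MODEL_PAGE_SIZE);
	m->swpin_zero += nr_pages;
	return true;
}
```

In this model, swapping out a four-page zero buffer and reading it back leaves both counters at 4, matching the per-page semantics described for swpin_zero and swpout_zero.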

mm/vmstat.c

Lines changed: 2 additions & 0 deletions
@@ -1415,6 +1415,8 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_SWAP
 	"swap_ra",
 	"swap_ra_hit",
+	"swpin_zero",
+	"swpout_zero",
 #ifdef CONFIG_KSM
 	"ksm_swpin_copy",
 #endif

mm/zswap.c

Lines changed: 3 additions & 3 deletions
@@ -1053,7 +1053,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,

 	count_vm_event(ZSWPWB);
 	if (entry->objcg)
-		count_objcg_event(entry->objcg, ZSWPWB);
+		count_objcg_events(entry->objcg, ZSWPWB, 1);

 	zswap_entry_free(entry);

@@ -1483,7 +1483,7 @@ bool zswap_store(struct folio *folio)

 	if (objcg) {
 		obj_cgroup_charge_zswap(objcg, entry->length);
-		count_objcg_event(objcg, ZSWPOUT);
+		count_objcg_events(objcg, ZSWPOUT, 1);
 	}

 	/*
@@ -1577,7 +1577,7 @@ bool zswap_load(struct folio *folio)

 	count_vm_event(ZSWPIN);
 	if (entry->objcg)
-		count_objcg_event(entry->objcg, ZSWPIN);
+		count_objcg_events(entry->objcg, ZSWPIN, 1);

 	if (swapcache) {
 		zswap_entry_free(entry);
