
Commit 96f8bf4

hnaz authored and torvalds committed
mm: vmscan: reclaim writepage is IO cost
The VM tries to balance reclaim pressure between anon and file so as to reduce the amount of IO incurred due to the memory shortage. It already counts refaults and swapins, but in addition it should also count writepage calls during reclaim.

For swap, this is obvious: it's IO that wouldn't have occurred if the anonymous memory hadn't been under memory pressure. From a relative balancing point of view this makes sense as well: even if anon is cold and reclaimable, a cache that isn't thrashing may have equally cold pages that don't require IO to reclaim.

For file writeback, it's trickier: some of the reclaim writepage IO would have likely occurred anyway due to dirty expiration. But not all of it - premature writeback reduces batching and generates additional writes. Since the flushers are already woken up by the time the VM starts writing cache pages one by one, let's assume that we're likely causing writes that wouldn't have happened without memory pressure. In addition, the per-page cost of IO would have probably been much cheaper if written in larger batches from the flusher thread rather than the single-page writes from kswapd.

For our purposes - getting the trend right to accelerate convergence on a stable state that doesn't require paging at all - this is sufficiently accurate. If we later wanted to optimize for sustained thrashing, we can still refine the measurements.

Count all writepage calls from kswapd as IO cost toward the LRU that the page belongs to.

Why do this dynamically? Don't we know in advance that anon pages require IO to reclaim, and so could build in a static bias?

First, scanning is not the same as reclaiming. If all the anon pages are referenced, we may not swap for a while just because we're scanning the anon list. During this time, however, it's important that we age anonymous memory and the page cache at the same rate so that their hot-cold gradients are comparable. Everything else being equal, we still want to reclaim the coldest memory overall.

Second, we keep copies in swap unless the page changes. If there is swap-backed data that's mostly read (tmpfs file) and has been swapped out before, we can reclaim it without incurring additional IO.

Signed-off-by: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Rik van Riel <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
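For context on where these cost events end up: the scan balance divides reclaim pressure between the two LRUs in inverse proportion to their recorded IO cost. Below is a minimal sketch of that proportioning, modeled on the get_scan_count()-style weighting this series moves toward; the function name and signature are illustrative and not part of this commit.

	/*
	 * Illustrative sketch (not in this diff): pressure ratios derived
	 * from the recorded costs. swappiness is the usual 0..200 knob;
	 * the LRU that caused more IO recently gets proportionally less
	 * pressure.
	 */
	static void scan_pressure_sketch(unsigned long anon_cost,
					 unsigned long file_cost,
					 int swappiness,
					 unsigned long *ap, unsigned long *fp)
	{
		unsigned long total_cost = anon_cost + file_cost;

		/* +1 keeps the math defined before any cost is recorded */
		*ap = swappiness * (total_cost + 1) / (anon_cost + 1);
		*fp = (200 - swappiness) * (total_cost + 1) / (file_cost + 1);
	}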
1 parent 7cf111b commit 96f8bf4

6 files changed: 19 additions & 9 deletions

include/linux/swap.h

Lines changed: 3 additions & 1 deletion
@@ -334,7 +334,9 @@ extern unsigned long nr_free_pagecache_pages(void);
 
 
 /* linux/mm/swap.c */
-extern void lru_note_cost(struct page *);
+extern void lru_note_cost(struct lruvec *lruvec, bool file,
+			  unsigned int nr_pages);
+extern void lru_note_cost_page(struct page *);
 extern void lru_cache_add(struct page *);
 extern void lru_add_page_tail(struct page *page, struct page *page_tail,
 			 struct lruvec *lruvec, struct list_head *head);
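The header now exposes a batched primitive plus a single-page wrapper. A hedged usage sketch follows (the variables are illustrative; the real call sites appear in the hunks below):

	/* One cost event per page, e.g. on a refault or a swapin: */
	lru_note_cost_page(page);

	/* One cost event per batch, e.g. after reclaim wrote nr pages: */
	lru_note_cost(lruvec, file, nr);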

include/linux/vmstat.h

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ struct reclaim_stat {
 	unsigned nr_congested;
 	unsigned nr_writeback;
 	unsigned nr_immediate;
+	unsigned nr_pageout;
 	unsigned nr_activate[2];
 	unsigned nr_ref_keep;
 	unsigned nr_unmap_fail;

mm/swap.c

Lines changed: 10 additions & 6 deletions
@@ -278,18 +278,16 @@ void rotate_reclaimable_page(struct page *page)
 	}
 }
 
-void lru_note_cost(struct page *page)
+void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages)
 {
-	struct lruvec *lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
-
 	do {
 		unsigned long lrusize;
 
 		/* Record cost event */
-		if (page_is_file_lru(page))
-			lruvec->file_cost++;
+		if (file)
+			lruvec->file_cost += nr_pages;
 		else
-			lruvec->anon_cost++;
+			lruvec->anon_cost += nr_pages;
 
 		/*
 		 * Decay previous events
@@ -311,6 +309,12 @@ void lru_note_cost(struct page *page)
 	} while ((lruvec = parent_lruvec(lruvec)));
 }
 
+void lru_note_cost_page(struct page *page)
+{
+	lru_note_cost(mem_cgroup_page_lruvec(page, page_pgdat(page)),
+		      page_is_file_lru(page), hpage_nr_pages(page));
+}
+
 static void __activate_page(struct page *page, struct lruvec *lruvec,
 			    void *arg)
 {
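The "Decay previous events" branch is elided by the hunk context above. For readability, here is a sketch of what that decay step does, assuming the pre-existing lru_note_cost() body; treat the snippet as illustrative rather than part of this diff:

		/*
		 * Sketch of the elided decay: once the combined cost
		 * exceeds a quarter of the lruvec's pages, halve both
		 * counters so recent IO outweighs historical IO and
		 * the counters cannot overflow.
		 */
		lrusize = lruvec_page_state(lruvec, NR_INACTIVE_ANON) +
			  lruvec_page_state(lruvec, NR_ACTIVE_ANON) +
			  lruvec_page_state(lruvec, NR_INACTIVE_FILE) +
			  lruvec_page_state(lruvec, NR_ACTIVE_FILE);

		if (lruvec->file_cost + lruvec->anon_cost > lrusize / 4) {
			lruvec->file_cost /= 2;
			lruvec->anon_cost /= 2;
		}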

mm/swap_state.c

Lines changed: 1 addition & 1 deletion
@@ -442,7 +442,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 
 	/* XXX: Move to lru_cache_add() when it supports new vs putback */
 	spin_lock_irq(&page_pgdat(page)->lru_lock);
-	lru_note_cost(page);
+	lru_note_cost_page(page);
 	spin_unlock_irq(&page_pgdat(page)->lru_lock);
 
 	/* Caller will initiate read into locked page */

mm/vmscan.c

Lines changed: 3 additions & 0 deletions
@@ -1359,6 +1359,8 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 			case PAGE_ACTIVATE:
 				goto activate_locked;
 			case PAGE_SUCCESS:
+				stat->nr_pageout += hpage_nr_pages(page);
+
 				if (PageWriteback(page))
 					goto keep;
 				if (PageDirty(page))
@@ -1964,6 +1966,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	move_pages_to_lru(lruvec, &page_list);
 
 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
+	lru_note_cost(lruvec, file, stat.nr_pageout);
 	item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT;
 	if (!cgroup_reclaim(sc))
 		__count_vm_events(item, nr_reclaimed);
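Putting the two vmscan.c hunks together: writeback successes are tallied per page, then charged once per reclaimed batch. A simplified, illustrative sketch of that flow; shrink_page_list_sketch() is a hypothetical stand-in for the real shrink_page_list():

	static void reclaim_cost_flow_sketch(struct lruvec *lruvec, bool file,
					     struct list_head *page_list)
	{
		struct reclaim_stat stat = { 0 };

		/* bumps stat.nr_pageout by hpage_nr_pages() per PAGE_SUCCESS */
		shrink_page_list_sketch(page_list, &stat);

		/* one cost event, sized by the pages actually written out */
		lru_note_cost(lruvec, file, stat.nr_pageout);
	}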

mm/workingset.c

Lines changed: 1 addition & 1 deletion
@@ -367,7 +367,7 @@ void workingset_refault(struct page *page, void *shadow)
 	SetPageWorkingset(page);
 	/* XXX: Move to lru_cache_add() when it supports new vs putback */
 	spin_lock_irq(&page_pgdat(page)->lru_lock);
-	lru_note_cost(page);
+	lru_note_cost_page(page);
 	spin_unlock_irq(&page_pgdat(page)->lru_lock);
 	inc_lruvec_state(lruvec, WORKINGSET_RESTORE);
 }
