Commit aa21879

davidhildenbrand authored and mstsirkin committed
mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE
virtio-mem wants to offline memory blocks of which some parts were unplugged (allocated via alloc_contig_range()), especially so that completely unplugged memory blocks can later be offlined and removed. The important part is that PageOffline() has to remain set until the section is offline, so these pages will never get accessed (e.g., when dumping). The pages should not be handed back to the buddy (which would require clearing PageOffline() and would cause issues if offlining fails and the pages are suddenly in the buddy).

Let's allow that by permitting isolation of any PageOffline() page when offlining. This way, we can reach the memory hotplug notifier MEM_GOING_OFFLINE, where the driver can signal that it is fine with offlining a page by dropping its reference count. PageOffline() pages with a reference count of 0 can then be skipped when offlining the pages (as if they were free, although they are not in the buddy).

Anybody who uses PageOffline() pages and does not agree to offline them (e.g., Hyper-V balloon, XEN balloon, VMWare balloon for 2MB pages) will not decrement the reference count, which makes offlining fail when trying to migrate such an unmovable page. So there should be no observable change. The same applies to balloon compaction users (movable PageOffline() pages): the pages will simply be migrated.

Note 1: If offlining fails, a driver has to increment the reference count again in MEM_CANCEL_OFFLINE.

Note 2: A driver that makes use of this has to be aware that re-onlining the memory block has to be handled by hooking into the onlining code (online_page_callback_t), re-setting PageOffline() on the pages and not giving them to the buddy.

Reviewed-by: Alexander Duyck <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Tested-by: Pankaj Gupta <[email protected]>
Acked-by: Andrew Morton <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Anthony Yznaga <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Pingfan Liu <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
1 parent 255f598 commit aa21879
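
The reference-count handshake described in the commit message can be sketched as a small userspace model. This is not kernel code: `struct nb_page`, `enum nb_event`, `nb_driver_notify()` and `nb_block_can_offline()` are hypothetical names made up for illustration, and the model assumes the driver holds exactly one reference on each of its PageOffline() pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; these are not the kernel's types. */
enum nb_event { NB_GOING_OFFLINE, NB_CANCEL_OFFLINE };

struct nb_page {
	bool offline;	/* stands in for PageOffline() */
	int refcount;	/* stands in for page_count() */
};

/*
 * Hypothetical driver notifier: drop the driver's reference on every
 * PageOffline() page when offlining starts, take it back if offlining
 * is cancelled.
 */
static void nb_driver_notify(enum nb_event ev, struct nb_page *pages, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (!pages[i].offline)
			continue;
		if (ev == NB_GOING_OFFLINE) {
			assert(pages[i].refcount == 1);
			pages[i].refcount--;
		} else {
			assert(pages[i].refcount == 0);
			pages[i].refcount++;
		}
	}
}

/*
 * Model of the offlining decision: a PageOffline() page blocks offlining
 * only while it is still referenced. Ordinary pages are assumed to be
 * free or already migrated and are ignored here.
 */
static bool nb_block_can_offline(const struct nb_page *pages, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (pages[i].offline && pages[i].refcount > 0)
			return false;
	return true;
}
```

In this model, a block containing referenced PageOffline() pages cannot be offlined until `nb_driver_notify(NB_GOING_OFFLINE, ...)` has run, and it goes back to being blocked after `NB_CANCEL_OFFLINE`, which mirrors Note 1.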

4 files changed: 77 additions and 10 deletions

include/linux/page-flags.h

Lines changed: 10 additions & 0 deletions
@@ -777,6 +777,16 @@ PAGE_TYPE_OPS(Buddy, buddy)
  * not onlined when onlining the section).
  * The content of these pages is effectively stale. Such pages should not
  * be touched (read/write/dump/save) except by their owner.
+ *
+ * If a driver wants to allow to offline unmovable PageOffline() pages without
+ * putting them back to the buddy, it can do so via the memory notifier by
+ * decrementing the reference count in MEM_GOING_OFFLINE and incrementing the
+ * reference count in MEM_CANCEL_OFFLINE. When offlining, the PageOffline()
+ * pages (now with a reference count of zero) are treated like free pages,
+ * allowing the containing memory block to get offlined. A driver that
+ * relies on this feature is aware that re-onlining the memory block will
+ * require to re-set the pages PageOffline() and not giving them to the
+ * buddy via online_page_callback_t.
  */
 PAGE_TYPE_OPS(Offline, offline)
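
The re-onlining requirement in the comment above can be illustrated with a userspace sketch. It does not use the real online_page_callback_t signature: `struct ol_page`, `ol_online_page()` and `ol_online_block()` are hypothetical, and the model assumes the driver tracks which pages are still unplugged and keeps one reference on each of them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; these are not the kernel's types. */
struct ol_page {
	bool offline;	/* stands in for PageOffline() */
	bool in_buddy;	/* stands in for PageBuddy() */
	int refcount;	/* stands in for page_count() */
};

/*
 * Hypothetical per-page onlining hook: pages the driver still considers
 * unplugged get PageOffline() set again and stay out of the buddy; all
 * other pages are released to the buddy as usual.
 */
static void ol_online_page(struct ol_page *page, bool unplugged)
{
	if (unplugged) {
		page->offline = true;
		page->in_buddy = false;
		page->refcount = 1;	/* assumption: driver holds one reference */
	} else {
		page->offline = false;
		page->in_buddy = true;
		page->refcount = 0;
	}
}

/* Online a whole block; returns how many pages reached the buddy. */
static size_t ol_online_block(struct ol_page *pages, const bool *unplugged,
			      size_t nr)
{
	size_t to_buddy = 0;

	for (size_t i = 0; i < nr; i++) {
		ol_online_page(&pages[i], unplugged[i]);
		if (pages[i].in_buddy)
			to_buddy++;
	}
	return to_buddy;
}
```

The point of the sketch is that a driver relying on this feature has to restore its own bookkeeping when the block comes back online; the core code will not do it on the driver's behalf.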

mm/memory_hotplug.c

Lines changed: 34 additions & 10 deletions
@@ -1224,11 +1224,17 @@ struct zone *test_pages_in_a_zone(unsigned long start_pfn,

 /*
  * Scan pfn range [start,end) to find movable/migratable pages (LRU pages,
- * non-lru movable pages and hugepages). We scan pfn because it's much
- * easier than scanning over linked list. This function returns the pfn
- * of the first found movable page if it's found, otherwise 0.
+ * non-lru movable pages and hugepages). Will skip over most unmovable
+ * pages (esp., pages that can be skipped when offlining), but bail out on
+ * definitely unmovable pages.
+ *
+ * Returns:
+ *	0 in case a movable page is found and movable_pfn was updated.
+ *	-ENOENT in case no movable page was found.
+ *	-EBUSY in case a definitely unmovable page was found.
  */
-static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
+static int scan_movable_pages(unsigned long start, unsigned long end,
+			      unsigned long *movable_pfn)
 {
 	unsigned long pfn;

@@ -1240,18 +1246,30 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
 			continue;
 		page = pfn_to_page(pfn);
 		if (PageLRU(page))
-			return pfn;
+			goto found;
 		if (__PageMovable(page))
-			return pfn;
+			goto found;
+
+		/*
+		 * PageOffline() pages that are not marked __PageMovable() and
+		 * have a reference count > 0 (after MEM_GOING_OFFLINE) are
+		 * definitely unmovable. If their reference count would be 0,
+		 * they could at least be skipped when offlining memory.
+		 */
+		if (PageOffline(page) && page_count(page))
+			return -EBUSY;

 		if (!PageHuge(page))
 			continue;
 		head = compound_head(page);
 		if (page_huge_active(head))
-			return pfn;
+			goto found;
 		skip = compound_nr(head) - (page - head);
 		pfn += skip - 1;
 	}
+	return -ENOENT;
+found:
+	*movable_pfn = pfn;
 	return 0;
 }

@@ -1518,7 +1536,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	}

 	do {
-		for (pfn = start_pfn; pfn;) {
+		pfn = start_pfn;
+		do {
 			if (signal_pending(current)) {
 				ret = -EINTR;
 				reason = "signal backoff";
@@ -1528,14 +1547,19 @@ static int __ref __offline_pages(unsigned long start_pfn,
 			cond_resched();
 			lru_add_drain_all();

-			pfn = scan_movable_pages(pfn, end_pfn);
-			if (pfn) {
+			ret = scan_movable_pages(pfn, end_pfn, &pfn);
+			if (!ret) {
 				/*
 				 * TODO: fatal migration failures should bail
 				 * out
 				 */
 				do_migrate_range(pfn, end_pfn);
 			}
+		} while (!ret);
+
+		if (ret != -ENOENT) {
+			reason = "unmovable page";
+			goto failed_removal_isolated;
 		}

 		/*
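
The new three-way return convention of scan_movable_pages() and the reworked caller loop can be modeled in userspace as follows. This is a simplified sketch under stated assumptions: `struct sc_page`, `sc_scan_movable()` and `sc_migrate_all()` are hypothetical names, hugepages are left out, and "migration" is simulated by clearing a flag and always succeeds.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; these are not the kernel's types. */
struct sc_page {
	bool movable;	/* stands in for PageLRU() / __PageMovable() */
	bool offline;	/* stands in for PageOffline() */
	int refcount;	/* stands in for page_count() */
};

/*
 * Returns 0 and updates *movable_idx if a movable page is found,
 * -EBUSY on a still-referenced unmovable PageOffline() page,
 * -ENOENT if nothing movable remains in [start, end).
 */
static int sc_scan_movable(const struct sc_page *pages, size_t start,
			   size_t end, size_t *movable_idx)
{
	for (size_t i = start; i < end; i++) {
		if (pages[i].movable) {
			*movable_idx = i;
			return 0;
		}
		if (pages[i].offline && pages[i].refcount > 0)
			return -EBUSY;
	}
	return -ENOENT;
}

/*
 * Caller loop in the style of the patched __offline_pages(): keep
 * migrating while movable pages are found, then treat -ENOENT as
 * success and anything else as a failure.
 */
static int sc_migrate_all(struct sc_page *pages, size_t nr)
{
	size_t idx = 0;
	int ret;

	do {
		ret = sc_scan_movable(pages, idx, nr, &idx);
		if (!ret)
			pages[idx].movable = false;	/* simulated migration */
	} while (!ret);

	return ret == -ENOENT ? 0 : ret;
}
```

Separating "nothing left to migrate" (-ENOENT) from "definitely unmovable" (-EBUSY) is what lets the caller fail early with a reason instead of retrying forever on a page that will never move.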

mm/page_alloc.c

Lines changed: 24 additions & 0 deletions
@@ -8372,6 +8372,19 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
 			continue;

+		/*
+		 * We treat all PageOffline() pages as movable when offlining
+		 * to give drivers a chance to decrement their reference count
+		 * in MEM_GOING_OFFLINE in order to indicate that these pages
+		 * can be offlined as there are no direct references anymore.
+		 * For actually unmovable PageOffline() where the driver does
+		 * not support this, we will fail later when trying to actually
+		 * move these pages that still have a reference count > 0.
+		 * (false negatives in this function only)
+		 */
+		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
+			continue;
+
 		if (__PageMovable(page) || PageLRU(page))
 			continue;

@@ -8792,6 +8805,17 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 			offlined_pages++;
 			continue;
 		}
+		/*
+		 * At this point all remaining PageOffline() pages have a
+		 * reference count of 0 and can simply be skipped.
+		 */
+		if (PageOffline(page)) {
+			BUG_ON(page_count(page));
+			BUG_ON(PageBuddy(page));
+			pfn++;
+			offlined_pages++;
+			continue;
+		}

 		BUG_ON(page_count(page));
 		BUG_ON(!PageBuddy(page));

mm/page_isolation.c

Lines changed: 9 additions & 0 deletions
@@ -151,6 +151,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  *			a bit mask)
  *			MEMORY_OFFLINE - isolate to offline (!allocate) memory
  *					 e.g., skip over PageHWPoison() pages
+ *					 and PageOffline() pages.
  *			REPORT_FAILURE - report details about the failure to
  *					 isolate the range
  *
@@ -259,6 +260,14 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 		else if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
 			/* A HWPoisoned page cannot be also PageBuddy */
 			pfn++;
+		else if ((flags & MEMORY_OFFLINE) && PageOffline(page) &&
+			 !page_count(page))
+			/*
+			 * The responsible driver agreed to skip PageOffline()
+			 * pages when offlining memory by dropping its
+			 * reference in MEM_GOING_OFFLINE.
+			 */
+			pfn++;
 		else
 			break;
 	}
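
The isolation test touched by this hunk can be sketched as a userspace walk over a small page array. The names `enum iso_kind`, `struct iso_page` and `iso_test_isolated()` are hypothetical, and the model only loosely follows __test_page_isolated_in_pageblock(): free buddy pages are skipped by their order, and a boolean stands in for the MEMORY_OFFLINE flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; these are not the kernel's types. */
enum iso_kind { ISO_BUDDY, ISO_HWPOISON, ISO_OFFLINE, ISO_IN_USE };

struct iso_page {
	enum iso_kind kind;
	int refcount;		/* stands in for page_count() */
	unsigned int order;	/* buddy order, only used for ISO_BUDDY heads */
};

/*
 * Walk [pfn, end) and return the first pfn that is not isolated.
 * A return value >= end means the whole range passed the test.
 */
static size_t iso_test_isolated(const struct iso_page *pages, size_t pfn,
				size_t end, bool memory_offline)
{
	while (pfn < end) {
		const struct iso_page *p = &pages[pfn];

		if (p->kind == ISO_BUDDY)
			pfn += (size_t)1 << p->order;	/* skip the free chunk */
		else if (memory_offline && p->kind == ISO_HWPOISON)
			pfn++;
		else if (memory_offline && p->kind == ISO_OFFLINE &&
			 p->refcount == 0)
			pfn++;	/* driver dropped its reference */
		else
			break;
	}
	return pfn;
}
```

In this model a PageOffline() page passes only when offlining was requested and its reference count is zero, so a driver that never drops its reference keeps the range from being reported as isolated.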
