Commit 5a6b4cc

davidhildenbrand authored and mstsirkin committed
virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
Commit 7199462 ("virtio_balloon: replace oom notifier with shrinker") changed the behavior when deflation happens automatically. Instead of deflating when called by the OOM handler, the shrinker is used.

However, the balloon is not simply some slab cache that should be shrunk when under memory pressure. The shrinker does not have a concept of priorities, so this behavior cannot be configured.

There was a report that this results in undesired side effects when inflating the balloon to shrink the page cache. [1]

"When inflating the balloon against page cache (i.e. no free memory remains) vmscan.c will both shrink page cache, but also invoke the shrinkers -- including the balloon's shrinker. So the balloon driver allocates memory which requires reclaim, vmscan gets this memory by shrinking the balloon, and then the driver adds the memory back to the balloon. Basically a busy no-op."

The name "deflate on OOM" makes it pretty clear when deflation should happen - after other approaches to reclaim memory failed, not while reclaiming. This allows minimizing the footprint of a guest - memory will only be taken out of the balloon when really needed.

In particular, a drop_slab() will result in the whole balloon getting deflated - undesired. While handling it via the OOM handler might not be perfect, it keeps existing behavior. If we want a different behavior, then we need a new feature bit and need to document it properly (although, there should be a clear use case and the intended effects should be well described).

Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because this has no such side effects. Always register the shrinker with VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free pages that are still to be processed by the guest. The hypervisor takes care of identifying and resolving possible races between processing a hinting request and the guest reusing a page.

In contrast to the state before commit 7199462 ("virtio_balloon: replace oom notifier with shrinker"), don't add a module parameter to configure the number of pages to deflate on OOM. It can be re-added if really needed. Also, pay attention that leak_balloon() returns the number of 4k pages - convert it properly in virtio_balloon_oom_notify().

Note1: using the OOM handler is frowned upon, but it really is what we need for this feature.

Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we could actually skip sending deflation requests to our hypervisor, making the OOM path *very* simple: basically freeing pages and updating the balloon. This is an option if the communication with the host ever becomes a problem on this call path.

[1] https://www.spinics.net/lists/linux-virtualization/msg40863.html

Test report by Tyler Sanderson:

Test setup: VM with 16 CPU, 64GB RAM. Running Debian 10. We have a 42 GB file full of random bytes that we continually cat to /dev/null. This fills the page cache as the file is read. Meanwhile we trigger the balloon to inflate, with a target size of 53 GB. This setup causes the balloon inflation to pressure the page cache as the page cache is also trying to grow. Afterwards we shrink the balloon back to zero (so total deflate = total inflate).

Without patch (kernel 4.19.0-5):
Inflation never reaches the target until we stop the "cat file > /dev/null" process. Total inflation time was 542 seconds. The longest period that made no net forward progress was 315 seconds (see attached graph).
Result of "grep balloon /proc/vmstat" after the test:
balloon_inflate 154828377
balloon_deflate 154828377

With patch (kernel 5.6.0-rc4+):
Total inflation duration was 63 seconds. No deflate-queue activity occurs when pressuring the page-cache.
Result of "grep balloon /proc/vmstat" after the test:
balloon_inflate 12968539
balloon_deflate 12968539

Conclusion: This patch fixes the issue. In the test it reduced inflate/deflate activity by 12x, and reduced inflation time by 8.6x. But more importantly, if we hadn't killed the "cat file > /dev/null" process then, without the patch, the inflation process would never reach the target.

Attached [1] is a png of a graph showing the problematic behavior without this patch. It shows deflate-queue activity increasing linearly while balloon size stays constant over the course of more than 8 minutes of the test.

[1] https://lore.kernel.org/linux-mm/CAJuQAmphPcfew1v_EOgAdSFiprzjiZjmOf3iJDmFX0gD6b9TYQ@mail.gmail.com/2-without_patch.png

Full test report and discussion [2]:

[2] https://lore.kernel.org/r/CAJuQAmphPcfew1v_EOgAdSFiprzjiZjmOf3iJDmFX0gD6b9TYQ@mail.gmail.com

Tested-by: Tyler Sanderson <[email protected]>
Reported-by: Tyler Sanderson <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Wei Wang <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
1 parent 3024e20 commit 5a6b4cc

1 file changed: +44 -63 lines changed

drivers/virtio/virtio_balloon.c

Lines changed: 44 additions & 63 deletions
@@ -14,6 +14,7 @@
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <linux/balloon_compaction.h>
+#include <linux/oom.h>
 #include <linux/wait.h>
 #include <linux/mm.h>
 #include <linux/mount.h>
@@ -27,7 +28,9 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
-#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
+/* Maximum number of (4k) pages to deflate on OOM notifications. */
+#define VIRTIO_BALLOON_OOM_NR_PAGES 256
+#define VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY 80
 
 #define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
 					     __GFP_NOMEMALLOC)
@@ -112,8 +115,11 @@ struct virtio_balloon {
 	/* Memory statistics */
 	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
 
-	/* To register a shrinker to shrink memory upon memory pressure */
+	/* Shrinker to return free pages - VIRTIO_BALLOON_F_FREE_PAGE_HINT */
 	struct shrinker shrinker;
+
+	/* OOM notifier to deflate on OOM - VIRTIO_BALLOON_F_DEFLATE_ON_OOM */
+	struct notifier_block oom_nb;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -788,77 +794,36 @@ static unsigned long shrink_free_pages(struct virtio_balloon *vb,
 	return blocks_freed * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
 }
 
-static unsigned long leak_balloon_pages(struct virtio_balloon *vb,
-					unsigned long pages_to_free)
-{
-	return leak_balloon(vb, pages_to_free * VIRTIO_BALLOON_PAGES_PER_PAGE) /
-		VIRTIO_BALLOON_PAGES_PER_PAGE;
-}
-
-static unsigned long shrink_balloon_pages(struct virtio_balloon *vb,
-					  unsigned long pages_to_free)
-{
-	unsigned long pages_freed = 0;
-
-	/*
-	 * One invocation of leak_balloon can deflate at most
-	 * VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
-	 * multiple times to deflate pages till reaching pages_to_free.
-	 */
-	while (vb->num_pages && pages_freed < pages_to_free)
-		pages_freed += leak_balloon_pages(vb,
-						  pages_to_free - pages_freed);
-
-	update_balloon_size(vb);
-
-	return pages_freed;
-}
-
 static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
 						  struct shrink_control *sc)
 {
-	unsigned long pages_to_free, pages_freed = 0;
 	struct virtio_balloon *vb = container_of(shrinker,
 					struct virtio_balloon, shrinker);
 
-	pages_to_free = sc->nr_to_scan;
-
-	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
-		pages_freed = shrink_free_pages(vb, pages_to_free);
-
-	if (pages_freed >= pages_to_free)
-		return pages_freed;
-
-	pages_freed += shrink_balloon_pages(vb, pages_to_free - pages_freed);
-
-	return pages_freed;
+	return shrink_free_pages(vb, sc->nr_to_scan);
 }
 
 static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
 						   struct shrink_control *sc)
 {
 	struct virtio_balloon *vb = container_of(shrinker,
 					struct virtio_balloon, shrinker);
-	unsigned long count;
-
-	count = vb->num_pages / VIRTIO_BALLOON_PAGES_PER_PAGE;
-	count += vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
 
-	return count;
+	return vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
 }
 
-static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
+static int virtio_balloon_oom_notify(struct notifier_block *nb,
+				     unsigned long dummy, void *parm)
 {
-	unregister_shrinker(&vb->shrinker);
-}
+	struct virtio_balloon *vb = container_of(nb,
+						 struct virtio_balloon, oom_nb);
+	unsigned long *freed = parm;
 
-static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
-{
-	vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
-	vb->shrinker.count_objects = virtio_balloon_shrinker_count;
-	vb->shrinker.seeks = DEFAULT_SEEKS;
+	*freed += leak_balloon(vb, VIRTIO_BALLOON_OOM_NR_PAGES) /
+		  VIRTIO_BALLOON_PAGES_PER_PAGE;
+	update_balloon_size(vb);
 
-	return register_shrinker(&vb->shrinker);
+	return NOTIFY_OK;
 }
 
 static int virtballoon_probe(struct virtio_device *vdev)
@@ -935,22 +900,35 @@ static int virtballoon_probe(struct virtio_device *vdev)
 			virtio_cwrite(vb->vdev, struct virtio_balloon_config,
 				      poison_val, &poison_val);
 		}
-	}
-	/*
-	 * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to decide if a
-	 * shrinker needs to be registered to relieve memory pressure.
-	 */
-	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
-		err = virtio_balloon_register_shrinker(vb);
+
+		/*
+		 * We're allowed to reuse any free pages, even if they are
+		 * still to be processed by the host.
+		 */
+		vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
+		vb->shrinker.count_objects = virtio_balloon_shrinker_count;
+		vb->shrinker.seeks = DEFAULT_SEEKS;
+		err = register_shrinker(&vb->shrinker);
 		if (err)
 			goto out_del_balloon_wq;
 	}
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
+		vb->oom_nb.notifier_call = virtio_balloon_oom_notify;
+		vb->oom_nb.priority = VIRTIO_BALLOON_OOM_NOTIFY_PRIORITY;
+		err = register_oom_notifier(&vb->oom_nb);
+		if (err < 0)
+			goto out_unregister_shrinker;
+	}
+
 	virtio_device_ready(vdev);
 
 	if (towards_target(vb))
 		virtballoon_changed(vdev);
 	return 0;
 
+out_unregister_shrinker:
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+		unregister_shrinker(&vb->shrinker);
 out_del_balloon_wq:
 	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
 		destroy_workqueue(vb->balloon_wq);
@@ -989,8 +967,11 @@ static void virtballoon_remove(struct virtio_device *vdev)
 {
 	struct virtio_balloon *vb = vdev->priv;
 
-	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
-		virtio_balloon_unregister_shrinker(vb);
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+		unregister_oom_notifier(&vb->oom_nb);
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+		unregister_shrinker(&vb->shrinker);
+
 	spin_lock_irq(&vb->stop_update_lock);
 	vb->stop_update = true;
 	spin_unlock_irq(&vb->stop_update_lock);
