
deflate_on_oom doesn't seem to work as expected/documented #4324

@simonis


After reading the Ballooning documentation, my understanding of deflate_on_oom is that, if the parameter is set to true, the balloon device is deflated automatically when a process in the guest requires memory pages that cannot otherwise be provided:

deflate_on_oom: if this is set to true and a guest process wants to allocate some memory which would make the guest enter an out-of-memory state, the kernel will take some pages from the balloon and give them to said process

However, if I run Firecracker with e.g. 2 vCPUs, 1 GB of memory and a 900 MB balloon device:

{
    "target_pages": 230400,
    "actual_pages": 230400,
    "target_mib": 900,
    "actual_mib": 900,
    "swap_in": 0,
    "swap_out": 0,
    "major_faults": 92,
    "minor_faults": 3103,
    "free_memory": 66572288,
    "total_memory": 84398080,
    "available_memory": 0,
    "disk_caches": 151552,
    "hugetlb_allocations": 0,
    "hugetlb_failures": 0
}
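To put these statistics in perspective, here is a rough conversion of the reported values to MiB. This is only a sketch: it assumes 4 KiB guest pages (consistent with target_pages = 230400 and target_mib = 900 above), and the helper name pages_to_mib is made up for illustration.

```python
# Rough sanity check of the balloon statistics shown above.
# Assumption: the guest uses 4 KiB pages.
PAGE_SIZE = 4096
MIB = 1024 * 1024


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB (hypothetical helper)."""
    return pages * page_size / MIB


# Values copied from the statistics output above.
stats = {
    "actual_pages": 230400,
    "free_memory": 66572288,
    "total_memory": 84398080,
}

balloon_mib = pages_to_mib(stats["actual_pages"])
total_mib = stats["total_memory"] / MIB
free_mib = stats["free_memory"] / MIB

print(f"balloon: {balloon_mib:.1f} MiB")
print(f"guest total: {total_mib:.1f} MiB, free: {free_mib:.1f} MiB")
```

With the balloon inflated, the guest reports only about 80 MiB of total memory, so an 800 MB heap cannot be satisfied unless most of the balloon is deflated.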

...and then try to start a Java process in the guest with -Xms800m -Xmx800m (i.e. with a heap size of 800 MB), the Java process in the guest hangs. Firecracker uses ~200% CPU time, but the size occupied by the balloon device in the guest does not change and remains at 900 MB:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2552550 xxxxxxxx  20   0 1063092  82992  82096 R  99,9   0,3   3:44.76 fc_vcpu 1
2552544 xxxxxxxx  20   0 1063092  82992  82096 R  90,9   0,3   3:12.17 firecracker
2552549 xxxxxxxx  20   0 1063092  82992  82096 S  25,0   0,3   0:43.47 fc_vcpu 0

Once I reset the target size of the balloon device to 100 MB, the Java process becomes unblocked and starts.
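For reference, a minimal sketch of how the balloon target can be changed at runtime through the Firecracker API socket. It assumes the PATCH /balloon endpoint with an amount_mib field as described in the ballooning documentation; the socket path in the usage note and the helper names are placeholders, not part of my actual setup.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection over a Unix domain socket (illustrative helper)."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        # Replace the default TCP connect with a Unix socket connect.
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def balloon_update_body(amount_mib):
    """Build the JSON body for PATCH /balloon (field name assumed from the docs)."""
    return json.dumps({"amount_mib": amount_mib})


def patch_balloon(socket_path, amount_mib):
    """Send the new balloon target to the API socket and return the HTTP status."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request(
            "PATCH",
            "/balloon",
            body=balloon_update_body(amount_mib),
            headers={"Content-Type": "application/json"},
        )
        return conn.getresponse().status
    finally:
        conn.close()
```

Calling patch_balloon("/tmp/firecracker.socket", 100) (with whatever socket path Firecracker was started with) corresponds to the manual reset to 100 MiB described above.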

However, based on the documentation of the deflate_on_oom option, I would have expected the guest kernel to deflate the balloon device automatically when deflate_on_oom=true.

If I run the same experiment with deflate_on_oom=false, I instantly get an out-of-memory error when I try to start the Java process:

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000ce000000, 279576576, 0) failed; error='Not enough space' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 279576576 bytes for committing reserved memory.

which is what I would have expected.

Also, if I inflate the balloon to 900 MB again after the Java process has started, I start getting warnings from the balloon driver (as documented):

[  282.580254] virtio_balloon virtio0: Out of puff! Can't get 1 pages

...but the CPU usage again goes up to almost 200%. Is this expected? The warnings are fine, but I wouldn't expect Firecracker to burn all of its CPU shares while trying to inflate the balloon.

So, to summarize: is the described behavior with deflate_on_oom=true a bug in the implementation, or have I misunderstood how the balloon device behaves when the guest is low on memory?

PS: I used the following kernel and Firecracker versions for the experiments:
Guest kernel: 5.19.8
Host kernel: 6.5.7 (Ubuntu 20.04)
Firecracker: 1.5.1 and 1.6.0-dev (from today, 036d9906)

Labels: Status: Awaiting author, Type: Bug