@@ -44,10 +44,13 @@ The device has two optional features which can be enabled with the following
4444options:
4545
4646- `free_page_reporting`: A mechanism for the guest to continually report ranges
47- of memory which the guest is not using and can be reclaimed. Read more here
48- - `free_page_hinting`: A dev-preview feature which is a different mechanism to
49- reclaim memory from the guest, this is instead triggered from the host. Read
50- more here.
47+ of memory which the guest is not using and can be reclaimed.
48+ [Read more here](#virtio-balloon-free-page-reporting)
49+ - `free_page_hinting`: A
50+ [developer-preview](../docs/RELEASE_POLICY.md#developer-preview-features)
51+ feature which is a different mechanism to reclaim memory from the guest; this
52+ one is instead triggered from the host.
53+ [Read more here](#virtio-balloon-free-page-hinting)
5154
5255## Security disclaimer
5356
@@ -314,6 +317,125 @@ the `page_reporting_order` module parameter in the guest kernel. The page order
314317comes with trade-offs between performance and memory reclaimed; a good target is
315318to have the reported ranges match the backing page size.
316319
320+ ## Virtio balloon free page hinting
321+
322+ Free page hinting is a
323+ [developer-preview](../docs/RELEASE_POLICY.md#developer-preview-features)
324+ feature that allows the guest driver to report ranges of memory which are not
325+ being used. In Firecracker, the balloon device will `madvise` each range with the
326+ `MADV_DONTNEED` flag, reducing the RSS of the guest. Free page hinting differs
327+ from free page reporting in that it is initiated from the host side, giving more
328+ flexibility on when to reclaim memory.
329+
330+ To enable free page hinting when creating the balloon device, set the
331+ `free_page_hinting` attribute to `true` in the JSON object.
332+
333+ An example of how to configure the device to enable free page hinting:
334+
335+ ``` console
336+ socket_location=...
337+ amount_mib=...
338+ deflate_on_oom=...
339+ polling_interval=...
340+
341+ curl --unix-socket $socket_location -i \
342+ -X PUT 'http://localhost/balloon' \
343+ -H 'Accept: application/json' \
344+ -H 'Content-Type: application/json' \
345+ -d "{
346+ \"amount_mib\": $amount_mib, \
347+ \"deflate_on_oom\": $deflate_on_oom, \
348+ \"stats_polling_interval_s\": $polling_interval, \
349+ \"free_page_hinting\": true \
350+ }"
351+ ```
352+
353+ Free page hinting is initiated and managed by Firecracker. The core mechanism
354+ for controlling a run is the `cmd_id`: when Firecracker sets the `cmd_id` to a
355+ new number, the driver acknowledges this and starts reporting ranges, which
356+ Firecracker frees. Once the driver has reported all the ranges it can find, it
357+ updates the `cmd_id` to reflect this. The driver then holds these ranges until
358+ Firecracker sends the stop command, which allows the guest driver to reclaim
359+ the memory. The time required for the guest to complete a hinting run depends
360+ on many factors and is mostly dictated by the guest; in testing, however, the
361+ average time was ~0.2 seconds for a 1GB VM.
362+
363+ This control mechanism in Firecracker is managed through three separate
364+ endpoints: `/balloon/hinting/start`, `/balloon/hinting/status` and
365+ `/balloon/hinting/stop`. For simple operation, call the start endpoint with
366+ `acknowledge_on_stop = true`, which will automatically send the stop command
367+ once the driver has finished.
368+
369+ An example of sending this command:
370+
371+ ``` console
372+ curl --unix-socket $socket_location -i \
373+ -X POST 'http://localhost/balloon/hinting/start' \
374+ -H 'Accept: application/json' \
375+ -H 'Content-Type: application/json' \
376+ -d "{
377+ \"acknowledge_on_stop\": true \
378+ }"
379+ ```
380+
381+ For fine-grained control, set `acknowledge_on_stop = false`; Firecracker will
382+ then not send the acknowledge message. This can be used to get the guest to hold
383+ onto more memory. Using the `/status` endpoint, you can get information about the
384+ last `cmd_id` sent by Firecracker and the last update from the guest.
385+
386+ An example of the status request and response:
387+
388+ ``` console
389+ curl --unix-socket $socket_location -i \
390+ -X GET 'http://localhost/balloon/hinting/status' \
391+ -H 'Accept: application/json' \
392+ -H 'Content-Type: application/json'
393+ ```
394+
395+ Response:
396+
397+ ``` json
398+ {
399+   "host_cmd": 1,
400+   "guest_cmd": 2
401+ }
402+ ```
403+
404+ An example of the stop endpoint:
405+
406+ ``` console
407+ curl --unix-socket $socket_location -i \
408+ -X POST 'http://localhost/balloon/hinting/stop' \
409+ -H 'Accept: application/json' \
410+ -H 'Content-Type: application/json' \
411+ -d "{}"
412+ ```
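The start, status and stop calls above can also be driven from a script. The following is a minimal Python sketch of the fine-grained flow, not an official client: `UnixHTTPConnection`, `make_sender` and `run_hinting` are illustrative names, and because this document does not specify which `host_cmd`/`guest_cmd` values mark a finished run, the completion check is a caller-supplied `is_done` predicate.

```python
import http.client
import json
import socket
import time


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def make_sender(socket_path):
    """Return send(method, path, body) bound to the Firecracker API socket."""

    def send(method, path, body=None):
        conn = UnixHTTPConnection(socket_path)
        try:
            headers = {
                "Accept": "application/json",
                "Content-Type": "application/json",
            }
            payload = None if body is None else json.dumps(body)
            conn.request(method, path, body=payload, headers=headers)
            resp = conn.getresponse()
            data = resp.read()
            if resp.status >= 300:
                raise RuntimeError(f"{method} {path} failed: {resp.status} {data!r}")
            # Some responses have no body; return None in that case.
            return json.loads(data) if data else None
        finally:
            conn.close()

    return send


def run_hinting(send, is_done, poll_interval_s=0.05, timeout_s=5.0):
    """Start a run, poll the status until is_done(status) is true, then stop.

    is_done is an assumption of this sketch: it receives the parsed status
    response (a dict with "host_cmd" and "guest_cmd") and decides completion.
    """
    send("POST", "/balloon/hinting/start", {"acknowledge_on_stop": False})
    deadline = time.monotonic() + timeout_s
    try:
        while True:
            status = send("GET", "/balloon/hinting/status")
            if is_done(status):
                return status
            if time.monotonic() >= deadline:
                raise TimeoutError("hinting run did not finish in time")
            time.sleep(poll_interval_s)
    finally:
        # Always send stop so the guest can reclaim the hinted ranges,
        # even if polling timed out or raised.
        send("POST", "/balloon/hinting/stop", {})
```

A caller would build the sender with `make_sender(socket_location)` and pass a predicate that matches its own reading of the status values. The `finally` clause is a deliberate choice: it sends the stop command even when polling times out.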
413+
414+ On snapshot restore, the `cmd_id` is **always** set to the stop `cmd_id` to
415+ allow the guest to reclaim the memory. If you have a particular use-case which
416+ requires this not to be the case, please raise an issue with a description of
417+ your scenario.
418+
419+ > [!WARNING]
420+ >
421+ > Free page hinting was primarily designed for live migration. Because of this,
422+ > there is a caveat in the device spec: the guest is able to reclaim memory
423+ > before Firecracker even receives the range to free. This can lead to a
424+ > scenario where the device frees memory that has been reclaimed in the guest,
425+ > potentially corrupting memory. The chances of this race happening are low, but
426+ > not impossible; hence the developer-preview status.
427+ >
428+ > We are currently working with the kernel community on a feature that will
429+ > eliminate this race. Once this has been resolved, we will update the device.
430+ >
431+ > One way to safely use this feature when using UFFD is:
432+ >
433+ > 1. Enable `WRITEPROTECT` on the VM memory before starting a hinting run.
434+ > 1. Track ranges that are written to.
435+ > 1. Skip these ranges when Firecracker reports them for freeing.
436+ >
437+ > This will prevent ranges which have been reclaimed from being freed.
438+
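The "skip these ranges" step in the warning above amounts to subtracting the written ranges from each range reported for freeing. The sketch below shows only that bookkeeping, under stated assumptions: ranges are half-open `(start, end)` byte offsets, `safe_to_free` is a hypothetical helper name, and how a UFFD handler receives the reported ranges and collects the write-protect faults is not covered by this document and is left out.

```python
def safe_to_free(reported, written):
    """Return the parts of `reported` that overlap no written range.

    reported: (start, end) half-open range that is about to be freed.
    written:  iterable of (start, end) half-open ranges that were written
              to since write-protection was enabled (hypothetical input,
              collected by the UFFD handler).
    """
    start, end = reported
    safe = []
    cursor = start
    for w_start, w_end in sorted(written):
        if w_end <= cursor:
            continue  # written range lies entirely before the cursor
        if w_start >= end:
            break  # written range lies entirely after the reported range
        if w_start > cursor:
            safe.append((cursor, w_start))  # untouched gap before this write
        cursor = max(cursor, w_end)
        if cursor >= end:
            break
    if cursor < end:
        safe.append((cursor, end))  # untouched tail of the reported range
    return safe
```

For example, with a reported range of `(0x1000, 0x5000)` and a single written range of `(0x2000, 0x3000)`, only `(0x1000, 0x2000)` and `(0x3000, 0x5000)` remain eligible for freeing.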
317439## Balloon Caveats
318440
319441- Firecracker has no control over the speed of inflation or deflation; this is