CA-423172: Xen uses ~294 pages/vCPU, not 256 #6854
Conversation
Does this depend on the Xen version? Do we know if XS8 is affected as well?

In theory yes, but 265 seems to work for both; we may want to backport this to the LCM branch eventually once it is all merged to master. Currently 265 seems to work for both XS8 and XS9, although it will need a bit wider testing on different hardware.

265 (and even 256) feels like a lot of pages per vCPU (a MiB); do you know what those pages are used for?
> 265 (and even 256) feels like a lot of pages per vCPU (a MiB); do you know what those pages are used for?
Apparently for a lot of things. I looked at past discussions on this: not even Xen hypervisor maintainers know the exact number, and I found it said that it even depends on hardware capabilities and Xen's command-line settings.
- Also the update to Xen 4.20 could have increased this.
- Additionally, some XenServer patches may increase it a bit. I know this patch would do this as well:
Off-topic for this PR, but for the per-domain overhead: XenServer also has Xen patches that increase the overhead per domain:
- https://github.com/xenserver/xen.pg/blob/XS-8.4/patches/mixed-domain-runstates.patch
- XS9 also has a patch to keep track of domain memory per NUMA node, which increases the size of `struct domain` by 4 bytes * MAX_NUMNODES (64) = 256 bytes per domain (a quick back-of-the-envelope comparison follows below).
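For scale, a standalone back-of-the-envelope sketch in OCaml (not xapi code; `MAX_NUMNODES = 64` and 4 KiB pages are taken from the comment above) comparing that per-domain cost with the per-vCPU page overhead this PR is about:

```ocaml
(* Back-of-the-envelope comparison, not xapi code: the per-domain overhead
   added by the per-NUMA-node counters vs. the per-vCPU page overhead.
   4 bytes * MAX_NUMNODES (assumed 64) = 256 bytes per domain, which is
   negligible next to ~294 pages (~1.15 MiB) per vCPU. *)
let page_size_bytes = 4096
let per_domain_numa_bytes = 4 * 64
let per_vcpu_overhead_bytes = 294 * page_size_bytes

let () =
  Printf.printf "per-domain NUMA counters: %d B\n" per_domain_numa_bytes;
  Printf.printf "per-vCPU overhead: %d KiB (~%.2f MiB)\n"
    (per_vcpu_overhead_bytes / 1024)
    (float_of_int per_vcpu_overhead_bytes /. (1024. *. 1024.))
```

So the per-domain cost of these patches is tiny compared with the per-vCPU overhead being estimated here.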
So the testing done by Edwin is probably the best estimate we currently have.
Worth noting that enabling features like nested virtualization would likely increase this number considerably in practice (especially when nested virtualization is actively used).
Thanks, yes, I think so too. Nested virt is a big topic that we can't just enable, though: the engineering to make it secure for production will be quite some work before a production-ready implementation is upstream, so that's unfortunately quite forward-looking. But indeed, it needs to be added to the list of changes to make for when nested virt is productized.
The 265 isn't very deterministic: with a newly installed Xen I now get 274 sometimes. So we might need a higher number, although 3168 is higher than the maximum supported VMs/host, so in practice this won't actually cause an OOM on its own (but could in combination with other inaccuracies).
Converted to draft, need to reevaluate with the new Xen patches applied.
Interesting: if I update this to 274, then the test says it should be 282; going to see if it can stabilize.
Turns out the patch was completely wrong: the overhead is outside of the shadow allocation, not inside, so increasing shadow usage doesn't help to estimate overall VM memory usage more accurately. Moved the value outside, repeated the tests, and now I get a stable value, although it is still host dependent: on one Intel host I get 294-256=38, on the other 265-256=9. I used the higher number.
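To make the "outside of shadow" point concrete, here is a minimal OCaml sketch (the names and split are illustrative, not xapi's actual API) of how the estimate decomposes into the existing shadow allocation plus the extra per-vCPU pages measured here:

```ocaml
(* Hypothetical sketch: the per-vCPU overhead is accounted for in two places.
   The shadow allocation keeps its existing 256 pages/vCPU; the remaining
   pages measured in this PR (38 on one Intel host, 9 on another; the larger
   value is used) are accounted outside of shadow, so the estimate converges. *)
let shadow_pages_per_vcpu = 256
let extra_pages_per_vcpu = 294 - 256 (* = 38, the host-dependent maximum *)

(* Total estimated per-VM vCPU overhead, in pages. *)
let vcpu_overhead_pages ~vcpus =
  vcpus * (shadow_pages_per_vcpu + extra_pages_per_vcpu)

let () =
  Printf.printf "8 vCPUs -> %d pages (%d KiB)\n"
    (vcpu_overhead_pages ~vcpus:8)
    (vcpu_overhead_pages ~vcpus:8 * 4)
```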
The only Xen command-line setting related to this is `low_mem_virq_limit`, which is 64MiB. A new quicktest has shown that we are sometimes off by ~10MiB or more (between `Host.compute_free_memory` and the actual free memory as measured by a call to Xenctrl physinfo), and get failures booting VMs even after `assert_can_boot_here` said yes. Sometimes the error messages can be quite ugly: internal xenguest/xenopsd errors instead of HOST_NOT_ENOUGH_FREE_MEMORY.

After this change (together with #6854) the new quicktest doesn't fail anymore.

PR to feature branch because this will need testing together with all the other NUMA changes; it may expose latent bugs elsewhere. The new testcase will get its own PR because it is quite large.
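A hedged sketch of the kind of comparison the quicktest performs (OCaml; this is not the actual quicktest code, and it deliberately takes the two free-memory figures as inputs rather than calling `Host.compute_free_memory` or the Xenctrl bindings directly):

```ocaml
(* Sketch of the quicktest's comparison: xapi's view of free host memory
   (Host.compute_free_memory) vs. what Xen itself reports (physinfo free
   pages). If xapi over-reports by more than a small tolerance, VMs that
   assert_can_boot_here accepted can still fail to boot. *)
let mib = Int64.mul 1024L 1024L

(* [reported] and [actual] are in bytes; [tolerance_mib] is the slack we
   accept before flagging a discrepancy. *)
let check_free_memory ~reported ~actual ~tolerance_mib =
  let delta_mib = Int64.div (Int64.sub reported actual) mib in
  if Int64.compare delta_mib (Int64.of_int tolerance_mib) > 0 then
    Error (Printf.sprintf "xapi over-reports free memory by %Ld MiB" delta_mib)
  else
    Ok delta_mib

let () =
  (* Example with made-up figures: xapi thinks 10 MiB more is free than Xen reports. *)
  match
    check_free_memory
      ~reported:(Int64.mul 4106L mib)
      ~actual:(Int64.mul 4096L mib)
      ~tolerance_mib:1
  with
  | Ok d -> Printf.printf "within tolerance (%Ld MiB)\n" d
  | Error msg -> print_endline msg
```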
120b1a2 to 224b54f
Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Measured the actual increase in host memory usage when increasing the number of vCPUs on a VM from 1 to 64:

```
vcpu,memory_overhead_pages,coeff
1,264,264
2,558,279
3,776,258.667
4,1032,258
5,1350,270
6,1614,269
7,1878,268.286
8,2056,257
9,2406,267.333
10,2670,267
11,2934,266.727
12,3198,266.5
13,3462,266.308
14,3726,266.143
15,3990,266
16,4254,265.875
17,4518,265.765
18,4782,265.667
19,5046,265.579
20,5310,265.5
21,5574,265.429
22,5838,265.364
23,6102,265.304
24,6366,265.25
25,6630,265.2
26,6894,265.154
27,7158,265.111
28,7422,265.071
29,7686,265.034
30,7952,265.067
31,8216,265.032
32,8480,265
33,8744,264.97
34,9009,264.971
35,9276,265.029
36,9543,265.083
37,9810,265.135
38,10076,265.158
39,10340,265.128
40,10604,265.1
41,10869,265.098
42,11133,265.071
43,11397,265.047
44,11662,265.045
45,11925,265
46,12191,265.022
47,12454,264.979
0,30,0
1,294,294
2,558,279
3,822,274
4,1086,271.5
5,1350,270
6,1614,269
7,1878,268.286
8,2142,267.75
9,2406,267.333
10,2670,267
11,2934,266.727
12,3198,266.5
13,3462,266.308
14,3726,266.143
15,3990,266
16,4254,265.875
17,4518,265.765
18,4782,265.667
19,5046,265.579
20,5310,265.5
21,5574,265.429
22,5838,265.364
23,6102,265.304
24,6366,265.25
25,6630,265.2
26,6894,265.154
27,7158,265.111
28,7422,265.071
29,7686,265.034
30,7952,265.067
31,8216,265.032
32,8480,265
33,8744,264.97
34,9011,265.029
35,9278,265.086
36,9546,265.167
37,9811,265.162
38,10076,265.158
39,10340,265.128
40,10603,265.075
41,10869,265.098
42,11132,265.048
43,11397,265.047
44,11663,265.068
45,11925,265
46,12191,265.022
47,12456,265.021
[INFO]VM memory_overhead_pages = ... + vcpu * 294 =~ ... + vcpu * 294
```

We already allocate 256 pages/vcpu as part of shadow, so we need an extra 294-256=38 pages/vcpu.

This can lead to internal errors raised by xenguest, or NOT_ENOUGH_FREE_MEMORY errors raised by xenopsd, after `assert_can_boot_here` has already replied yes, even when booting VMs sequentially. It could also lead XAPI to choose the wrong host to evacuate a VM to, which could lead to RPU migration failures.

This is a pre-existing bug, affecting the Xen versions in both XS8 and XS9.

We cannot allocate this from shadow, because otherwise the memory usage would never converge (Xen doesn't allocate these pages from shadow).

On another host the measured overhead is less; take the maximum for now:

```
[INFO]VM memory_overhead_pages = ... + vcpu * 264.067 =~ ... + vcpu * 265
```

Also, the amount of shadow memory reserved is nearly twice as much as needed, especially given that shadow is compiled out of Xen, but overestimates are OK, and we might fix that separately.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
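For reference, a small standalone OCaml sketch (not part of this PR) of how the per-vCPU constant can be read off the measurements above: each coefficient is `memory_overhead_pages / vcpu`, and rounding the worst case up gives the 294 used here. Only a few rows of the second data set are reproduced as the example input.

```ocaml
(* Derive the per-vCPU page overhead from (vcpu, memory_overhead_pages)
   samples, as in the table above: take pages/vcpu for each sample and
   round the worst case up. *)
let samples = [ (1, 294); (2, 558); (4, 1086); (8, 2142); (47, 12456) ]

let pages_per_vcpu (vcpus, pages) = float_of_int pages /. float_of_int vcpus

let worst_case =
  samples
  |> List.map pages_per_vcpu
  |> List.fold_left max 0.
  |> ceil
  |> int_of_float

let () = Printf.printf "pages per vCPU (rounded up): %d\n" worst_case
```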
224b54f to ec3bd4a
72c7a25
Measured the actual increase in host memory usage when increasing the number of vCPUs on a VM from 1 to 64:
Ran the test on both an AMD and Intel host and got similar results.
Currently XAPI uses 256*vcpu, which is an underestimate.
This can lead to internal errors raised by xenguest, or NOT_ENOUGH_FREE_MEMORY errors raised by xenopsd, after `assert_can_boot_here` has already replied yes, even when booting VMs sequentially. It could also lead XAPI to choose the wrong host to evacuate a VM to, which could lead to RPU migration failures.
This is a pre-existing bug, affecting the Xen versions in both XS8 and XS9.
PR to feature branch because this will need testing together with all the other NUMA changes; it may expose latent bugs elsewhere.
The new testcase will get its own PR because it is quite large.
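To put the 256 vs 294 difference in perspective, a quick worked example (OCaml, standalone; the vCPU counts and 4 KiB pages are illustrative assumptions, not results from this PR's tests). The size of the under-estimate for a large VM lands in the same ballpark as the ~10MiB discrepancy the quicktest observed.

```ocaml
(* Worked example: how far off the old 256 pages/vCPU estimate is compared
   to the measured 294 pages/vCPU, for VMs with increasing vCPU counts
   (4 KiB pages assumed). *)
let old_pages_per_vcpu = 256
let new_pages_per_vcpu = 294

let underestimate_kib ~vcpus =
  (new_pages_per_vcpu - old_pages_per_vcpu) * vcpus * 4

let () =
  List.iter
    (fun vcpus ->
      Printf.printf "%2d vCPUs: under-estimated by %d KiB (~%.1f MiB)\n"
        vcpus (underestimate_kib ~vcpus)
        (float_of_int (underestimate_kib ~vcpus) /. 1024.))
    [ 8; 32; 64 ]
```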