CP-309998: ignore small amount of pages in other nodes#6884
CP-309998: ignore small amount of pages in other nodes#6884edwintorok merged 1 commit intomasterfrom
Conversation
|
Could you explain in the commit why 16 MiB is used? I think it's because it's the minimum allowed, but it would be good to confirm document it |
|
4096 is used because it's the value used in the RRD behaviour of VM.numa_nodes calculation, so that both code paths (RRD and field VM.numa_nodes return the same result). |
rrdp-squeezed currently has a threshold of >4096 to count the number of nodes. This is to ignore small number of internal data structures that xen or other kernel devices may sometimes allocate for the VM outside the node where the VM's main memory is allocated. This is a temporary fix until we account in CP-311303 about these small number of pages that sometimes appear out of the VM's main node. In experiments, it's usually a single-digit number like 1. The maximum number observed was around ~2200 pages. Without this fix, the VM.numa_nodes calculation is different of the one returned by rrdp-squeezed, and VM.numa_nodes is over-sensitive to these small number of pages in other nodes. Signed-off-by: Marcus Granado <marcus.granado@cloud.com>
b662272 to
464de44
Compare
|
I've updated the PR description and the commit message to convey more clearly what this calculation does. |
|
Lets merge this workaround for now and then open another PR to share the code and reduce the tech debt being introduced here. |
|
To avoid a magic constant (4096) we could ignore pages up to memory_overhead+video_memory maybe (we only claim the main memory, any overhead , the p2m pages, etc could be allocated from somewhere else if we ran out). Try filtering out all 0 sized nodes in the VM's memory usage, then sort memory in decreasing order, and use a fold to go through them and subtract from a running main_memory_used and increment main_memory_nodes. Once main_memory_used hits 0, then from the next numa node (if any) then increment overhead_memory_nodes instead, but stop incrementing main_memory_nodes. there is a xapi function to compute the overhead, although rrdp probably can't use that, but maybe it can link memory.ml and calculate it itself. |
lindig
left a comment
There was a problem hiding this comment.
Still think there should be a code comment: (* see commit message *)
| in | ||
| let optimised = nodes = 1 || nodes < host_nodes in | ||
| {optimised; nodes; memory} | ||
| with Failure _ -> default |
There was a problem hiding this comment.
While you are here: is this unexpected and should be logged?
This is an optimisation for the calculation of the number of nodes until we fix all the small xen internal allocations for the VM outside the main guest's memory. This synchronises the behaviour in the RRD VM.numa_nodes calculation.