Cpu profiles: CPUID #62
base: gardenlinux
I will fix the CI failures and do one last check on the CPU model, stepping ID, and family values before marking this as ready.
02c244e to
659edd1
Compare
659edd1 to
1435a01
Compare
this can now be rebased #63
Since enabling AMX tile state components affects the result returned by `Hypervisor::get_supported_cpuid`, we want this enabled prior to checking CPUID compatibility between the source and destination VMs. Although this is not required today, it is necessary in order for the upcoming CPU profiles to work correctly, and it will also be necessary once the `check_cpuid_compatibility` checks are extended to take state components into account. Signed-off-by: Oliver Anderson <[email protected]> On-behalf-of: SAP [email protected]
These data structures are required to define CPU profiles. Signed-off-by: Oliver Anderson <[email protected]> On-behalf-of: SAP [email protected]
We want CPU profiles to keep a record of the hypervisor type and CPU vendor that they are intended to work with. This is made more convenient if all of these types implement common traits (used for serialization). Signed-off-by: Oliver Anderson <[email protected]> On-behalf-of: SAP [email protected]
We introduce essential data structures together with basic functionality that is necessary to apply a CPU profile to a host. Signed-off-by: Oliver Anderson <[email protected]> On-behalf-of: SAP [email protected]
We integrate the CPU profile into the various configs that ultimately get set by the user. This quickly ends up involving multiple files; luckily, Rust helps us find them via compilation errors. Signed-off-by: Oliver Anderson <[email protected]> On-behalf-of: SAP [email protected]
1435a01 to
7e56fe1
Compare
phip1611
left a comment
Wow, amazing work! I left a few nits. One can clearly see the hundreds of hours you invested into this!
| affinity: None, | ||
| features: CpuFeatures::default(), | ||
| nested: true, | ||
| profile: Default::default(), |
nit: Would it make sense to be a little more descriptive here and use CpuProfile::Host?
I see what you mean, but I think I would prefer to avoid another import (or spelling out arch::CpuProfile::Host) here.
| let phys_bits = physical_bits(self.hypervisor.as_ref(), guard.cpus.max_phys_bits); | ||
| let kvm_hyperv = guard.cpus.kvm_hyperv; | ||
| let profile = guard.cpus.profile; | ||
| // Drop the guard before function call |
I suppose this is to free a lock which will need to be taken in the underlying function?
Yes. It is just good practice to drop locks before calling functions because 1) you don't know how long the call will take, and 2) if the function internally happens to attempt to lock the same object (often bad practice), then at least you know that you are not causing a deadlock.
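The pattern discussed in this thread can be sketched in isolation. The following is a minimal illustration, not the code from this PR: `CpusConfig`, `describe` and `build_description` are hypothetical names, and the callee deliberately takes the same lock again to show why the guard has to be released first.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the CPU section of the VM configuration.
struct CpusConfig {
    max_phys_bits: u8,
    kvm_hyperv: bool,
}

/// Hypothetical callee that happens to take the same lock internally.
fn describe(config: &Arc<Mutex<CpusConfig>>, phys_bits: u8, kvm_hyperv: bool) -> String {
    let guard = config.lock().unwrap();
    format!(
        "phys_bits={} kvm_hyperv={} (configured max {})",
        phys_bits, kvm_hyperv, guard.max_phys_bits
    )
}

fn build_description(config: &Arc<Mutex<CpusConfig>>) -> String {
    let guard = config.lock().unwrap();
    // Copy out everything we need while the lock is held.
    let phys_bits = guard.max_phys_bits;
    let kvm_hyperv = guard.kvm_hyperv;
    // Drop the guard before the call. `std::sync::Mutex` is not reentrant,
    // so calling `describe` with the guard still alive would deadlock or panic.
    drop(guard);
    describe(config, phys_bits, kvm_hyperv)
}
```

Copying the (cheap) values out first keeps the critical section short and makes the callee free to lock whatever it needs.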
| pub register: CpuidReg, | ||
| } | ||
|
|
||
| /// Describes a policy for how the corresponding CPUID data should be considered when building |
I'm a little confused here. This is not used at runtime but only required for the generation tool, right? Perhaps the description could be a little more descriptive?
I added a few more lines to the documentation. I hope it is clearer now.
7e56fe1 to
109ad26
Compare
109ad26 to
2de0d08
Compare
If a CPU profile is configured it should result in guests seeing a restricted subset of CPUID. This is what we finally achieve in this commit. Signed-off-by: Oliver Anderson <[email protected]> On-behalf-of: SAP [email protected]
We include CPU profiles corresponding to Intel Skylake and Sapphire Rapids server CPUs that we generated using our WIP CPU profile generation tool. Signed-off-by: Oliver Anderson <[email protected]> On-behalf-of: SAP [email protected]
We introduce data structures to describe values within the registers modified by the CPUID instruction. These data structures will later be used by the upcoming CPU profile generation tool. Signed-off-by: Oliver Anderson <[email protected]> On-behalf-of: SAP [email protected]
We introduce CPUID definitions for Intel CPUs that will be utilized by the upcoming CPU Profile generation tool. Signed-off-by: Oliver Anderson <[email protected]> On-behalf-of: SAP [email protected]
We introduce CPUID definitions defined for the KVM hypervisor. These definitions will later be utilized by the upcoming CPU profile generation tool. Signed-off-by: Oliver Anderson <[email protected]> On-behalf-of: SAP [email protected]
We use the Intel CPUID definitions to provide more information when CPUID compatibility checks fail (when both the source and destination VM run on Intel CPUs). Signed-off-by: Oliver Anderson <[email protected]> On-behalf-of: SAP [email protected]
This commit introduces a CLI for generating a CPU profile closely matching the CPU of the machine the CLI is executed on. The idea is to have a simple way to add more CPU profiles corresponding to physical CPUs. Note however that with the current setup one still needs a little bit of manual work to integrate the generated CPU profile data into cloud hypervisor itself. Signed-off-by: Oliver Anderson <[email protected]> On-behalf-of: SAP [email protected]
2de0d08 to
477b708
Compare
This PR introduces the concept of a CPU profile: a mechanism to opt in to restricting the CPU features a guest may use, which in turn makes live migration between hosts running on different hardware more tractable. In other words, we are trying to introduce Cloud Hypervisor's analogue of QEMU's CPU models.
We restrict our scope to enforcing CPUID compliance for now and leave MSR restrictions for later.
I encourage reviewers to read the longer explanation below before starting your review. When you have read the description in its entirety you can start reviewing one commit at a time (which I have tried to make as nice as I can, although there are probably some things that could have been done better with fancy git magic).
The feature in more detail
Why do we even need this?
Recall that software is usually developed to run on a variety of processors with various features. In order for the software to dynamically discover which hardware features may be utilized one typically uses the CPUID instruction to query the CPU for information. In some cases one can also obtain CPU information via so called MSRs (model specific registers), but we will leave those out of this discussion for the time being.
Consider now the case where a guest is running some workload on host A and then gets migrated to host B, where host B has a different processor than A. If this is a live migration (i.e. it is performed while the software is running), then the guest's workload can easily run into a time-of-check to time-of-use error: a feature that was detected via CPUID on host A may no longer exist when it is used on host B.
Luckily, hypervisors are able to manipulate what CPUID returns to guests, which we can and will take advantage of in order to make live migration safer. Indeed, if there is a subset of CPU features and properties that are shared by all CPUs in a cluster, and all hosts restrict themselves to that subset, then live migration suddenly becomes a lot less problematic (although still not an entirely solved problem).
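For pure feature flags, the "shared subset" idea boils down to a bitwise AND over what each host reports. A minimal sketch, assuming each host is represented by a single 32-bit CPUID register value (the function names are hypothetical and not part of this PR):

```rust
/// Returns the feature bits that every host in the cluster reports for one
/// CPUID register, or `None` if the cluster is empty.
fn common_features(hosts: &[u32]) -> Option<u32> {
    hosts.iter().copied().reduce(|acc, bits| acc & bits)
}

/// Returns whether the feature at bit position `bit` may be exposed to a
/// guest that should be migratable to any host in the cluster.
fn feature_is_migratable(common: u32, bit: u32) -> bool {
    bit < 32 && (common >> bit) & 1 == 1
}
```

A guest that only ever sees the common bits cannot detect a feature on one host and then lose it after migrating to another.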
CPU profiles at a very high level
Using an existing CPU profile
In this PR we patch cloud hypervisor so that users may specify a CPU profile on the command line. This will in turn restrict what CPUID values the guest obtains and may thus affect functionality and performance, but may still be the best tool in the box when live migration to (subtly) different hardware is desired.
From a user perspective one simply includes the `profile=<desired cpu profile>` argument in the `cpus` parameter, for example in order to utilize the Intel Sapphire Rapids server profile. If the host has hardware considered to be compatible with the chosen profile, then Cloud Hypervisor should work just like before, with the exception that guests may see certain other values when inspecting CPUID.
Producing a new CPU profile
New CPU profiles can be generated in a semi-automated fashion. All you need to do is run the newly introduced binary on the host you are interested in being compatible with (the compatibility target), and then manually edit the `arch::x86_64::cpu_profile::CpuProfile` enum by adding another variant and updating the `data` method to deserialize the pre-generated JSON.
The resulting CPU profile should expose a lot of the same functionality as the host where it was generated, but we apply a few extra restrictions to functionality that is either inherently incompatible with live migration, or should not be used in a cloud setting for other reasons. There is currently no way for users to opt out of these extra restrictions, either during profile generation or later when the profile is loaded on a new host.
Note that we also currently only support Intel CPUs and KVM as the hypervisor, but a lot of the logic is agnostic of these things and it should be relatively easy to lift these limitations (more on that later).
The design in a bit of detail
The implementation is based on the understanding that we do not only need to work with bit sets, but we also need to manipulate some multi-bit values. Indeed, some CPUID output values indicate how many leaves there are (such as for instance leaf 0, sub-leaf 0, EAX), while others may tell you the number of sub-leaves, or the size of a state component for instance. Note that if the CPUID instruction is executed with an invalid leaf, some processors will return the data for the highest basic information leaf. In a live-migration setting this could easily lead to a time of check to time of use error!
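To illustrate why bit-set logic is not enough for multi-bit values, consider the maximum basic leaf reported in leaf 0, EAX. The sketch below uses made-up host values and hypothetical function names; it only shows the arithmetic, not how this PR stores or applies such values.

```rust
/// Largest basic leaf that is valid on every host, or `None` for an empty list.
/// For a numeric field like this, the minimum is the safe common value.
fn common_max_leaf(max_leaves: &[u32]) -> Option<u32> {
    max_leaves.iter().copied().min()
}

/// A guest should only query leaves up to the advertised maximum; beyond it,
/// some processors return the data of the highest basic leaf instead.
fn leaf_is_valid(leaf: u32, max_leaf: u32) -> bool {
    leaf <= max_leaf
}
```

With hosts reporting `0x16` and `0x1b`, the minimum is `0x16`, whereas a bitwise AND would give `0x12`, a value that neither host reports and that needlessly hides leaves `0x13` to `0x16`.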
Since we cannot simply deal with bit sets (at least to the best of our understanding) we unfortunately end up with a rather more complex solution than what we had initially hoped for.
CPU Profile policies
The idea is that we have a static list describing all known values one can obtain from CPUID (on an Intel CPU) and also such a list of the values defined by KVM (within the leaf range reserved for hypervisors).
Although AMD's CPUID descriptions mostly agree with those from Intel, there are exceptions, such as which XSAVE state components correspond to AVX-512 register state. Hence, in order not to complicate matters further, we decided to focus on Intel for now. If/when we are to include support for AMD, I recommend having a separate list describing AMD's values, even if it ends up having, say, 80% overlap with Intel.
Now for each of these described CPUID values we decide on a CPU profile policy. These (currently) include:
From these policies we generate and serialize CPU profile data, which is then later loaded and utilized to adjust a guest's CPUID whenever the profile is in use.
Relationship with CPUID compatibility checks
A reasonable question to ask is: why don't the CPU profiles only contain data relevant to whether the existing `check_cpuid_compatibility` checks are satisfied, and let everything else just be copied over from the host?
First of all, the CPUID compatibility checks only check a fixed set of CPUID entries. While it is certainly true that these checks can be extended as new CPU features appear over time, it is still a problem if you are running an older version of Cloud Hypervisor (that is unaware of said new features) on a very modern CPU.
We argue that the implementation of CPU profiles as presented here is more future proof in that sense. This is because any CPUID entry/value not known to the profile will get zeroed out when the CPU profile is applied.
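The "unknown means zero" behaviour can be sketched as follows. This is a deliberate simplification under stated assumptions: the profile is modelled as a plain map of bit masks, whereas the real profile data also has to handle multi-bit values; `Reg`, `Key` and `apply_profile` are hypothetical names, not the types introduced in this PR.

```rust
use std::collections::HashMap;

/// Output register of the CPUID instruction.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Reg {
    Eax,
    Ebx,
    Ecx,
    Edx,
}

/// (leaf, sub-leaf, register) identifies one 32-bit CPUID output value.
type Key = (u32, u32, Reg);

/// Restricts the host's CPUID values to what the profile allows.
/// Any entry the profile does not know about is zeroed out.
fn apply_profile(host: &HashMap<Key, u32>, profile_masks: &HashMap<Key, u32>) -> HashMap<Key, u32> {
    host.iter()
        .map(|(key, value)| {
            // Missing in the profile => mask of 0 => value becomes 0.
            let mask = profile_masks.get(key).copied().unwrap_or(0);
            (*key, value & mask)
        })
        .collect()
}
```

Because the default is to hide, a leaf introduced by a future CPU generation is never passed through to the guest just because an older profile has not heard of it.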
Furthermore we suspect that the already existing CPUID compatibility checks could be improved. CPUID leaf 0xa (Architectural Performance Monitoring Leaf) is just one example of a leaf that is not accounted for by the existing checks, but probably should be. We have verified that non-trivial values in this leaf are indeed visible to guests, but it is unlikely that this can work in the context of live migrations. Note that QEMU only makes performance counters available when the host profile is selected (this is also the case for the CPU profiles introduced in this PR).
Immediate Follow up tasks
Ideas on how to incorporate MSR restrictions
There are really many MSRs and we would prefer to avoid creating a table describing all of them (like we did for CPUID).
Instead we propose doing the following:
- `KVM_GET_MSR_INDEX_LIST` in the CPU profile data. Then when the CPU profile is applied we use `KVM_X86_SET_MSR_FILTER` to deny access to the MSRs that are not among those listed in the CPU profile data.
- `KVM_GET_MSR_FEATURE_INDEX_LIST`. From what I have observed this returns around 21 indices, which is small enough that we can manage them in a similar manner to how we dealt with CPUID. In other words, we create a table describing the bits within these MSRs, together with profile policies, and extend the CPU profile generation tool to record adjustments according to our specification. When the generated profile is later used, we adjust the feature MSRs accordingly.
- `KVM_GET_MSR_INDEX_LIST` returns contains the indices we extracted from the profile. We can consider allowing some exceptions, in the cases where the MSRs are irrelevant due to CPUID indicating that the MSR in question is not valid anyway. We will also need a check for the feature MSRs. This latter check needs to be more thorough, but as there are not that many values in this case, we hope that this should be rather manageable.
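The set logic behind this proposal can be sketched without touching KVM at all. The code below is only an illustration of the idea: it assumes the MSR index lists have already been obtained (for instance via `KVM_GET_MSR_INDEX_LIST`) and does not issue any ioctl; both function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// MSRs that the host exposes but the profile does not list. These are the
/// ones a filter would have to deny when the profile is applied.
fn msrs_to_deny(host: &BTreeSet<u32>, profile: &BTreeSet<u32>) -> Vec<u32> {
    host.difference(profile).copied().collect()
}

/// MSRs that the profile expects but the host lacks. A non-empty result
/// means the host is not compatible with the profile (before considering
/// any exceptions for MSRs that CPUID marks as not valid anyway).
fn msrs_missing_on_host(host: &BTreeSet<u32>, profile: &BTreeSet<u32>) -> Vec<u32> {
    profile.difference(host).copied().collect()
}
```

Both results come back in ascending index order because `BTreeSet` iterates in sorted order, which keeps any resulting diagnostics stable.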