feat: Intel AMX support #5065
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #5065 +/- ##
==========================================
- Coverage 83.18% 83.15% -0.03%
==========================================
Files 247 248 +1
Lines 26816 26901 +85
==========================================
+ Hits 22306 22370 +64
- Misses 4510 4531 +21
roypat
left a comment
Do we already want to add a changelog entry for this? I know the title says "make perf tests work", but really what we're doing is making firecracker support AMX xP
I'd like to declare official support of Intel AMX when snapshot restore is supported :)
Manciukic
left a comment
Overall lgtm, but I think that fam_len thing is so confusing right now. If we don't want/can't change the api, I think we should better explain in the comment what is happening because it makes my head spin every time I look at it
Bindings for ARCH_REQ_XCOMP_GUEST_PERM and ARCH_XCOMP_TILEDATA are required to enable Intel AMX's XTILEDATA for XSAVE. Note that the required bits were added in kernel v6.4+. Signed-off-by: Takahiro Itazuri <[email protected]>
Eq and PartialEq are not necessary for KvmError; deriving them also prevents adding error variants that wrap std::io::Error (which implements neither trait) to handle syscall errors. Signed-off-by: Takahiro Itazuri <[email protected]>
Intel AMX (Advanced Matrix Extensions) was introduced in Intel Sapphire Rapids to accelerate deep learning and AI workloads. Since it requires a larger area to save its state, the TILEDATA feature is disabled by default. We request permission for it by default because it can still be disabled via CPU template. Without the request, kernels prior to v6.4 have a bug where KVM_GET_SUPPORTED_CPUID returns an inconsistent state with TILECFG enabled but TILEDATA disabled, causing a #GP fault in the guest on the xsetbv instruction. Signed-off-by: Takahiro Itazuri <[email protected]>
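The permission request described in the commit message above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `request_guest_tiledata_permission` is hypothetical, the constants are copied from the Linux x86 UAPI header `asm/prctl.h`, and the raw `syscall` instruction is used only to keep the sketch free of external crates (real code would more likely go through `libc` or generated bindings). It assumes Linux on x86_64; on other targets the sketch simply reports the operation as unsupported.

```rust
use std::io;

/// `arch_prctl()` option that requests permission to use a dynamically
/// enabled XSTATE component for guests (from `asm/prctl.h`).
pub const ARCH_REQ_XCOMP_GUEST_PERM: u64 = 0x1025;
/// XSTATE component number of AMX TILEDATA (bit 18 of XCR0).
pub const ARCH_XCOMP_TILEDATA: u64 = 18;

/// Hypothetical helper: ask the kernel for guest permission for TILEDATA.
/// Per the PR description, this must happen before `KVM_GET_SUPPORTED_CPUID`.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
pub fn request_guest_tiledata_permission() -> io::Result<()> {
    // Syscall number of arch_prctl on x86_64.
    const SYS_ARCH_PRCTL: i64 = 158;
    let ret: i64;
    // SAFETY: this arch_prctl() option takes two plain integers and no
    // pointers, so the kernel does not read or write our memory.
    unsafe {
        std::arch::asm!(
            "syscall",
            inlateout("rax") SYS_ARCH_PRCTL => ret,
            in("rdi") ARCH_REQ_XCOMP_GUEST_PERM,
            in("rsi") ARCH_XCOMP_TILEDATA,
            // The `syscall` instruction clobbers rcx and r11.
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }
    if ret == 0 {
        Ok(())
    } else {
        // A raw syscall returns the negated errno on failure.
        Err(io::Error::from_raw_os_error((-ret) as i32))
    }
}

/// Fallback for targets where this arch_prctl() option does not exist.
#[cfg(not(all(target_os = "linux", target_arch = "x86_64")))]
pub fn request_guest_tiledata_permission() -> io::Result<()> {
    Err(io::Error::new(
        io::ErrorKind::Unsupported,
        "ARCH_REQ_XCOMP_GUEST_PERM is only available on Linux x86_64",
    ))
}
```

Returning `io::Result` keeps the underlying errno visible to the caller, which is the same reason the earlier commit makes room for error variants wrapping `std::io::Error`. On hosts without AMX, or on kernels that do not know this option, the call is expected to fail with an OS error rather than succeed.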
Intel AMX is an XSTATE feature and TILEDATA is disabled by default because it requires a larger area to save its state than the traditional 4096 bytes. Instead, the Linux kernel allows VMMs to request the guest permission via `arch_prctl()`. As such, the size of the buffer required to save XSTATE is dynamic. To support a dynamically-sized buffer, `KVM_CAP_XSAVE2` was introduced with `KVM_GET_XSAVE2`. Accordingly, kvm-bindings added `Xsave`, an alias of `FamStructWrapper` for the `kvm_xsave` struct with a FAM at the end, and kvm-ioctls added `get_xsave2()` for `KVM_GET_XSAVE2` and `set_xsave2()`, which takes `Xsave` and calls `KVM_SET_XSAVE`. Change the type of `xsave` in `VcpuState` from `kvm_xsave` to `Xsave`. Use `get_xsave2()` and `set_xsave2()`. Signed-off-by: Takahiro Itazuri <[email protected]>
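To make the dynamic sizing above concrete, here is a minimal sketch of how the number of flexible-array entries could be derived from the total XSAVE size. It is not taken from this PR: the helper name `xsave_fam_len` is hypothetical, and it assumes the kernel layout of `struct kvm_xsave` (a fixed `__u32 region[1024]` followed by a `__u32 extra[]` flexible array) and that the total size in bytes is the value reported by `KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)`.

```rust
/// Size in bytes of the fixed part of `struct kvm_xsave`
/// (`__u32 region[1024]`), i.e. the traditional 4096 bytes.
const KVM_XSAVE_FIXED_SIZE: usize = 4096;
/// Size in bytes of one entry of the trailing flexible array (`__u32 extra[]`).
const FAM_ENTRY_SIZE: usize = std::mem::size_of::<u32>();

/// Hypothetical helper: number of FAM entries needed so that the fixed part
/// plus the flexible array covers `xsave_size` bytes.
///
/// Returns `None` if `xsave_size` is smaller than the fixed part, which
/// would indicate a bogus size.
fn xsave_fam_len(xsave_size: usize) -> Option<usize> {
    let extra_bytes = xsave_size.checked_sub(KVM_XSAVE_FIXED_SIZE)?;
    // Round up so that a partial trailing entry is still covered.
    Some((extra_bytes + FAM_ENTRY_SIZE - 1) / FAM_ENTRY_SIZE)
}
```

With this layout, a size of exactly 4096 bytes needs no extra entries, and every additional 4 bytes (or part thereof) adds one entry. The point of the sketch is that the FAM length counts `u32` entries beyond the fixed region, not bytes and not the total buffer size.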
KVM_GET_XSAVE2 is called when taking a snapshot, so it has to be allowed by seccomp filter. Signed-off-by: Takahiro Itazuri <[email protected]>
Intel AMX support was introduced but it is only supported on Intel Sapphire Rapids at the moment. We have to skip Intel AMX tests on older processors. Signed-off-by: Takahiro Itazuri <[email protected]>
To check that Intel AMX is indeed supported inside the guest, check that the related features are listed in the CPUID output. Signed-off-by: Takahiro Itazuri <[email protected]>
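As an illustration of the kind of check described above, the sketch below decodes the AMX feature bits from CPUID leaf 7, subleaf 0, register EDX. It is not the test added by this PR: the function name is hypothetical, and the bit positions (22, 24, 25) and the flag names, which follow the Linux `/proc/cpuinfo` spelling, are stated here as assumptions to verify against the Intel SDM.

```rust
/// AMX feature flags enumerated in CPUID.(EAX=07H,ECX=0):EDX,
/// as (Linux-style flag name, bit position) pairs.
const AMX_FEATURES: [(&str, u32); 3] = [
    ("amx_bf16", 22),
    ("amx_tile", 24),
    ("amx_int8", 25),
];

/// Hypothetical helper: names of the AMX features set in `leaf7_edx`.
fn amx_features_in_leaf7_edx(leaf7_edx: u32) -> Vec<&'static str> {
    let mut found = Vec::new();
    for &(name, bit) in AMX_FEATURES.iter() {
        if leaf7_edx & (1u32 << bit) != 0 {
            found.push(name);
        }
    }
    found
}
```

A guest-side check would obtain the EDX value from the guest's CPUID output and assert that all three names are present on hosts that support Intel AMX, and skip the assertion elsewhere.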
Now all the required changes for Intel AMX have been done :) Signed-off-by: Takahiro Itazuri <[email protected]>
This is the first PR for Intel Sapphire Rapids (EC2 7th-gen Intel instance type) support.
Note that this PR focuses on Intel AMX support and any integration tests for Intel Sapphire Rapids will be added in the upcoming PR.
Changes
- `arch_prctl()` before `KVM_GET_SUPPORTED_CPUID` to boot guests successfully
- `kvm_xsave` to support snapshot restore of Intel AMX
Reason
Intel AMX (Advanced Matrix Extensions) was introduced in Intel Sapphire Rapids as a new instruction set for deep learning / AI workloads. Intel AMX state is handled by `XSAVE`/`XRSTOR`, the instructions that save/restore the states of extended CPU features to/from memory (e.g. on context switch). Which states are saved/restored is configured by writing a bit vector to XCR0 via the `XSETBV` instruction. Intel AMX introduces two new bits (TILECFG and TILEDATA) in (1) that bit vector (`XCR0.TILECFG[bit 17]` and `XCR0.TILEDATA[bit 18]`) and (2) CPUID, to enumerate support for them (`CPUID.(EAX=0DH,ECX=0):EAX.TILECFG[bit 17]` and `CPUID.(EAX=0DH,ECX=0):EAX.TILEDATA[bit 18]`).
Since the memory size required to save TILEDATA state is 8KB, which is larger than the previously statically allocated memory size (4KB), the Linux kernel decided to disable TILEDATA by default and to allow userspace applications to enable it dynamically via the `arch_prctl()` syscall. This default-disabled behavior also applies to KVM. To enable TILEDATA for guests, the VMM has to call `arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM, ...)`, which makes `KVM_GET_SUPPORTED_CPUID` return a value with `CPUID.(EAX=0DH,ECX=0):EAX.TILEDATA[bit 18]` set. Conversely, without `arch_prctl()` prior to `KVM_GET_SUPPORTED_CPUID`, it returns an inconsistent state where `CPUID.(EAX=0DH,ECX=0):EAX.TILECFG[bit 17]` is set but `CPUID.(EAX=0DH,ECX=0):EAX.TILEDATA[bit 18]` is not. If such an AMX-half-enabled CPUID is passed to `KVM_SET_CPUID2` as is, guests will crash with a general protection fault during boot (see Appendix). This is because the Linux kernel attempts to execute the `XSETBV` instruction with all XSAVE feature bits enumerated in CPUID during boot, and `XSETBV` only accepts the two Intel AMX bits when both are enabled or both are disabled. This bug of `KVM_GET_SUPPORTED_CPUID` returning such a half-enabled state was fixed in kernel v6.4. In any case, Firecracker supports the CPU template feature, which lets users mask CPU features (including Intel AMX), so we enable TILEDATA by default to make it work even on earlier kernels.
To support a dynamically-sized XSTATE buffer, the Linux kernel extended the existing `kvm_xsave` by adding a flexible array member (FAM) at the end. Along with it, the `KVM_GET_XSAVE2` API was added and the `KVM_SET_XSAVE` API was extended. To support these changes, rust-vmm added (1) `kvm_xsave2`, which holds `kvm_xsave` and the length of the FAM, (2) `Xsave` as a `FamStructWrapper` of `kvm_xsave2`, (3) `get_xsave2()` for `KVM_GET_XSAVE2` and (4) `set_xsave2()`, which takes `Xsave` and calls `KVM_SET_XSAVE`. Accordingly, this PR uses these methods and structs to support Intel AMX in Firecracker.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.
PR Checklist
`tools/devtool checkstyle` to verify that the PR passes the automated style checks.
how they are solving the problem in a clear and encompassing way.
[ ] I have updated any relevant documentation (both in code and in the docs) in the PR.
[ ] I have mentioned all user-facing changes in `CHANGELOG.md`.
[ ] If a specific issue led to this PR, this PR closes the issue.
[ ] When making API changes, I have followed the Runbook for Firecracker API changes.
integration tests.
[ ] I have linked an issue to every new `TODO`.
`rust-vmm`.
Appendix: GP fault on guest boot without Intel AMX enablement
With this PR, the GP fault doesn't happen. (Note that a functional test for CPU feature set will be added in an upcoming PR for functional integration tests support.)
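For reference, the half-enabled state that leads to the GP fault can be expressed as a simple check on the EAX value of CPUID leaf 0xD, subleaf 0. This is only a sketch of the condition described in the Reason section, not code from this PR (which avoids the state by requesting TILEDATA permission up front); the function names are hypothetical, and the masking helper shows one possible alternative rather than what Firecracker does.

```rust
/// XCR0 / CPUID.(EAX=0DH,ECX=0):EAX bit for AMX TILECFG.
const XTILE_CFG: u32 = 1 << 17;
/// XCR0 / CPUID.(EAX=0DH,ECX=0):EAX bit for AMX TILEDATA.
const XTILE_DATA: u32 = 1 << 18;
/// Both AMX bits together.
const XTILE_MASK: u32 = XTILE_CFG | XTILE_DATA;

/// Hypothetical helper: true if exactly one of the two AMX bits is set,
/// i.e. the state that `XSETBV` rejects with a general protection fault.
fn is_amx_half_enabled(leaf_d_eax: u32) -> bool {
    let amx_bits = leaf_d_eax & XTILE_MASK;
    amx_bits != 0 && amx_bits != XTILE_MASK
}

/// Hypothetical helper: clear both AMX bits if the state is half-enabled,
/// leaving every other bit untouched.
fn mask_half_enabled_amx(leaf_d_eax: u32) -> u32 {
    if is_amx_half_enabled(leaf_d_eax) {
        leaf_d_eax & !XTILE_MASK
    } else {
        leaf_d_eax
    }
}
```

For example, a value with only bit 17 set is reported as half-enabled, while values with both bits set or both bits clear pass through unchanged.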