-
Notifications
You must be signed in to change notification settings - Fork 29
Fix Intel CPUID leaf 4 cache topology for SMT #1002
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
oh nice, thanks! totally an oversight when I was in there. how visible is this under so, if this is legible in the guest, could you adjust that test to check (.. I also see that in retrospect that test assumes SMT siblings are adjacent in APIC ID, which is definitely wrong in general.) |
|
This is what I get before the patch which indeed shows each vCPU as its own private L1 and L2 cache. After the patch I get Let me add a test for this |
|
Also I just noticed the propolis/phd-tests/tests/src/cpuid.rs Line 326 in dacb53d
shouldn't sibling_idx be idx/2 instead of idx/4?The thread_siblings documentation in linux is vague but I think it is a hex bitmask where bit N is set if CPU N is a sibling. With SMT, CPUs 0-1 set bits 0-1 which gives '3', CPUs 2-3 set bits 2-3 which gives 'c' and so on. Since each hex digit covers 4 CPUs and we are iterating over pairs of siblings, we go through 2 pairs before moving to the next hex digit, so idx/2 makes sense instead of idx/4? The current code would only advance sibling_idx every 8 CPUs.I just ran the test it fails for me locally. |
When SMT is enabled, L1/L2 caches should report being shared by 2 logical processors (the SMT siblings). Previously EAX[25:14] was always being set to 0, indicating no sharing which contradicts the SMT topology reported in leaf 0xB. As per [1] EAX[25:14] indicates maximum number of addressable IDs for logical processors sharing this cache. This mismatch causes linux guest to print "BUG: arch topology borken / the SMT domain not a subset of the CLS domain" during boot. Linux derives L2 cache sharing groups from leaf 4 and expects SMT siblings to share L2 but it was being informed that each vCPU has private L1/L2. This brings the SMT handling logic in CPUID inline with what being done for AMD in fix_amd_cache_topo() which sets the sharing count to 2 when has_smt is true. This fixes oxidecomputer#1001. [1]: Table 1-15. Reference for CPUID Leaf 04H https://cdrdv2-public.intel.com/775917/intel-64-architecture-processor-topology-enumeration.pdf Signed-off-by: Amey Narkhede <[email protected]>
The existing test assertion would fail on hosts with SMT enabled due to incorrect index calculations. Also add has_smt() helper to skip thread_siblings checks on non-SMT hosts and remove the unused itertools import. Signed-off-by: Amey Narkhede <[email protected]>
Verify that Linux guest observes correct cache sharing topology from /sys/devices/system/cpu/cpu0/cache/. With SMT enabled, L1 and L2 caches should report sharing by SMT siblings while L3 should be shared across all vCPUs. Signed-off-by: Amey Narkhede <[email protected]>
|
I fixed the |
|
Also noticed this late
Do you mean index3 😅/L3 which should be shared across all cores? Index 0 and 1 is split among L1i and L1d. I added test for L1 and L2 to be shared among SMT siblings and L3 to be among all CPUs in Add cache topology verification to guest_cpu_topo_test |
When SMT is enabled, L1/L2 caches should report being shared by 2 logical processors (the SMT siblings). Previously EAX[25:14] was always being set to 0, indicating no sharing which contradicts the SMT topology reported in leaf
0xB. As per [1] EAX[25:14] indicates maximum number of addressable IDs for logical processors sharing this cache.This mismatch causes linux guest to print "BUG: arch topology borken / the SMT domain not a subset of the CLS domain" during boot. Linux derives L2 cache sharing groups from leaf 4 and expects SMT siblings to share L2 but it was being informed that each vCPU has private L1/L2.
This brings the SMT handling logic in CPUID inline with whats being done for AMD in
fix_amd_cache_topo()which sets the sharing count to 2 when has_smt is true. This fixes #1001.[1]: Table 1-15. Reference for CPUID Leaf 04H
https://cdrdv2-public.intel.com/775917/intel-64-architecture-processor-topology-enumeration.pdf