@cxdong
Contributor

@cxdong cxdong commented Dec 9, 2025

pvvmcs part2: implement the PV interfaces for accessing the vcpu state

@cxdong cxdong force-pushed the pkvm-v6.18 branch 2 times, most recently from c1f0cd8 to 0f4b8fe Compare December 16, 2025 11:12
@cxdong cxdong force-pushed the pkvm-v6.18-pvvmcs-part2 branch from b504641 to 8b26289 Compare December 16, 2025 14:52
@cxdong cxdong force-pushed the pkvm-v6.18-pvvmcs-part2 branch 2 times, most recently from df60a09 to 222764d Compare December 19, 2025 06:28
cxdong and others added 21 commits January 17, 2026 20:24
Initialize KVM's cpuid_xstate_sizes, which pKVM needs in order to reuse
the KVM CPUID code for emulating the guest VM's CPUID.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Import fpu_enable_guest_xfd_features() from the kernel FPU code so that
pKVM can enable the guest xfd features. __xfd_enable_feature() is
implemented differently from the kernel version: pKVM doesn't support
dynamic memory allocation, so the fpstate memory for the xfd features
must be allocated by the host and donated via a dedicated PV interface.
Enabling an xfd feature in pKVM therefore only checks whether the
fpstate memory is large enough for that feature.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
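The size check described in the xfd commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `struct pkvm_fpstate_sketch`, `pkvm_xfd_enable_sketch()`, the `required_size` parameter, and the `-ENOMEM` return value are all assumptions. Only the idea comes from the commit message: nothing is allocated, and enabling succeeds only if the already-donated buffer is large enough.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest fpstate bookkeeping kept by pKVM. */
struct pkvm_fpstate_sketch {
	size_t size;		/* bytes of fpstate memory donated by the host */
	uint64_t xfeatures;	/* xfeatures currently enabled for the guest */
};

/*
 * Hypothetical helper: enable the requested xfd features only if the
 * donated buffer can already hold the resulting state. No allocation.
 */
static int pkvm_xfd_enable_sketch(struct pkvm_fpstate_sketch *fps,
				  uint64_t xfd_features, size_t required_size)
{
	if (fps->size < required_size)
		return -ENOMEM;	/* assumed error code: host must donate more */

	fps->xfeatures |= xfd_features;
	return 0;
}
```

A failing call leaves the feature mask untouched, so the host can donate a larger buffer and retry.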
Import kvm_set_cpuid from KVM, with the unsupported code paths removed:
the virtual APIC (emulated by the host) and the KVM MMU are handled by
the host, and no PMU refresh is needed as the PMU is not supported.
Hyper-V related code is bypassed as it is also not supported.

As pVM's FPU should be isolated from the host, pKVM will enable its xfd
feature if cpuid allows. For npVM, the xfd feature enabling is done by
the host rather than pKVM.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
For simplicity, the pKVM hypervisor doesn't support emulating KVM PV
features. Thus remove the KVM PV feature bits from the corresponding
CPUID leaf.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
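A minimal sketch of what stripping the KVM PV feature bits could look like. `KVM_CPUID_FEATURES` (0x40000001) is the leaf KVM uses for its paravirtual features; everything else here (`struct cpuid_entry_sketch`, `pkvm_clear_kvm_pv_features_sketch()`, and the choice to clear both EAX and EDX) is an assumption for illustration and may not match the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* CPUID leaf that KVM uses to advertise its paravirtual features. */
#define KVM_CPUID_FEATURES 0x40000001u

/* Hypothetical, simplified CPUID entry. */
struct cpuid_entry_sketch {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical helper: hide all KVM PV features from the guest. */
static void pkvm_clear_kvm_pv_features_sketch(struct cpuid_entry_sketch *e,
					      size_t nent)
{
	for (size_t i = 0; i < nent; i++) {
		if (e[i].function != KVM_CPUID_FEATURES)
			continue;
		e[i].eax = 0;	/* PV feature bits */
		e[i].edx = 0;	/* PV hint bits; clearing them is an assumption */
	}
}
```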
Add the vcpu_after_set_cpuid PV interface for the host to send CPUID
entries to the pKVM hypervisor, which will use them as the CPUID entries
of the pkvm_vcpu. The memory pages holding the CPUID entries are donated
to prevent the host from accessing them. If the pkvm_vcpu already has
CPUID entries, the physical addresses and size of the old ones are
returned to the host via pkvm_memcache.

For a pVM, this PV interface allows the host to set CPUID entries only
before the pVM has started running; once the pVM is running, the host
can no longer set them.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
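The behaviour described for vcpu_after_set_cpuid can be sketched as a small state check plus a buffer swap. This is only an illustration under stated assumptions: the struct names, the `reclaim` out-parameter standing in for pkvm_memcache, and the `-EPERM` error code are hypothetical, and the actual page donation is not modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for a donated buffer. */
struct donated_buf_sketch {
	uint64_t pa;	/* physical address */
	size_t size;	/* size in bytes; 0 means "no buffer" */
};

/* Hypothetical, heavily reduced pkvm_vcpu. */
struct pkvm_vcpu_sketch {
	bool is_protected;	/* true for a pVM, false for an npVM */
	bool has_run;		/* set once the vcpu has started running */
	struct donated_buf_sketch cpuid;
};

/*
 * Install new CPUID entries and hand the old buffer back through
 * *reclaim (a stand-in for pkvm_memcache). Refused for a running pVM.
 */
static int vcpu_after_set_cpuid_sketch(struct pkvm_vcpu_sketch *vcpu,
				       struct donated_buf_sketch new_buf,
				       struct donated_buf_sketch *reclaim)
{
	if (vcpu->is_protected && vcpu->has_run)
		return -EPERM;	/* assumed error code */

	*reclaim = vcpu->cpuid;	/* size == 0 if there was no old buffer */
	vcpu->cpuid = new_buf;
	return 0;
}
```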
Enable the pKVM VMX host to send CPUID entries to the pKVM hypervisor
via the corresponding PV interface. If the PV interface call succeeds,
the previous CPUID entries, if any, are reclaimed. As this PV interface
is inaccessible to the host once the guest state is protected, there is
no need to send CPUID entries in that case.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Do it before kvm_set_cpuid() for protected VM. Accordingly copy more
cpuid helper functions from kvm and fix dependencies on following
symbols:
  - kvm_pmu_cap
  - kvm_mmu_get_max_tdp_level()
  - xstate_get_guest_group_perm()

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Currently the cpuid is set by the host and is immutable once the pVM
runs. But certain cpuid leaves define security features that the pVM
requires to ensure its internal security. Lacking them may put the pVM
at risk.

Adopt a simple policy similar to QEMU '--cpu host': use the default
pKVM-supported leaves as the base, plus a small set of leaves that the
host is allowed to control. The allowed set is defined based on the
leaves which crosvm currently touches, and has been scrutinized to have
no security implication for the pVM.

This saves us the burden of maintaining a complex, fine-grained,
bit-wise policy as TDX does.

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
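The "default base plus a small host-controlled set" policy above can be sketched as an allowlist merge. All names are hypothetical, and the two leaves in the allowlist are placeholders: the commit message does not say which leaves crosvm touches. Subleaf indices are ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified CPUID leaf (subleaf index omitted). */
struct cpuid_leaf_sketch {
	uint32_t function;
	uint32_t eax, ebx, ecx, edx;
};

/* Placeholder allowlist; the real set is defined by the patch. */
static const uint32_t host_controlled_leaves_sketch[] = { 0x4, 0xb };

static bool leaf_is_host_controlled_sketch(uint32_t function)
{
	size_t n = sizeof(host_controlled_leaves_sketch) /
		   sizeof(host_controlled_leaves_sketch[0]);

	for (size_t i = 0; i < n; i++)
		if (host_controlled_leaves_sketch[i] == function)
			return true;
	return false;
}

/*
 * Start from the pKVM default leaves in base[] and let the host
 * override only the allowlisted ones. Other host values are ignored.
 */
static void apply_cpuid_policy_sketch(struct cpuid_leaf_sketch *base,
				      size_t nbase,
				      const struct cpuid_leaf_sketch *host,
				      size_t nhost)
{
	for (size_t i = 0; i < nbase; i++) {
		if (!leaf_is_host_controlled_sketch(base[i].function))
			continue;
		for (size_t j = 0; j < nhost; j++) {
			if (host[j].function == base[i].function) {
				base[i] = host[j];
				break;
			}
		}
	}
}
```

With this shape, a host that tries to clear a security-relevant bit in a non-allowlisted leaf simply has no effect, which is the property the commit message is after.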
Add the vcpu_add_fpstate PV interface for the host to add new fpstate
memory when xfd needs to be supported, as the fpstate memory allocated
at vcpu creation may not be sufficient. As the pVM's FPU state is
managed by the pKVM hypervisor while the npVM's FPU state is managed by
the host, re-allocating the fpstate is only necessary for the pVM. This
PV interface is not accessible to the host once the pVM has started
running, to ensure the host can't manipulate the pVM's FPU. The physical
addresses and size of the old fpstate memory are returned via
pkvm_memcache for the host to reclaim.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Add new fpstate memory via the vcpu_add_fpstate PV interface if the xfd
feature is enabled for pVM. As the pVM's FPU state is managed by the
pKVM hypervisor while the npVM's FPU state is managed by the host,
re-allocating is only necessary for the pVM, and should be done before
adding the new cpuid entries to the pKVM hypervisor.

If the new fpstate memory is added successfully, free the old one.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
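The host-side ordering described above (add the new fpstate first, free the old one only on success, skip everything for an npVM) can be sketched like this. It is a user-space illustration under assumptions: `calloc()`/`free()` stand in for the kernel allocation and donation, the hypercall is abstracted as a function pointer, and every name is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical host-side view of a vcpu's fpstate buffer. */
struct host_vcpu_sketch {
	void *fpstate;
	size_t fpstate_size;
	bool is_protected;
};

/* Stand-in for the vcpu_add_fpstate hypercall. */
typedef int (*add_fpstate_hc_sketch_t)(void *buf, size_t size);

/* Two trivial stand-ins so the flow can be exercised. */
static int hc_ok_sketch(void *buf, size_t size)
{
	(void)buf; (void)size;
	return 0;
}

static int hc_fail_sketch(void *buf, size_t size)
{
	(void)buf; (void)size;
	return -EINVAL;
}

/* Must run before the new CPUID entries are sent to pKVM. */
static int host_realloc_fpstate_sketch(struct host_vcpu_sketch *v,
				       size_t new_size,
				       add_fpstate_hc_sketch_t hc)
{
	void *new_buf;
	int ret;

	if (!v->is_protected)
		return 0;	/* npVM: the host manages the FPU state itself */

	new_buf = calloc(1, new_size);
	if (!new_buf)
		return -ENOMEM;

	ret = hc(new_buf, new_size);
	if (ret) {
		free(new_buf);	/* keep using the old fpstate */
		return ret;
	}

	free(v->fpstate);	/* old buffer is no longer in use */
	v->fpstate = new_buf;
	v->fpstate_size = new_size;
	return 0;
}
```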
Import vmx_write_tsc_offset/multiplier() from KVM to implement
write_tsc_offset/multiplier operations for pKVM.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Add the write_tsc_offset/multiplier PV interfaces to allow the host to
write the TSC offset and TSC multiplier. These PV interfaces are always
accessible to the host, as the Trusty pVM can use the GSC, not the TSC,
as its secure time source. For the same reason, the host is allowed to
emulate and access the guest TSC_ADJUST and TSC MSRs.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
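For context on what the two values written through these interfaces control: with VMX TSC scaling, the guest-visible TSC is the host TSC multiplied by the multiplier (a fixed-point value with 48 fractional bits), plus the offset. A minimal sketch of that arithmetic follows; `guest_tsc_sketch()` is a hypothetical name, not a function from the patch, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <stdint.h>

/* Number of fractional bits in the VMX TSC multiplier. */
#define VMX_TSC_RATIO_FRAC_BITS 48

/* Hypothetical helper: TSC value the guest would observe. */
static uint64_t guest_tsc_sketch(uint64_t host_tsc, uint64_t multiplier,
				 uint64_t offset)
{
	/* 64x64 -> 128-bit product, then drop the fractional bits. */
	unsigned __int128 scaled = (unsigned __int128)host_tsc * multiplier;

	/* The offset is added modulo 2^64, so it can act as a negative value. */
	return (uint64_t)(scaled >> VMX_TSC_RATIO_FRAC_BITS) + offset;
}
```

A multiplier of `1ULL << 48` means a 1:1 ratio, so the guest TSC is just the host TSC plus the offset.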
Enable the pKVM VMX host to write TSC offset and multiplier via the
corresponding PV interfaces.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Import vmx_load_mmu_pgd from KVM to implement load_mmu_pgd operation for
pKVM. The Hyper-V related handling is skipped as this is not supported
by pKVM.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Add the load_mmu_pgd PV interface for the host to load the guest MMU via
the vendor-specific operation. Currently this PV interface is always
accessible to the host, as the guest MMU of both pVMs and npVMs is still
managed by the host. It will be updated once the pvMMU is ready to
manage the guest MMU inside pKVM to achieve memory isolation for pVMs.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Enable the pKVM VMX host to load guest mmu via the load_mmu_pgd PV
interface.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Nested virtualization is not supported on this platform. To satisfy the
interface requirements, provide stub implementations for nested-related
operations.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
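A stub table of the kind described above might look like the following sketch. The struct, the selection of operations, and the return values are assumptions for illustration; they are only loosely modelled on the idea of a nested-ops table and are not copied from the patch.

```c
#include <errno.h>
#include <stddef.h>

struct vcpu_sketch;	/* opaque, hypothetical vcpu type */

/* Hypothetical subset of a nested-virtualization ops table. */
struct nested_ops_sketch {
	int (*check_events)(struct vcpu_sketch *vcpu);
	int (*set_state)(struct vcpu_sketch *vcpu, const void *buf, size_t size);
};

/* No nested guest can exist, so there is never an event to deliver. */
static int pkvm_nested_check_events_stub(struct vcpu_sketch *vcpu)
{
	(void)vcpu;
	return 0;
}

/* Restoring nested state makes no sense without nested support. */
static int pkvm_nested_set_state_stub(struct vcpu_sketch *vcpu,
				      const void *buf, size_t size)
{
	(void)vcpu; (void)buf; (void)size;
	return -EOPNOTSUPP;	/* assumed error code */
}

static const struct nested_ops_sketch pkvm_nested_ops_sketch = {
	.check_events	= pkvm_nested_check_events_stub,
	.set_state	= pkvm_nested_set_state_stub,
};
```

Filling every slot keeps callers free of NULL checks, which is presumably what "satisfy the interface requirements" refers to.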
Import the vmx_setup_mce operation from KVM to implement the setup_mce
operation for pKVM.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Add the setup_mce PV interface for the host to set up the guest MCE.
This PV interface is always accessible to the host, as the input mcg_cap
from the host is checked and the guest MCE is initialized by pKVM
itself. No guest MCE state will be leaked.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
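One way the "mcg_cap is checked, MCE state is initialized by pKVM" behaviour could look is sketched below. The bit positions of `MCG_CTL_P`, `MCG_SER_P` and `MCG_LMCE_P` follow the IA32_MCG_CAP layout; the supported mask, the bank limit, the struct, and the function name are hypothetical and chosen only for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* IA32_MCG_CAP fields. */
#define MCG_BANKCNT_MASK	0xffull
#define MCG_CTL_P		(1ull << 8)
#define MCG_SER_P		(1ull << 24)
#define MCG_LMCE_P		(1ull << 27)

/* Hypothetical limits; the real values are defined by pKVM. */
#define PKVM_MAX_MCE_BANKS_SKETCH	32
#define PKVM_SUPPORTED_MCG_CAP_SKETCH	(MCG_CTL_P | MCG_SER_P)

/* Hypothetical, reduced guest MCE state. */
struct pkvm_mce_sketch {
	uint64_t mcg_cap;
	uint64_t mcg_status;
};

static int pkvm_setup_mce_sketch(struct pkvm_mce_sketch *mce, uint64_t mcg_cap)
{
	uint64_t banks = mcg_cap & MCG_BANKCNT_MASK;

	/* Reject a bank count of zero or above the supported maximum. */
	if (!banks || banks > PKVM_MAX_MCE_BANKS_SKETCH)
		return -EINVAL;

	/* Reject capability bits that pKVM does not support. */
	if (mcg_cap & ~(PKVM_SUPPORTED_MCG_CAP_SKETCH | MCG_BANKCNT_MASK))
		return -EINVAL;

	/* State is initialized here, never taken from the host. */
	mce->mcg_cap = mcg_cap;
	mce->mcg_status = 0;
	return 0;
}
```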
Enable the pKVM VMX host to set up the guest MCE via the setup_mce PV
interface.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
The pKVM hypervisor doesn't support SMM emulation. To satisfy the
interface requirements, provide stub implementations for the SMM
operations.

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
@cxdong cxdong force-pushed the pkvm-v6.18-pvvmcs-part2 branch 2 times, most recently from 3a83e13 to 4d6a20f Compare January 17, 2026 12:49
@cxdong
Contributor Author

cxdong commented Jan 18, 2026

Hi @maluka-dmytro @jaszczyk-grzegorz , please let me know if we need to continue the review, or the review is done and we can merge it.

@maluka-dmytro
Contributor

IMHO we can merge it.

@jaszczyk-grzegorz

I've reviewed ~half of it and so far LGTM. If you have VMCS part3 ready for review, please merge this one and let's move forward. If I find something later on, I will let you know and we can prepare a squashme patch (but hopefully not).

@cxdong
Contributor Author

cxdong commented Jan 19, 2026

> I've reviewed ~half of it and so far LGTM. If you have VMCS part3 ready for review, please merge this one and let's move forward. If I find something later on, I will let you know and we can prepare a squashme patch (but hopefully not).

Makes sense. I will merge this PR and submit the last part of pvvmcs.

@cxdong cxdong merged commit 4d6a20f into intel-staging:pkvm-v6.18 Jan 19, 2026
static void pkvm_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
{
union pkvm_hc_data data = {
.set_gdt.desc = *dt,

s/set_gdt/set_idt/?
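To illustrate the suggested rename, here is a minimal sketch with a made-up union. `struct desc_ptr_sketch`, `union pkvm_hc_data_sketch`, and `build_set_idt_data_sketch()` are hypothetical; in particular, the assumption that `set_gdt` and `set_idt` share the same layout is a guess, and the hypercall itself is not shown because it is not part of the quoted diff.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct desc_ptr. */
struct desc_ptr_sketch {
	uint16_t size;
	uint64_t address;
} __attribute__((packed));

/* Hypothetical stand-in for union pkvm_hc_data (layout is assumed). */
union pkvm_hc_data_sketch {
	struct { struct desc_ptr_sketch desc; } set_gdt;
	struct { struct desc_ptr_sketch desc; } set_idt;
};

/* Build the hypercall payload using the member that matches the call. */
static union pkvm_hc_data_sketch
build_set_idt_data_sketch(const struct desc_ptr_sketch *dt)
{
	union pkvm_hc_data_sketch data = {
		.set_idt.desc = *dt,	/* set_idt, not set_gdt */
	};

	return data;
}
```

If the two members really do have identical layouts, as assumed here, the original spelling would still pass the right bytes, so the rename would be about readability and about staying correct should the layouts ever diverge.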

@cxdong cxdong mentioned this pull request Jan 22, 2026