Commit ef688f8
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "The first batch of KVM patches, mostly covering x86.

  ARM:

   - Account stage2 page table allocations in memory stats

  x86:

   - Account EPT/NPT arm64 page table allocations in memory stats

   - Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR
     accesses

   - Drop eVMCS controls filtering for KVM on Hyper-V, all known
     versions of Hyper-V now support eVMCS fields associated with
     features that are enumerated to the guest

   - Use KVM's sanitized VMCS config as the basis for the values of
     nested VMX capabilities MSRs

   - A myriad event/exception fixes and cleanups. Most notably, pending
     exceptions morph into VM-Exits earlier, as soon as the exception is
     queued, instead of waiting until the next vmentry. This fixed a
     longstanding issue where the exceptions would incorrectly become
     double-faults instead of triggering a vmexit; the common case of
     page-fault vmexits had a special workaround, but now it's fixed
     for good

   - A handful of fixes for memory leaks in error paths

   - Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow

   - Never write to memory from non-sleepable kvm_vcpu_check_block()

   - Selftests refinements and cleanups

   - Misc typo cleanups

  Generic:

   - remove KVM_REQ_UNHALT"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
  KVM: remove KVM_REQ_UNHALT
  KVM: mips, x86: do not rely on KVM_REQ_UNHALT
  KVM: x86: never write to memory from kvm_vcpu_check_block()
  KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events
  KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending
  KVM: nVMX: Make an event request if INIT or SIPI is pending on VM-Enter
  KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is set
  KVM: x86: lapic does not have to process INIT if it is blocked
  KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific
  KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed
  KVM: nVMX: Make an event request when pending an MTF nested VM-Exit
  KVM: x86: make vendor code check for all nested events
  mailmap: Update Oliver's email address
  KVM: x86: Allow force_emulation_prefix to be written without a reload
  KVM: selftests: Add an x86-only test to verify nested exception queueing
  KVM: selftests: Use uapi header to get VMX and SVM exit reasons/codes
  KVM: x86: Rename inject_pending_events() to kvm_check_and_inject_events()
  KVM: VMX: Update MTF and ICEBP comments to document KVM's subtle behavior
  KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions
  KVM: x86: Morph pending exceptions to pending VM-Exits at queue time
  ...
2 parents 0e47076 + c59fb12 commit ef688f8


57 files changed (+1871, -1077 lines)

.mailmap

Lines changed: 1 addition & 0 deletions

@@ -336,6 +336,7 @@ Oleksij Rempel <[email protected]> <[email protected]>
 [context lines hidden in this view]
+[added entry; content hidden in this view]
 Paolo 'Blaisorblade' Giarrusso <[email protected]>
 Patrick Mochel <[email protected]>

Documentation/admin-guide/cgroup-v2.rst

Lines changed: 5 additions & 0 deletions

@@ -1355,6 +1355,11 @@ PAGE_SIZE multiple when read back.
 	  pagetables
 		Amount of memory allocated for page tables.
 
+	  sec_pagetables
+		Amount of memory allocated for secondary page tables,
+		this currently includes KVM mmu allocations on x86
+		and arm64.
+
 	  percpu (npn)
 		Amount of memory used for storing per-cpu kernel
 		data structures.

Documentation/filesystems/proc.rst

Lines changed: 4 additions & 0 deletions

@@ -982,6 +982,7 @@ Example output. You may not have all of these fields.
     SUnreclaim:       142336 kB
     KernelStack:       11168 kB
     PageTables:        20540 kB
+    SecPageTables:         0 kB
     NFS_Unstable:          0 kB
     Bounce:                0 kB
     WritebackTmp:          0 kB
@@ -1090,6 +1091,9 @@ KernelStack
               Memory consumed by the kernel stacks of all tasks
 PageTables
               Memory consumed by userspace page tables
+SecPageTables
+              Memory consumed by secondary page tables, this currently
+              includes KVM mmu allocations on x86 and arm64.
 NFS_Unstable
               Always zero. Previously counted pages which had been written to
               the server, but have not been committed to stable storage.

Documentation/virt/kvm/api.rst

Lines changed: 6 additions & 107 deletions

@@ -4074,7 +4074,7 @@ Queues an SMI on the thread's vcpu.
 4.97 KVM_X86_SET_MSR_FILTER
 ----------------------------
 
-:Capability: KVM_X86_SET_MSR_FILTER
+:Capability: KVM_CAP_X86_MSR_FILTER
 :Architectures: x86
 :Type: vm ioctl
 :Parameters: struct kvm_msr_filter
@@ -4173,8 +4173,10 @@ If an MSR access is not permitted through the filtering, it generates a
 allows user space to deflect and potentially handle various MSR accesses
 into user space.
 
-If a vCPU is in running state while this ioctl is invoked, the vCPU may
-experience inconsistent filtering behavior on MSR accesses.
+Note, invoking this ioctl while a vCPU is running is inherently racy. However,
+KVM does guarantee that vCPUs will see either the previous filter or the new
+filter, e.g. MSRs with identical settings in both the old and new filter will
+have deterministic behavior.
 
 4.98 KVM_CREATE_SPAPR_TCE_64
 ----------------------------
@@ -5287,110 +5289,7 @@ KVM_PV_DUMP
 authentication tag all of which are needed to decrypt the dump at a
 later time.
 
-
-4.126 KVM_X86_SET_MSR_FILTER
-----------------------------
-
-:Capability: KVM_CAP_X86_MSR_FILTER
-:Architectures: x86
-:Type: vm ioctl
-:Parameters: struct kvm_msr_filter
-:Returns: 0 on success, < 0 on error
-
-::
-
-  struct kvm_msr_filter_range {
-  #define KVM_MSR_FILTER_READ  (1 << 0)
-  #define KVM_MSR_FILTER_WRITE (1 << 1)
-	__u32 flags;
-	__u32 nmsrs; /* number of msrs in bitmap */
-	__u32 base;  /* MSR index the bitmap starts at */
-	__u8 *bitmap; /* a 1 bit allows the operations in flags, 0 denies */
-  };
-
-  #define KVM_MSR_FILTER_MAX_RANGES 16
-  struct kvm_msr_filter {
-  #define KVM_MSR_FILTER_DEFAULT_ALLOW (0 << 0)
-  #define KVM_MSR_FILTER_DEFAULT_DENY  (1 << 0)
-	__u32 flags;
-	struct kvm_msr_filter_range ranges[KVM_MSR_FILTER_MAX_RANGES];
-  };
-
-flags values for ``struct kvm_msr_filter_range``:
-
-``KVM_MSR_FILTER_READ``
-
-  Filter read accesses to MSRs using the given bitmap. A 0 in the bitmap
-  indicates that a read should immediately fail, while a 1 indicates that
-  a read for a particular MSR should be handled regardless of the default
-  filter action.
-
-``KVM_MSR_FILTER_WRITE``
-
-  Filter write accesses to MSRs using the given bitmap. A 0 in the bitmap
-  indicates that a write should immediately fail, while a 1 indicates that
-  a write for a particular MSR should be handled regardless of the default
-  filter action.
-
-``KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE``
-
-  Filter both read and write accesses to MSRs using the given bitmap. A 0
-  in the bitmap indicates that both reads and writes should immediately fail,
-  while a 1 indicates that reads and writes for a particular MSR are not
-  filtered by this range.
-
-flags values for ``struct kvm_msr_filter``:
-
-``KVM_MSR_FILTER_DEFAULT_ALLOW``
-
-  If no filter range matches an MSR index that is getting accessed, KVM will
-  fall back to allowing access to the MSR.
-
-``KVM_MSR_FILTER_DEFAULT_DENY``
-
-  If no filter range matches an MSR index that is getting accessed, KVM will
-  fall back to rejecting access to the MSR. In this mode, all MSRs that should
-  be processed by KVM need to explicitly be marked as allowed in the bitmaps.
-
-This ioctl allows user space to define up to 16 bitmaps of MSR ranges to
-specify whether a certain MSR access should be explicitly filtered for or not.
-
-If this ioctl has never been invoked, MSR accesses are not guarded and the
-default KVM in-kernel emulation behavior is fully preserved.
-
-Calling this ioctl with an empty set of ranges (all nmsrs == 0) disables MSR
-filtering. In that mode, ``KVM_MSR_FILTER_DEFAULT_DENY`` is invalid and causes
-an error.
-
-As soon as the filtering is in place, every MSR access is processed through
-the filtering except for accesses to the x2APIC MSRs (from 0x800 to 0x8ff);
-x2APIC MSRs are always allowed, independent of the ``default_allow`` setting,
-and their behavior depends on the ``X2APIC_ENABLE`` bit of the APIC base
-register.
-
-If a bit is within one of the defined ranges, read and write accesses are
-guarded by the bitmap's value for the MSR index if the kind of access
-is included in the ``struct kvm_msr_filter_range`` flags. If no range
-cover this particular access, the behavior is determined by the flags
-field in the kvm_msr_filter struct: ``KVM_MSR_FILTER_DEFAULT_ALLOW``
-and ``KVM_MSR_FILTER_DEFAULT_DENY``.
-
-Each bitmap range specifies a range of MSRs to potentially allow access on.
-The range goes from MSR index [base .. base+nmsrs]. The flags field
-indicates whether reads, writes or both reads and writes are filtered
-by setting a 1 bit in the bitmap for the corresponding MSR index.
-
-If an MSR access is not permitted through the filtering, it generates a
-#GP inside the guest. When combined with KVM_CAP_X86_USER_SPACE_MSR, that
-allows user space to deflect and potentially handle various MSR accesses
-into user space.
-
-Note, invoking this ioctl with a vCPU is running is inherently racy. However,
-KVM does guarantee that vCPUs will see either the previous filter or the new
-filter, e.g. MSRs with identical settings in both the old and new filter will
-have deterministic behavior.
-
-4.127 KVM_XEN_HVM_SET_ATTR
+4.126 KVM_XEN_HVM_SET_ATTR
 --------------------------
 
 :Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO

Documentation/virt/kvm/vcpu-requests.rst

Lines changed: 1 addition & 27 deletions

@@ -97,7 +97,7 @@ VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap.
 This means general bitops, like those documented in [atomic-ops]_ could
 also be used, e.g. ::
 
-  clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);
+  clear_bit(KVM_REQ_UNBLOCK & KVM_REQUEST_MASK, &vcpu->requests);
 
 However, VCPU request users should refrain from doing so, as it would
 break the abstraction. The first 8 bits are reserved for architecture
@@ -126,17 +126,6 @@ KVM_REQ_UNBLOCK
   or in order to update the interrupt routing and ensure that assigned
   devices will wake up the vCPU.
 
-KVM_REQ_UNHALT
-
-  This request may be made from the KVM common function kvm_vcpu_block(),
-  which is used to emulate an instruction that causes a CPU to halt until
-  one of an architectural specific set of events and/or interrupts is
-  received (determined by checking kvm_arch_vcpu_runnable()). When that
-  event or interrupt arrives kvm_vcpu_block() makes the request. This is
-  in contrast to when kvm_vcpu_block() returns due to any other reason,
-  such as a pending signal, which does not indicate the VCPU's halt
-  emulation should stop, and therefore does not make the request.
-
 KVM_REQ_OUTSIDE_GUEST_MODE
 
   This "request" ensures the target vCPU has exited guest mode prior to the
@@ -297,21 +286,6 @@ architecture dependent. kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
 to check if it should awaken. One reason to do so is to provide
 architectures a function where requests may be checked if necessary.
 
-Clearing Requests
------------------
-
-Generally it only makes sense for the receiving VCPU thread to clear a
-request. However, in some circumstances, such as when the requesting
-thread and the receiving VCPU thread are executed serially, such as when
-they are the same thread, or when they are using some form of concurrency
-control to temporarily execute synchronously, then it's possible to know
-that the request may be cleared immediately, rather than waiting for the
-receiving VCPU thread to handle the request in VCPU RUN. The only current
-examples of this are kvm_vcpu_block() calls made by VCPUs to block
-themselves. A possible side-effect of that call is to make the
-KVM_REQ_UNHALT request, which may then be cleared immediately when the
-VCPU returns from the call.
-
 References
 ==========
arch/arm64/kvm/arm.c

Lines changed: 0 additions & 1 deletion

@@ -666,7 +666,6 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
 
 	kvm_vcpu_halt(vcpu);
 	vcpu_clear_flag(vcpu, IN_WFIT);
-	kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 
 	preempt_disable();
 	vgic_v4_load(vcpu);

arch/arm64/kvm/mmu.c

Lines changed: 32 additions & 4 deletions

@@ -92,16 +92,35 @@ static bool kvm_is_device_pfn(unsigned long pfn)
 static void *stage2_memcache_zalloc_page(void *arg)
 {
 	struct kvm_mmu_memory_cache *mc = arg;
+	void *virt;
 
 	/* Allocated with __GFP_ZERO, so no need to zero */
-	return kvm_mmu_memory_cache_alloc(mc);
+	virt = kvm_mmu_memory_cache_alloc(mc);
+	if (virt)
+		kvm_account_pgtable_pages(virt, 1);
+	return virt;
 }
 
 static void *kvm_host_zalloc_pages_exact(size_t size)
 {
 	return alloc_pages_exact(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 }
 
+static void *kvm_s2_zalloc_pages_exact(size_t size)
+{
+	void *virt = kvm_host_zalloc_pages_exact(size);
+
+	if (virt)
+		kvm_account_pgtable_pages(virt, (size >> PAGE_SHIFT));
+	return virt;
+}
+
+static void kvm_s2_free_pages_exact(void *virt, size_t size)
+{
+	kvm_account_pgtable_pages(virt, -(size >> PAGE_SHIFT));
+	free_pages_exact(virt, size);
+}
+
 static void kvm_host_get_page(void *addr)
 {
 	get_page(virt_to_page(addr));
@@ -112,6 +131,15 @@ static void kvm_host_put_page(void *addr)
 	put_page(virt_to_page(addr));
 }
 
+static void kvm_s2_put_page(void *addr)
+{
+	struct page *p = virt_to_page(addr);
+	/* Dropping last refcount, the page will be freed */
+	if (page_count(p) == 1)
+		kvm_account_pgtable_pages(addr, -1);
+	put_page(p);
+}
+
 static int kvm_host_page_count(void *addr)
 {
 	return page_count(virt_to_page(addr));
@@ -625,10 +653,10 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 
 static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
 	.zalloc_page		= stage2_memcache_zalloc_page,
-	.zalloc_pages_exact	= kvm_host_zalloc_pages_exact,
-	.free_pages_exact	= free_pages_exact,
+	.zalloc_pages_exact	= kvm_s2_zalloc_pages_exact,
+	.free_pages_exact	= kvm_s2_free_pages_exact,
 	.get_page		= kvm_host_get_page,
-	.put_page		= kvm_host_put_page,
+	.put_page		= kvm_s2_put_page,
 	.page_count		= kvm_host_page_count,
 	.phys_to_virt		= kvm_host_va,
 	.virt_to_phys		= kvm_host_pa,

arch/mips/kvm/emulate.c

Lines changed: 2 additions & 4 deletions

@@ -955,13 +955,11 @@ enum emulation_result kvm_mips_emul_wait(struct kvm_vcpu *vcpu)
 		kvm_vcpu_halt(vcpu);
 
 		/*
-		 * We we are runnable, then definitely go off to user space to
+		 * We are runnable, then definitely go off to user space to
 		 * check if any I/O interrupts are pending.
 		 */
-		if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
-			kvm_clear_request(KVM_REQ_UNHALT, vcpu);
+		if (kvm_arch_vcpu_runnable(vcpu))
 			vcpu->run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
-		}
 	}
 
 	return EMULATE_DONE;

arch/powerpc/kvm/book3s_pr.c

Lines changed: 0 additions & 1 deletion

@@ -499,7 +499,6 @@ static void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
 	if (msr & MSR_POW) {
 		if (!vcpu->arch.pending_exceptions) {
 			kvm_vcpu_halt(vcpu);
-			kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 			vcpu->stat.generic.halt_wakeup++;
 
 			/* Unset POW bit after we woke up */

arch/powerpc/kvm/book3s_pr_papr.c

Lines changed: 0 additions & 1 deletion

@@ -393,7 +393,6 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
 	case H_CEDE:
 		kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
 		kvm_vcpu_halt(vcpu);
-		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 		vcpu->stat.generic.halt_wakeup++;
 		return EMULATE_DONE;
 	case H_LOGICAL_CI_LOAD:
