Commit d7e0a79

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "ARM:

   - More progress on the protected VM front, now with the full fixed
     feature set as well as the limitation of some hypercalls after
     initialisation.

   - Cleanup of the RAZ/WI sysreg handling, which was pointlessly
     complicated

   - Fixes for the vgic placement in the IPA space, together with a
     bunch of selftests

   - More memcg accounting of the memory allocated on behalf of a guest

   - Timer and vgic selftests

   - Workarounds for the Apple M1 broken vgic implementation

   - KConfig cleanups

   - New kvmarm.mode=none option, for those who really dislike us

  RISC-V:

   - New KVM port.

  x86:

   - New API to control TSC offset from userspace

   - TSC scaling for nested hypervisors on SVM

   - Switch masterclock protection from raw_spin_lock to seqcount

   - Clean up function prototypes in the page fault code and avoid
     repeated memslot lookups

   - Convey the exit reason to userspace on emulation failure

   - Configure time between NX page recovery iterations

   - Expose Predictive Store Forwarding Disable CPUID leaf

   - Allocate page tracking data structures lazily (if the i915 KVM-GT
     functionality is not compiled in)

   - Cleanups, fixes and optimizations for the shadow MMU code

  s390:

   - SIGP Fixes

   - initial preparations for lazy destroy of secure VMs

   - storage key improvements/fixes

   - Log the guest CPNC

  Starting from this release, KVM-PPC patches will come from Michael
  Ellerman's PPC tree"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  RISC-V: KVM: fix boolreturn.cocci warnings
  RISC-V: KVM: remove unneeded semicolon
  RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions
  RISC-V: KVM: Factor-out FP virtualization into separate sources
  KVM: s390: add debug statement for diag 318 CPNC data
  KVM: s390: pv: properly handle page flags for protected guests
  KVM: s390: Fix handle_sske page fault handling
  KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol
  KVM: x86: On emulation failure, convey the exit reason, etc. to userspace
  KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info
  KVM: x86: Clarify the kvm_run.emulation_failure structure layout
  KVM: s390: Add a routine for setting userspace CPU state
  KVM: s390: Simplify SIGP Set Arch handling
  KVM: s390: pv: avoid stalls when making pages secure
  KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm
  KVM: s390: pv: avoid double free of sida page
  KVM: s390: pv: add macros for UVC CC values
  s390/mm: optimize reset_guest_reference_bit()
  s390/mm: optimize set_guest_storage_key()
  s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present
  ...
2 parents 44261f8 + 52cf891 commit d7e0a79

152 files changed, +11646 -1752 lines changed

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 13 additions & 2 deletions
@@ -2353,7 +2353,14 @@
23532353
[KVM] Controls how many 4KiB pages are periodically zapped
23542354
back to huge pages. 0 disables the recovery, otherwise if
23552355
the value is N KVM will zap 1/Nth of the 4KiB pages every
2356-
minute. The default is 60.
2356+
period (see below). The default is 60.
2357+
2358+
kvm.nx_huge_pages_recovery_period_ms=
2359+
[KVM] Controls the time period at which KVM zaps 4KiB pages
2360+
back to huge pages. If the value is a non-zero N, KVM will
2361+
zap a portion (see ratio above) of the pages every N msecs.
2362+
If the value is 0 (the default), KVM will pick a period based
2363+
on the ratio, such that a page is zapped after 1 hour on average.
23572364

23582365
kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM.
23592366
Default is 1 (enabled)
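
The interaction between `kvm.nx_huge_pages_recovery_ratio` and the new `kvm.nx_huge_pages_recovery_period_ms` parameter in the hunk above can be sketched as follows. This is a minimal illustration derived only from the documentation text, not the kernel's implementation; the helper name `nx_recovery_period_ms` and the `ONE_HOUR_MS` constant are hypothetical. With a ratio of N, 1/Nth of the pages is zapped per period, so a page survives about N periods on average, and a one-hour average lifetime implies a period of one hour divided by N.

```c
#include <stdint.h>

/* One hour expressed in milliseconds (hypothetical constant for this sketch). */
#define ONE_HOUR_MS (60u * 60u * 1000u)

/*
 * Hypothetical helper: return the effective recovery period in ms.
 *   ratio     - value of kvm.nx_huge_pages_recovery_ratio
 *   period_ms - value of kvm.nx_huge_pages_recovery_period_ms
 * A return value of 0 means recovery is disabled.
 */
static inline uint32_t nx_recovery_period_ms(uint32_t ratio, uint32_t period_ms)
{
	if (ratio == 0)
		return 0;		/* ratio 0 disables the recovery */
	if (period_ms != 0)
		return period_ms;	/* explicit period requested */
	/* Default: choose the period so a page is zapped after ~1 hour. */
	return ONE_HOUR_MS / ratio;
}
```

With the default ratio of 60 and the default period of 0, this sketch yields 60000 ms, which matches the "every minute" behaviour described by the old text. A real implementation would probably also bound very large ratios so the period cannot collapse to zero; that detail is outside what the documentation states.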
@@ -2365,14 +2372,18 @@
23652372
kvm-arm.mode=
23662373
[KVM,ARM] Select one of KVM/arm64's modes of operation.
23672374

2375+
none: Forcefully disable KVM.
2376+
23682377
nvhe: Standard nVHE-based mode, without support for
23692378
protected guests.
23702379

23712380
protected: nVHE-based mode with support for guests whose
23722381
state is kept private from the host.
23732382
Not valid if the kernel is running in EL2.
23742383

2375-
Defaults to VHE/nVHE based on hardware support.
2384+
Defaults to VHE/nVHE based on hardware support. Setting
2385+
mode to "protected" will disable kexec and hibernation
2386+
for the host.
23762387

23772388
kvm-arm.vgic_v3_group0_trap=
23782389
[KVM,ARM] Trap guest accesses to GICv3 group-0

Documentation/virt/kvm/api.rst

Lines changed: 223 additions & 18 deletions
@@ -532,7 +532,7 @@ translation mode.
532532
------------------
533533

534534
:Capability: basic
535-
:Architectures: x86, ppc, mips
535+
:Architectures: x86, ppc, mips, riscv
536536
:Type: vcpu ioctl
537537
:Parameters: struct kvm_interrupt (in)
538538
:Returns: 0 on success, negative on failure.
@@ -601,6 +601,23 @@ interrupt number dequeues the interrupt.
601601

602602
This is an asynchronous vcpu ioctl and can be invoked from any thread.
603603

604+
RISC-V:
605+
^^^^^^^
606+
607+
Queues an external interrupt to be injected into the virtual CPU. This ioctl
608+
is overloaded with 2 different irq values:
609+
610+
a) KVM_INTERRUPT_SET
611+
612+
This sets external interrupt for a virtual CPU and it will receive
613+
once it is ready.
614+
615+
b) KVM_INTERRUPT_UNSET
616+
617+
This clears pending external interrupt for a virtual CPU.
618+
619+
This is an asynchronous vcpu ioctl and can be invoked from any thread.
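
A userspace caller might drive the RISC-V variant of this ioctl roughly as sketched below. The wrapper name `riscv_external_irq` is hypothetical. `struct kvm_interrupt` and `KVM_INTERRUPT` come from `<linux/kvm.h>`; the fallback values for `KVM_INTERRUPT_SET` and `KVM_INTERRUPT_UNSET` are an assumption about what the RISC-V uapi header defines, provided only so the sketch compiles on other hosts, and should be checked against the real header.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Assumed fallback values; on a RISC-V host the uapi headers are
 * expected to provide these definitions themselves.
 */
#ifndef KVM_INTERRUPT_SET
#define KVM_INTERRUPT_SET	(-1U)
#endif
#ifndef KVM_INTERRUPT_UNSET
#define KVM_INTERRUPT_UNSET	(-2U)
#endif

/*
 * Hypothetical wrapper: set (raise != 0) or clear (raise == 0) the
 * pending external interrupt of the vcpu behind vcpu_fd.
 * Returns the ioctl result: 0 on success, -1 with errno set on failure.
 */
int riscv_external_irq(int vcpu_fd, int raise)
{
	struct kvm_interrupt irq = {
		.irq = raise ? KVM_INTERRUPT_SET : KVM_INTERRUPT_UNSET,
	};

	return ioctl(vcpu_fd, KVM_INTERRUPT, &irq);
}
```

Because the ioctl is asynchronous, a device-emulation thread could call such a wrapper directly without first stopping the vcpu thread.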
620+
604621

605622
4.17 KVM_DEBUG_GUEST
606623
--------------------
@@ -993,20 +1010,37 @@ such as migration.
9931010
When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
9941011
set of bits that KVM can return in struct kvm_clock_data's flag member.
9951012

996-
The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned
997-
value is the exact kvmclock value seen by all VCPUs at the instant
998-
when KVM_GET_CLOCK was called. If clear, the returned value is simply
999-
CLOCK_MONOTONIC plus a constant offset; the offset can be modified
1000-
with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock,
1001-
but the exact value read by each VCPU could differ, because the host
1002-
TSC is not stable.
1013+
The following flags are defined:
1014+
1015+
KVM_CLOCK_TSC_STABLE
1016+
If set, the returned value is the exact kvmclock
1017+
value seen by all VCPUs at the instant when KVM_GET_CLOCK was called.
1018+
If clear, the returned value is simply CLOCK_MONOTONIC plus a constant
1019+
offset; the offset can be modified with KVM_SET_CLOCK. KVM will try
1020+
to make all VCPUs follow this clock, but the exact value read by each
1021+
VCPU could differ, because the host TSC is not stable.
1022+
1023+
KVM_CLOCK_REALTIME
1024+
If set, the `realtime` field in the kvm_clock_data
1025+
structure is populated with the value of the host's real time
1026+
clocksource at the instant when KVM_GET_CLOCK was called. If clear,
1027+
the `realtime` field does not contain a value.
1028+
1029+
KVM_CLOCK_HOST_TSC
1030+
If set, the `host_tsc` field in the kvm_clock_data
1031+
structure is populated with the value of the host's timestamp counter (TSC)
1032+
at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
1033+
does not contain a value.
10031034

10041035
::
10051036

10061037
struct kvm_clock_data {
10071038
__u64 clock; /* kvmclock current value */
10081039
__u32 flags;
1009-
__u32 pad[9];
1040+
__u32 pad0;
1041+
__u64 realtime;
1042+
__u64 host_tsc;
1043+
__u32 pad[4];
10101044
};
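
The layout change above replaces `pad[9]` with named fields while keeping the structure size constant, which is what lets older userspace keep working. The sketch below checks that arithmetic at compile time. The struct names `clock_data_old` and `clock_data_new` are local mirrors written for this illustration with `<stdint.h>` types, not the uapi definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the layout before this change. */
struct clock_data_old {
	uint64_t clock;		/* kvmclock current value */
	uint32_t flags;
	uint32_t pad[9];
};

/* Local mirror of the layout after this change. */
struct clock_data_new {
	uint64_t clock;		/* kvmclock current value */
	uint32_t flags;
	uint32_t pad0;		/* keeps realtime at an 8-byte offset */
	uint64_t realtime;
	uint64_t host_tsc;
	uint32_t pad[4];
};

/* 8 + 4 + 9*4 = 48 bytes before; 8 + 4 + 4 + 8 + 8 + 4*4 = 48 bytes after. */
_Static_assert(sizeof(struct clock_data_old) == 48, "old layout is 48 bytes");
_Static_assert(sizeof(struct clock_data_new) == sizeof(struct clock_data_old),
	       "new fields must fit inside the old padding");
_Static_assert(offsetof(struct clock_data_new, realtime) == 16,
	       "realtime starts where pad[1] used to be");
```

The new fields occupy bytes that were previously padding, so their contents are only meaningful when the corresponding flag (`KVM_CLOCK_REALTIME` or `KVM_CLOCK_HOST_TSC`) is set.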
10111045

10121046

@@ -1023,12 +1057,25 @@ Sets the current timestamp of kvmclock to the value specified in its parameter.
10231057
In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
10241058
such as migration.
10251059

1060+
The following flags can be passed:
1061+
1062+
KVM_CLOCK_REALTIME
1063+
If set, KVM will compare the value of the `realtime` field
1064+
with the value of the host's real time clocksource at the instant when
1065+
KVM_SET_CLOCK was called. The difference in elapsed time is added to the final
1066+
kvmclock value that will be provided to guests.
1067+
1068+
Other flags returned by ``KVM_GET_CLOCK`` are accepted but ignored.
1069+
10261070
::
10271071

10281072
struct kvm_clock_data {
10291073
__u64 clock; /* kvmclock current value */
10301074
__u32 flags;
1031-
__u32 pad[9];
1075+
__u32 pad0;
1076+
__u64 realtime;
1077+
__u64 host_tsc;
1078+
__u32 pad[4];
10321079
};
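
The `KVM_CLOCK_REALTIME` adjustment described above amounts to advancing the saved kvmclock by the wall-clock time that elapsed between `KVM_GET_CLOCK` and `KVM_SET_CLOCK`. The following is a minimal sketch of that arithmetic under stated assumptions: the function name and parameters are hypothetical, all values are taken to be in the same unit (nanoseconds), and the choice to skip the adjustment when the saved realtime lies in the future is an assumption of this sketch rather than something the text specifies.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the kvmclock value to present to the guest.
 *   saved_clock    - `clock` field captured by KVM_GET_CLOCK
 *   saved_realtime - `realtime` field captured by KVM_GET_CLOCK
 *   now_realtime   - host real time when KVM_SET_CLOCK is called
 *   realtime_flag  - non-zero if KVM_CLOCK_REALTIME is set in `flags`
 */
static inline uint64_t kvmclock_after_set(uint64_t saved_clock,
					  uint64_t saved_realtime,
					  uint64_t now_realtime,
					  int realtime_flag)
{
	/* Without the flag the clock is restored exactly as saved. */
	if (!realtime_flag)
		return saved_clock;

	/* Assumption: ignore a saved realtime that is ahead of the host. */
	if (now_realtime <= saved_realtime)
		return saved_clock;

	/* Add the elapsed wall-clock time to the saved kvmclock. */
	return saved_clock + (now_realtime - saved_realtime);
}
```

In a migration scenario this means the guest clock accounts for the downtime between saving state on the source and restoring it on the destination, provided both hosts have synchronized real-time clocks.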
10331080

10341081

@@ -1399,7 +1446,7 @@ for vm-wide capabilities.
13991446
---------------------
14001447

14011448
:Capability: KVM_CAP_MP_STATE
1402-
:Architectures: x86, s390, arm, arm64
1449+
:Architectures: x86, s390, arm, arm64, riscv
14031450
:Type: vcpu ioctl
14041451
:Parameters: struct kvm_mp_state (out)
14051452
:Returns: 0 on success; -1 on error
@@ -1416,7 +1463,8 @@ uniprocessor guests).
14161463
Possible values are:
14171464

14181465
========================== ===============================================
1419-
KVM_MP_STATE_RUNNABLE the vcpu is currently running [x86,arm/arm64]
1466+
KVM_MP_STATE_RUNNABLE the vcpu is currently running
1467+
[x86,arm/arm64,riscv]
14201468
KVM_MP_STATE_UNINITIALIZED the vcpu is an application processor (AP)
14211469
which has not yet received an INIT signal [x86]
14221470
KVM_MP_STATE_INIT_RECEIVED the vcpu has received an INIT signal, and is
@@ -1425,7 +1473,7 @@ Possible values are:
14251473
is waiting for an interrupt [x86]
14261474
KVM_MP_STATE_SIPI_RECEIVED the vcpu has just received a SIPI (vector
14271475
accessible via KVM_GET_VCPU_EVENTS) [x86]
1428-
KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64]
1476+
KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64,riscv]
14291477
KVM_MP_STATE_CHECK_STOP the vcpu is in a special error state [s390]
14301478
KVM_MP_STATE_OPERATING the vcpu is operating (running or halted)
14311479
[s390]
@@ -1437,8 +1485,8 @@ On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
14371485
in-kernel irqchip, the multiprocessing state must be maintained by userspace on
14381486
these architectures.
14391487

1440-
For arm/arm64:
1441-
^^^^^^^^^^^^^^
1488+
For arm/arm64/riscv:
1489+
^^^^^^^^^^^^^^^^^^^^
14421490

14431491
The only states that are valid are KVM_MP_STATE_STOPPED and
14441492
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
@@ -1447,7 +1495,7 @@ KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
14471495
---------------------
14481496

14491497
:Capability: KVM_CAP_MP_STATE
1450-
:Architectures: x86, s390, arm, arm64
1498+
:Architectures: x86, s390, arm, arm64, riscv
14511499
:Type: vcpu ioctl
14521500
:Parameters: struct kvm_mp_state (in)
14531501
:Returns: 0 on success; -1 on error
@@ -1459,8 +1507,8 @@ On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
14591507
in-kernel irqchip, the multiprocessing state must be maintained by userspace on
14601508
these architectures.
14611509

1462-
For arm/arm64:
1463-
^^^^^^^^^^^^^^
1510+
For arm/arm64/riscv:
1511+
^^^^^^^^^^^^^^^^^^^^
14641512

14651513
The only states that are valid are KVM_MP_STATE_STOPPED and
14661514
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
@@ -2577,6 +2625,144 @@ following id bit patterns::
25772625

25782626
0x7020 0000 0003 02 <0:3> <reg:5>
25792627

2628+
RISC-V registers are mapped using the lower 32 bits. The upper 8 bits of
2629+
that is the register group type.
2630+
2631+
RISC-V config registers are meant for configuring a Guest VCPU and it has
2632+
the following id bit patterns::
2633+
2634+
0x8020 0000 01 <index into the kvm_riscv_config struct:24> (32bit Host)
2635+
0x8030 0000 01 <index into the kvm_riscv_config struct:24> (64bit Host)
2636+
2637+
Following are the RISC-V config registers:
2638+
2639+
======================= ========= =============================================
2640+
Encoding Register Description
2641+
======================= ========= =============================================
2642+
0x80x0 0000 0100 0000 isa ISA feature bitmap of Guest VCPU
2643+
======================= ========= =============================================
2644+
2645+
The isa config register can be read anytime but can only be written before
2646+
a Guest VCPU runs. It will have ISA feature bits matching underlying host
2647+
set by default.
2648+
2649+
RISC-V core registers represent the general execution state of a Guest VCPU
2650+
and it has the following id bit patterns::
2651+
2652+
0x8020 0000 02 <index into the kvm_riscv_core struct:24> (32bit Host)
2653+
0x8030 0000 02 <index into the kvm_riscv_core struct:24> (64bit Host)
2654+
2655+
Following are the RISC-V core registers:
2656+
2657+
======================= ========= =============================================
2658+
Encoding Register Description
2659+
======================= ========= =============================================
2660+
0x80x0 0000 0200 0000 regs.pc Program counter
2661+
0x80x0 0000 0200 0001 regs.ra Return address
2662+
0x80x0 0000 0200 0002 regs.sp Stack pointer
2663+
0x80x0 0000 0200 0003 regs.gp Global pointer
2664+
0x80x0 0000 0200 0004 regs.tp Task pointer
2665+
0x80x0 0000 0200 0005 regs.t0 Caller saved register 0
2666+
0x80x0 0000 0200 0006 regs.t1 Caller saved register 1
2667+
0x80x0 0000 0200 0007 regs.t2 Caller saved register 2
2668+
0x80x0 0000 0200 0008 regs.s0 Callee saved register 0
2669+
0x80x0 0000 0200 0009 regs.s1 Callee saved register 1
2670+
0x80x0 0000 0200 000a regs.a0 Function argument (or return value) 0
2671+
0x80x0 0000 0200 000b regs.a1 Function argument (or return value) 1
2672+
0x80x0 0000 0200 000c regs.a2 Function argument 2
2673+
0x80x0 0000 0200 000d regs.a3 Function argument 3
2674+
0x80x0 0000 0200 000e regs.a4 Function argument 4
2675+
0x80x0 0000 0200 000f regs.a5 Function argument 5
2676+
0x80x0 0000 0200 0010 regs.a6 Function argument 6
2677+
0x80x0 0000 0200 0011 regs.a7 Function argument 7
2678+
0x80x0 0000 0200 0012 regs.s2 Callee saved register 2
2679+
0x80x0 0000 0200 0013 regs.s3 Callee saved register 3
2680+
0x80x0 0000 0200 0014 regs.s4 Callee saved register 4
2681+
0x80x0 0000 0200 0015 regs.s5 Callee saved register 5
2682+
0x80x0 0000 0200 0016 regs.s6 Callee saved register 6
2683+
0x80x0 0000 0200 0017 regs.s7 Callee saved register 7
2684+
0x80x0 0000 0200 0018 regs.s8 Callee saved register 8
2685+
0x80x0 0000 0200 0019 regs.s9 Callee saved register 9
2686+
0x80x0 0000 0200 001a regs.s10 Callee saved register 10
2687+
0x80x0 0000 0200 001b regs.s11 Callee saved register 11
2688+
0x80x0 0000 0200 001c regs.t3 Caller saved register 3
2689+
0x80x0 0000 0200 001d regs.t4 Caller saved register 4
2690+
0x80x0 0000 0200 001e regs.t5 Caller saved register 5
2691+
0x80x0 0000 0200 001f regs.t6 Caller saved register 6
2692+
0x80x0 0000 0200 0020 mode Privilege mode (1 = S-mode or 0 = U-mode)
2693+
======================= ========= =============================================
2694+
2695+
RISC-V csr registers represent the supervisor mode control/status registers
2696+
of a Guest VCPU and it has the following id bit patterns::
2697+
2698+
0x8020 0000 03 <index into the kvm_riscv_csr struct:24> (32bit Host)
2699+
0x8030 0000 03 <index into the kvm_riscv_csr struct:24> (64bit Host)
2700+
2701+
Following are the RISC-V csr registers:
2702+
2703+
======================= ========= =============================================
2704+
Encoding Register Description
2705+
======================= ========= =============================================
2706+
0x80x0 0000 0300 0000 sstatus Supervisor status
2707+
0x80x0 0000 0300 0001 sie Supervisor interrupt enable
2708+
0x80x0 0000 0300 0002 stvec Supervisor trap vector base
2709+
0x80x0 0000 0300 0003 sscratch Supervisor scratch register
2710+
0x80x0 0000 0300 0004 sepc Supervisor exception program counter
2711+
0x80x0 0000 0300 0005 scause Supervisor trap cause
2712+
0x80x0 0000 0300 0006 stval Supervisor bad address or instruction
2713+
0x80x0 0000 0300 0007 sip Supervisor interrupt pending
2714+
0x80x0 0000 0300 0008 satp Supervisor address translation and protection
2715+
======================= ========= =============================================
2716+
2717+
RISC-V timer registers represent the timer state of a Guest VCPU and it has
2718+
the following id bit patterns::
2719+
2720+
0x8030 0000 04 <index into the kvm_riscv_timer struct:24>
2721+
2722+
Following are the RISC-V timer registers:
2723+
2724+
======================= ========= =============================================
2725+
Encoding Register Description
2726+
======================= ========= =============================================
2727+
0x8030 0000 0400 0000 frequency Time base frequency (read-only)
2728+
0x8030 0000 0400 0001 time Time value visible to Guest
2729+
0x8030 0000 0400 0002 compare Time compare programmed by Guest
2730+
0x8030 0000 0400 0003 state Time compare state (1 = ON or 0 = OFF)
2731+
======================= ========= =============================================
2732+
2733+
RISC-V F-extension registers represent the single precision floating point
2734+
state of a Guest VCPU and it has the following id bit patterns::
2735+
2736+
0x8020 0000 05 <index into the __riscv_f_ext_state struct:24>
2737+
2738+
Following are the RISC-V F-extension registers:
2739+
2740+
======================= ========= =============================================
2741+
Encoding Register Description
2742+
======================= ========= =============================================
2743+
0x8020 0000 0500 0000 f[0] Floating point register 0
2744+
...
2745+
0x8020 0000 0500 001f f[31] Floating point register 31
2746+
0x8020 0000 0500 0020 fcsr Floating point control and status register
2747+
======================= ========= =============================================
2748+
2749+
RISC-V D-extension registers represent the double precision floating point
2750+
state of a Guest VCPU and it has the following id bit patterns::
2751+
2752+
0x8020 0000 06 <index into the __riscv_d_ext_state struct:24> (fcsr)
2753+
0x8030 0000 06 <index into the __riscv_d_ext_state struct:24> (non-fcsr)
2754+
2755+
Following are the RISC-V D-extension registers:
2756+
2757+
======================= ========= =============================================
2758+
Encoding Register Description
2759+
======================= ========= =============================================
2760+
0x8030 0000 0600 0000 f[0] Floating point register 0
2761+
...
2762+
0x8030 0000 0600 001f f[31] Floating point register 31
2763+
0x8020 0000 0600 0020 fcsr Floating point control and status register
2764+
======================= ========= =============================================
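
The register id patterns in the tables above all follow one scheme: an architecture prefix, a size field, a group type in the upper 8 bits of the lower 32 bits, and a 24-bit index. A small sketch of composing such ids is shown below. The macro, enum and function names are hypothetical and chosen for this illustration; only the numeric values are taken from the bit patterns listed above.

```c
#include <stdint.h>

/* Values read off the id bit patterns above; names are hypothetical. */
#define RV_REG_ARCH	0x8000000000000000ULL	/* leading 0x80.. prefix */
#define RV_REG_SIZE_U32	0x0020000000000000ULL	/* 0x8020 ... patterns */
#define RV_REG_SIZE_U64	0x0030000000000000ULL	/* 0x8030 ... patterns */
#define RV_GROUP_SHIFT	24			/* group sits above a 24-bit index */
#define RV_INDEX_MASK	0x00ffffffu

enum rv_reg_group {
	RV_GROUP_CONFIG	= 0x01,
	RV_GROUP_CORE	= 0x02,
	RV_GROUP_CSR	= 0x03,
	RV_GROUP_TIMER	= 0x04,
	RV_GROUP_FP_F	= 0x05,
	RV_GROUP_FP_D	= 0x06,
};

/* Compose a register id from its size, group type and struct index. */
static inline uint64_t rv_reg_id(uint64_t size, enum rv_reg_group group,
				 uint32_t index)
{
	return RV_REG_ARCH | size |
	       ((uint64_t)group << RV_GROUP_SHIFT) |
	       (index & RV_INDEX_MASK);
}
```

For example, `rv_reg_id(RV_REG_SIZE_U64, RV_GROUP_CORE, 2)` produces the encoding listed for `regs.sp` on a 64-bit host, and `rv_reg_id(RV_REG_SIZE_U32, RV_GROUP_FP_D, 0x20)` produces the D-extension `fcsr` id, which uses the 32-bit size even though the `f[n]` registers use the 64-bit one.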
2765+
25802766

25812767
4.69 KVM_GET_ONE_REG
25822768
--------------------
@@ -5848,6 +6034,25 @@ Valid values for 'type' are:
58486034
Userspace is expected to place the hypercall result into the appropriate
58496035
field before invoking KVM_RUN again.
58506036

6037+
::
6038+
6039+
/* KVM_EXIT_RISCV_SBI */
6040+
struct {
6041+
unsigned long extension_id;
6042+
unsigned long function_id;
6043+
unsigned long args[6];
6044+
unsigned long ret[2];
6045+
} riscv_sbi;
6046+
If exit reason is KVM_EXIT_RISCV_SBI then it indicates that the VCPU has
6047+
done a SBI call which is not handled by KVM RISC-V kernel module. The details
6048+
of the SBI call are available in 'riscv_sbi' member of kvm_run structure. The
6049+
'extension_id' field of 'riscv_sbi' represents SBI extension ID whereas the
6050+
'function_id' field represents function ID of given SBI extension. The 'args'
6051+
array field of 'riscv_sbi' represents parameters for the SBI call and 'ret'
6052+
array field represents return values. The userspace should update the return
6053+
values of SBI call before resuming the VCPU. For more details on RISC-V SBI
6054+
spec refer, https://github.com/riscv/riscv-sbi-doc.
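
A userspace handler for this exit could be structured along the lines of the sketch below. It uses a local mirror of the `riscv_sbi` member (named `sbi_exit` here, a hypothetical name) rather than the real `struct kvm_run`. Two points are assumptions based on the SBI calling convention and are not stated in the text above: that `ret[0]` carries the SBI error code and `ret[1]` the returned value, and that -2 is the SBI "not supported" error.

```c
/* Local mirror of the riscv_sbi member shown above (hypothetical name). */
struct sbi_exit {
	unsigned long extension_id;
	unsigned long function_id;
	unsigned long args[6];
	unsigned long ret[2];
};

/* Assumed SBI error code for an unimplemented extension or function. */
#define SBI_ERR_NOT_SUPPORTED	(-2L)

/*
 * Hypothetical handler: fill in the return values before the VCPU is
 * resumed. This sketch emulates no extensions, so every call is
 * reported back to the guest as not supported.
 */
void handle_sbi_exit(struct sbi_exit *sbi)
{
	switch (sbi->extension_id) {
	/* A real VMM would add cases for the extensions it emulates. */
	default:
		sbi->ret[0] = (unsigned long)SBI_ERR_NOT_SUPPORTED; /* assumed: error */
		sbi->ret[1] = 0;                                    /* assumed: value */
		break;
	}
}
```

After the handler returns, the VMM would invoke KVM_RUN again so that the updated return values reach the guest.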
6055+
58516056
::
58526057

58536058
/* Fix the size of the union. */
