Commit 63eb28b

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Host driver for GICv5, the next generation interrupt controller for
     arm64, including support for interrupt routing, MSIs, interrupt
     translation and wired interrupts

   - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
     GICv5 hardware, leveraging the legacy VGIC interface

   - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
     userspace to disable support for SGIs without an active state on
     hardware that previously advertised it unconditionally

   - Map supporting endpoints with cacheable memory attributes on
     systems with FEAT_S2FWB and DIC where KVM no longer needs to
     perform cache maintenance on the address range

   - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the
     guest hypervisor to inject external aborts into an L2 VM and take
     traps of masked external aborts to the hypervisor

   - Convert more system register sanitization to the config-driven
     implementation

   - Fixes to the visibility of EL2 registers, namely making VGICv3
     system registers accessible through the VGIC device instead of the
     ONE_REG vCPU ioctls

   - Various cleanups and minor fixes

  LoongArch:

   - Add stat information for in-kernel irqchip

   - Add tracepoints for CPUCFG and CSR emulation exits

   - Enhance in-kernel irqchip emulation

   - Various cleanups

  RISC-V:

   - Enable ring-based dirty memory tracking

   - Improve perf kvm stat to report interrupt events

   - Delegate illegal instruction trap to VS-mode

   - MMU improvements related to upcoming nested virtualization

  s390x:

   - Fixes

  x86:

   - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O
     APIC, PIC, and PIT emulation at compile time

   - Share device posted IRQ code between SVM and VMX and harden it
     against bugs and runtime errors

   - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups
     O(1) instead of O(n)

   - For MMIO stale data mitigation, track whether or not a vCPU has
     access to (host) MMIO based on whether the page tables have MMIO
     pfns mapped; using VFIO is prone to false negatives

   - Rework the MSR interception code so that the SVM and VMX APIs are
     more or less identical

   - Recalculate all MSR intercepts from scratch on MSR filter changes,
     instead of maintaining shadow bitmaps

   - Advertise support for LKGS (Load Kernel GS base), a new instruction
     that's loosely related to FRED, but is supported and enumerated
     independently

   - Fix a user-triggerable WARN that syzkaller found by setting the
     vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting
     the vCPU into VMX Root Mode (post-VMXON). Trying to detect every
     possible path leading to architecturally forbidden states is hard
     and even risks breaking userspace (if it goes from valid to valid
     state but passes through invalid states), so just wait until
     KVM_RUN to detect that the vCPU state isn't allowed

   - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling
     interception of APERF/MPERF reads, so that a "properly" configured
     VM can access APERF/MPERF. This has many caveats (APERF/MPERF
     cannot be zeroed on vCPU creation or saved/restored on suspend and
     resume, or preserved over thread migration let alone VM migration)
     but can be useful whenever you're interested in letting Linux
     guests see the effective physical CPU frequency in /proc/cpuinfo

   - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
     created, as there's no known use case for changing the default
     frequency for other VM types and it goes counter to the very reason
     why the ioctl was added to the vm file descriptor. Also, there
     would be no way to make it work for confidential VMs with a
     "secure" TSC, so kill two birds with one stone

   - Dynamically allocate the shadow MMU's hashed page list, and defer
     allocating the hashed list until it's actually needed (the TDP MMU
     doesn't use the list)

   - Extract many of KVM's helpers for accessing architectural local
     APIC state to common x86 so that they can be shared by guest-side
     code for Secure AVIC

   - Various cleanups and fixes

  x86 (Intel):

   - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
     Failure to honor FREEZE_IN_SMM can leak host state into guests

   - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to
     prevent L1 from running L2 with features that KVM doesn't support,
     e.g. BTF

  x86 (AMD):

   - WARN and reject loading kvm-amd.ko instead of panicking the kernel
     if the nested SVM MSRPM offsets tracker can't handle an MSR (which
     is pretty much a static condition and therefore should never
     happen, but still)

   - Fix a variety of flaws and bugs in the AVIC device posted IRQ code

   - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
     supports) instead of rejecting vCPU creation

   - Extend enable_ipiv module param support to SVM, by simply leaving
     IsRunning clear in the vCPU's physical ID table entry

   - Disable IPI virtualization, via enable_ipiv, if the CPU is affected
     by erratum #1235, to allow (safely) enabling AVIC on such CPUs

   - Request GA Log interrupts if and only if the target vCPU is
     blocking, i.e. only if KVM needs a notification in order to wake
     the vCPU

   - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to
     the vCPU's CPUID model

   - Accept any SNP policy that is accepted by the firmware with respect
     to SMT and single-socket restrictions. An incompatible policy
     doesn't put the kernel at risk in any way, so there's no reason for
     KVM to care

   - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
     use WBNOINVD instead of WBINVD when possible for SEV cache
     maintenance

   - When reclaiming memory from an SEV guest, only do cache flushes on
     CPUs that have ever run a vCPU for the guest, i.e. don't flush the
     caches for CPUs that can't possibly have cache lines with dirty,
     encrypted data

  Generic:

   - Rework irqbypass to track/match producers and consumers via an
     xarray instead of a linked list. Using a linked list leads to
     O(n^2) insertion times, which is hugely problematic for use cases
     that create large numbers of VMs. Such use cases typically don't
     actually use irqbypass, but eliminating the pointless registration
     is a future problem to solve as it likely requires new uAPI

   - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a
     "void *", to avoid making a simple concept unnecessarily difficult
     to understand

   - Decouple device posted IRQs from VFIO device assignment, as binding
     a VM to a VFIO group is not a requirement for enabling device
     posted IRQs

   - Clean up and document/comment the irqfd assignment code

   - Disallow binding multiple irqfds to an eventfd with a priority
     waiter, i.e. ensure an eventfd is bound to at most one irqfd
     through the entire host, and add a selftest to verify eventfd:irqfd
     bindings are globally unique

   - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
     related to private <=> shared memory conversions

   - Drop guest_memfd's .getattr() implementation as the VFS layer will
     call generic_fillattr() if inode_operations.getattr is NULL

   - Fix issues with dirty ring harvesting where KVM doesn't bound the
     processing of entries in any way, which allows userspace to keep
     KVM in a tight loop indefinitely

   - Kill off kvm_arch_{start,end}_assignment() and x86's associated
     tracking, now that KVM no longer uses assigned_device_count as a
     heuristic for either irqbypass usage or MDS mitigation

  Selftests:

   - Fix a comment typo

   - Verify KVM is loaded when getting any KVM module param so that
     attempting to run a selftest without kvm.ko loaded results in a
     SKIP message about KVM not being loaded/enabled (versus some random
     parameter not existing)

   - Skip tests that hit EACCES when attempting to access a file, and
     print a "Root required?" help message. In most cases, the test just
     needs to be run with elevated permissions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits)
  Documentation: KVM: Use unordered list for pre-init VGIC registers
  RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()
  RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs
  RISC-V: perf/kvm: Add reporting of interrupt events
  RISC-V: KVM: Enable ring-based dirty memory tracking
  RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap
  RISC-V: KVM: Delegate illegal instruction fault to VS mode
  RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs
  RISC-V: KVM: Factor-out g-stage page table management
  RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence
  RISC-V: KVM: Introduce struct kvm_gstage_mapping
  RISC-V: KVM: Factor-out MMU related declarations into separate headers
  RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect()
  RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range()
  RISC-V: KVM: Don't flush TLB when PTE is unchanged
  RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH
  RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()
  RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init()
  RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value
  KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list
  ...

2 parents 7d767a9 + 196d9e7 commit 63eb28b

209 files changed: 12095 additions & 4770 deletions

Documentation/arch/arm64/booting.rst

Lines changed: 41 additions & 0 deletions
@@ -223,6 +223,47 @@ Before jumping into the kernel, the following conditions must be met:
 
     - SCR_EL3.HCE (bit 8) must be initialised to 0b1.
 
+  For systems with a GICv5 interrupt controller to be used in v5 mode:
+
+  - If the kernel is entered at EL1 and EL2 is present:
+
+      - ICH_HFGRTR_EL2.ICC_PPI_ACTIVERn_EL1 (bit 20) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_PPI_PRIORITYRn_EL1 (bit 19) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_PPI_PENDRn_EL1 (bit 18) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_PPI_ENABLERn_EL1 (bit 17) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_PPI_HMRn_EL1 (bit 16) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_IAFFIDR_EL1 (bit 7) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_ICSR_EL1 (bit 6) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_PCR_EL1 (bit 5) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_HPPIR_EL1 (bit 4) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_HAPR_EL1 (bit 3) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_CR0_EL1 (bit 2) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_IDRn_EL1 (bit 1) must be initialised to 0b1.
+      - ICH_HFGRTR_EL2.ICC_APR_EL1 (bit 0) must be initialised to 0b1.
+
+      - ICH_HFGWTR_EL2.ICC_PPI_ACTIVERn_EL1 (bit 20) must be initialised to 0b1.
+      - ICH_HFGWTR_EL2.ICC_PPI_PRIORITYRn_EL1 (bit 19) must be initialised to 0b1.
+      - ICH_HFGWTR_EL2.ICC_PPI_PENDRn_EL1 (bit 18) must be initialised to 0b1.
+      - ICH_HFGWTR_EL2.ICC_PPI_ENABLERn_EL1 (bit 17) must be initialised to 0b1.
+      - ICH_HFGWTR_EL2.ICC_ICSR_EL1 (bit 6) must be initialised to 0b1.
+      - ICH_HFGWTR_EL2.ICC_PCR_EL1 (bit 5) must be initialised to 0b1.
+      - ICH_HFGWTR_EL2.ICC_CR0_EL1 (bit 2) must be initialised to 0b1.
+      - ICH_HFGWTR_EL2.ICC_APR_EL1 (bit 0) must be initialised to 0b1.
+
+      - ICH_HFGITR_EL2.GICRCDNMIA (bit 10) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICRCDIA (bit 9) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDDI (bit 8) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDEOI (bit 7) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDHM (bit 6) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDRCFG (bit 5) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDPEND (bit 4) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDAFF (bit 3) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDPRI (bit 2) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDDIS (bit 1) must be initialised to 0b1.
+      - ICH_HFGITR_EL2.GICCDEN (bit 0) must be initialised to 0b1.
+
+  - The DT or ACPI tables must describe a GICv5 interrupt controller.
+
   For systems with a GICv3 interrupt controller to be used in v3 mode:
   - If EL3 is present:
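
The per-bit requirements in the hunk above can be folded into one value per register. The following is a minimal sketch, not taken from the commit: the bit positions are copied from the list, while the dictionary names and the `trap_mask` helper are illustrative assumptions. It only computes the bits that the text says must be 0b1 and says nothing about the remaining bits of each register.

```python
# Bit positions copied from the booting.rst hunk; all Python names here are
# hypothetical and exist only for this illustration.
ICH_HFGRTR_EL2_BITS = {
    "ICC_PPI_ACTIVERn_EL1": 20, "ICC_PPI_PRIORITYRn_EL1": 19,
    "ICC_PPI_PENDRn_EL1": 18, "ICC_PPI_ENABLERn_EL1": 17,
    "ICC_PPI_HMRn_EL1": 16, "ICC_IAFFIDR_EL1": 7, "ICC_ICSR_EL1": 6,
    "ICC_PCR_EL1": 5, "ICC_HPPIR_EL1": 4, "ICC_HAPR_EL1": 3,
    "ICC_CR0_EL1": 2, "ICC_IDRn_EL1": 1, "ICC_APR_EL1": 0,
}
ICH_HFGWTR_EL2_BITS = {
    "ICC_PPI_ACTIVERn_EL1": 20, "ICC_PPI_PRIORITYRn_EL1": 19,
    "ICC_PPI_PENDRn_EL1": 18, "ICC_PPI_ENABLERn_EL1": 17,
    "ICC_ICSR_EL1": 6, "ICC_PCR_EL1": 5, "ICC_CR0_EL1": 2, "ICC_APR_EL1": 0,
}
ICH_HFGITR_EL2_BITS = {
    "GICRCDNMIA": 10, "GICRCDIA": 9, "GICCDDI": 8, "GICCDEOI": 7,
    "GICCDHM": 6, "GICCDRCFG": 5, "GICCDPEND": 4, "GICCDAFF": 3,
    "GICCDPRI": 2, "GICCDDIS": 1, "GICCDEN": 0,
}


def trap_mask(fields):
    """Return the OR of (1 << bit) over every field that must be 0b1."""
    mask = 0
    for bit in fields.values():
        mask |= 1 << bit
    return mask


if __name__ == "__main__":
    for name, fields in (
        ("ICH_HFGRTR_EL2", ICH_HFGRTR_EL2_BITS),
        ("ICH_HFGWTR_EL2", ICH_HFGWTR_EL2_BITS),
        ("ICH_HFGITR_EL2", ICH_HFGITR_EL2_BITS),
    ):
        print(f"{name}: required bits = {trap_mask(fields):#x}")
```

Under these assumptions the required-bit masks come out as 0x1f00ff, 0x1e0065 and 0x7ff respectively.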

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/arm,gic-v5-iwb.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: ARM Generic Interrupt Controller, version 5 Interrupt Wire Bridge (IWB)
+
+maintainers:
+  - Lorenzo Pieralisi <[email protected]>
+  - Marc Zyngier <[email protected]>
+
+description: |
+  The GICv5 architecture defines the guidelines to implement GICv5
+  compliant interrupt controllers for AArch64 systems.
+
+  The GICv5 specification can be found at
+  https://developer.arm.com/documentation/aes0070
+
+  GICv5 has zero or more Interrupt Wire Bridges (IWB) that are responsible
+  for translating wire signals into interrupt messages to the GICv5 ITS.
+
+allOf:
+  - $ref: /schemas/interrupt-controller.yaml#
+
+properties:
+  compatible:
+    const: arm,gic-v5-iwb
+
+  reg:
+    items:
+      - description: IWB control frame
+
+  "#address-cells":
+    const: 0
+
+  "#interrupt-cells":
+    description: |
+      The 1st cell corresponds to the IWB wire.
+
+      The 2nd cell is the flags, encoded as follows:
+      bits[3:0] trigger type and level flags.
+
+      1 = low-to-high edge triggered
+      2 = high-to-low edge triggered
+      4 = active high level-sensitive
+      8 = active low level-sensitive
+
+    const: 2
+
+  interrupt-controller: true
+
+  msi-parent:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - "#interrupt-cells"
+  - interrupt-controller
+  - msi-parent
+
+additionalProperties: false
+
+examples:
+  - |
+    interrupt-controller@2f000000 {
+      compatible = "arm,gic-v5-iwb";
+      reg = <0x2f000000 0x10000>;
+
+      #address-cells = <0>;
+
+      #interrupt-cells = <2>;
+      interrupt-controller;
+
+      msi-parent = <&its0 64>;
+    };
+...
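
To make the two-cell interrupt specifier described in this binding concrete, here is a small decoding sketch. It is an illustration only: `decode_iwb_specifier` and `TRIGGER_FLAGS` are hypothetical names, the example specifier `(7, 4)` is made up, and the sketch assumes exactly one of the four documented flag values is set in bits[3:0].

```python
# Flag values as listed in the "#interrupt-cells" description of the binding.
TRIGGER_FLAGS = {
    1: "low-to-high edge triggered",
    2: "high-to-low edge triggered",
    4: "active high level-sensitive",
    8: "active low level-sensitive",
}


def decode_iwb_specifier(cells):
    """Split a 2-cell IWB specifier into (wire, trigger description).

    Hypothetical helper: the binding defines the cell layout, not this API.
    """
    if len(cells) != 2:
        raise ValueError("an IWB interrupt specifier has exactly 2 cells")
    wire, flags = cells
    trigger = flags & 0xF  # only bits[3:0] carry the trigger type
    if trigger not in TRIGGER_FLAGS:
        raise ValueError(f"unsupported trigger flags: {trigger:#x}")
    return wire, TRIGGER_FLAGS[trigger]


if __name__ == "__main__":
    # Made-up consumer entry: wire 7, active high level-sensitive.
    print(decode_iwb_specifier((7, 4)))
```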
Lines changed: 267 additions & 0 deletions
@@ -0,0 +1,267 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/arm,gic-v5.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: ARM Generic Interrupt Controller, version 5
+
+maintainers:
+  - Lorenzo Pieralisi <[email protected]>
+  - Marc Zyngier <[email protected]>
+
+description: |
+  The GICv5 architecture defines the guidelines to implement GICv5
+  compliant interrupt controllers for AArch64 systems.
+
+  The GICv5 specification can be found at
+  https://developer.arm.com/documentation/aes0070
+
+  The GICv5 architecture is composed of multiple components:
+    - one or more IRS (Interrupt Routing Service)
+    - zero or more ITS (Interrupt Translation Service)
+
+  The architecture defines:
+    - PE-Private Peripheral Interrupts (PPI)
+    - Shared Peripheral Interrupts (SPI)
+    - Logical Peripheral Interrupts (LPI)
+
+allOf:
+  - $ref: /schemas/interrupt-controller.yaml#
+
+properties:
+  compatible:
+    const: arm,gic-v5
+
+  "#address-cells":
+    enum: [ 1, 2 ]
+
+  "#size-cells":
+    enum: [ 1, 2 ]
+
+  ranges: true
+
+  "#interrupt-cells":
+    description: |
+      The 1st cell corresponds to the INTID.Type field in the INTID; 1 for PPI,
+      3 for SPI. LPI interrupts must not be described in the bindings since
+      they are allocated dynamically by the software component managing them.
+
+      The 2nd cell contains the interrupt INTID.ID field.
+
+      The 3rd cell is the flags, encoded as follows:
+      bits[3:0] trigger type and level flags.
+
+      1 = low-to-high edge triggered
+      2 = high-to-low edge triggered
+      4 = active high level-sensitive
+      8 = active low level-sensitive
+
+    const: 3
+
+  interrupt-controller: true
+
+  interrupts:
+    description:
+      The VGIC maintenance interrupt.
+    maxItems: 1
+
+required:
+  - compatible
+  - "#address-cells"
+  - "#size-cells"
+  - ranges
+  - "#interrupt-cells"
+  - interrupt-controller
+
+patternProperties:
+  "^irs@[0-9a-f]+$":
+    type: object
+    description:
+      GICv5 has one or more Interrupt Routing Services (IRS) that are
+      responsible for handling IRQ state and routing.
+
+    additionalProperties: false
+
+    properties:
+      compatible:
+        const: arm,gic-v5-irs
+
+      reg:
+        minItems: 1
+        items:
+          - description: IRS config frames
+          - description: IRS setlpi frames
+
+      reg-names:
+        description:
+          Describe config and setlpi frames that are present.
+          "ns-" stands for non-secure, "s-" for secure, "realm-" for realm
+          and "el3-" for EL3.
+        minItems: 1
+        maxItems: 8
+        items:
+          enum: [ ns-config, s-config, realm-config, el3-config, ns-setlpi,
+                  s-setlpi, realm-setlpi, el3-setlpi ]
+
+      "#address-cells":
+        enum: [ 1, 2 ]
+
+      "#size-cells":
+        enum: [ 1, 2 ]
+
+      ranges: true
+
+      dma-noncoherent:
+        description:
+          Present if the GIC IRS permits programming shareability and
+          cacheability attributes but is connected to a non-coherent
+          downstream interconnect.
+
+      cpus:
+        description:
+          CPUs managed by the IRS.
+
+      arm,iaffids:
+        $ref: /schemas/types.yaml#/definitions/uint16-array
+        description:
+          Interrupt AFFinity ID (IAFFID) associated with the CPU whose
+          CPU node phandle is at the same index in the cpus array.
+
+    patternProperties:
+      "^its@[0-9a-f]+$":
+        type: object
+        description:
+          GICv5 has zero or more Interrupt Translation Services (ITS) that are
+          used to route Message Signalled Interrupts (MSI) to the CPUs. Each
+          ITS is connected to an IRS.
+        additionalProperties: false
+
+        properties:
+          compatible:
+            const: arm,gic-v5-its
+
+          reg:
+            items:
+              - description: ITS config frames
+
+          reg-names:
+            description:
+              Describe config frames that are present.
+              "ns-" stands for non-secure, "s-" for secure, "realm-" for realm
+              and "el3-" for EL3.
+            minItems: 1
+            maxItems: 4
+            items:
+              enum: [ ns-config, s-config, realm-config, el3-config ]
+
+          "#address-cells":
+            enum: [ 1, 2 ]
+
+          "#size-cells":
+            enum: [ 1, 2 ]
+
+          ranges: true
+
+          dma-noncoherent:
+            description:
+              Present if the GIC ITS permits programming shareability and
+              cacheability attributes but is connected to a non-coherent
+              downstream interconnect.
+
+        patternProperties:
+          "^msi-controller@[0-9a-f]+$":
+            type: object
+            description:
+              GICv5 ITS has one or more translate register frames.
+            additionalProperties: false
+
+            properties:
+              reg:
+                items:
+                  - description: ITS translate frames
+
+              reg-names:
+                description:
+                  Describe translate frames that are present.
+                  "ns-" stands for non-secure, "s-" for secure, "realm-" for realm
+                  and "el3-" for EL3.
+                minItems: 1
+                maxItems: 4
+                items:
+                  enum: [ ns-translate, s-translate, realm-translate, el3-translate ]
+
+              "#msi-cells":
+                description:
+                  The single msi-cell is the DeviceID of the device which will
+                  generate the MSI.
+                const: 1
+
+              msi-controller: true
+
+            required:
+              - reg
+              - reg-names
+              - "#msi-cells"
+              - msi-controller
+
+        required:
+          - compatible
+          - reg
+          - reg-names
+
+    required:
+      - compatible
+      - reg
+      - reg-names
+      - cpus
+      - arm,iaffids
+
+additionalProperties: false
+
+examples:
+  - |
+    interrupt-controller {
+      compatible = "arm,gic-v5";
+
+      #interrupt-cells = <3>;
+      interrupt-controller;
+
+      #address-cells = <1>;
+      #size-cells = <1>;
+      ranges;
+
+      interrupts = <1 25 4>;
+
+      irs@2f1a0000 {
+        compatible = "arm,gic-v5-irs";
+        reg = <0x2f1a0000 0x10000>; // IRS_CONFIG_FRAME
+        reg-names = "ns-config";
+
+        #address-cells = <1>;
+        #size-cells = <1>;
+        ranges;
+
+        cpus = <&cpu0>, <&cpu1>, <&cpu2>, <&cpu3>, <&cpu4>, <&cpu5>, <&cpu6>, <&cpu7>;
+        arm,iaffids = /bits/ 16 <0 1 2 3 4 5 6 7>;
+
+        its@2f120000 {
+          compatible = "arm,gic-v5-its";
+          reg = <0x2f120000 0x10000>; // ITS_CONFIG_FRAME
+          reg-names = "ns-config";
+
+          #address-cells = <1>;
+          #size-cells = <1>;
+          ranges;
+
+          msi-controller@2f130000 {
+            reg = <0x2f130000 0x10000>; // ITS_TRANSLATE_FRAME
+            reg-names = "ns-translate";
+
+            #msi-cells = <1>;
+            msi-controller;
+          };
+        };
+      };
+    };
+...
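
As a reading aid for this binding, the sketch below decodes the three-cell specifier and pairs `cpus` entries with `arm,iaffids` by index, as the property descriptions state. It is an assumption-laden illustration rather than kernel code: `decode_gicv5_specifier`, `pair_iaffids` and the string CPU labels are hypothetical, and any type cell other than 1 (PPI) or 3 (SPI) is simply rejected because the binding says LPIs must not be described.

```python
# Type and flag encodings as given in the binding's "#interrupt-cells" text.
INTID_TYPES = {1: "PPI", 3: "SPI"}
TRIGGER_FLAGS = {
    1: "low-to-high edge triggered",
    2: "high-to-low edge triggered",
    4: "active high level-sensitive",
    8: "active low level-sensitive",
}


def decode_gicv5_specifier(cells):
    """Decode (INTID.Type, INTID.ID, flags) into readable values."""
    if len(cells) != 3:
        raise ValueError("a GICv5 interrupt specifier has exactly 3 cells")
    intid_type, intid_id, flags = cells
    if intid_type not in INTID_TYPES:
        raise ValueError(f"type {intid_type} may not appear in the bindings")
    trigger = flags & 0xF  # only bits[3:0] carry the trigger type
    if trigger not in TRIGGER_FLAGS:
        raise ValueError(f"unsupported trigger flags: {trigger:#x}")
    return INTID_TYPES[intid_type], intid_id, TRIGGER_FLAGS[trigger]


def pair_iaffids(cpus, iaffids):
    """Map each CPU to the IAFFID found at the same index."""
    if len(cpus) != len(iaffids):
        raise ValueError("cpus and arm,iaffids must have the same length")
    return dict(zip(cpus, iaffids))


if __name__ == "__main__":
    # The maintenance interrupt from the example node: interrupts = <1 25 4>.
    print(decode_gicv5_specifier((1, 25, 4)))
    # String labels stand in for the &cpuN phandles of the example.
    print(pair_iaffids([f"cpu{i}" for i in range(8)], list(range(8))))
```

Read this way, the example's `interrupts = <1 25 4>` describes PPI 25 as active high level-sensitive.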
