Commit 6226e74

Merge tag 'hyperv-fixes-signed-20240616' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V fixes from Wei Liu:

 - Some cosmetic changes for hv.c and balloon.c (Aditya Nagesh)
 - Two documentation updates (Michael Kelley)
 - Suppress the invalid warning for packed member alignment (Saurabh Sengar)
 - Two hv_balloon fixes (Michael Kelley)

* tag 'hyperv-fixes-signed-20240616' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  Drivers: hv: Cosmetic changes for hv.c and balloon.c
  Documentation: hyperv: Improve synic and interrupt handling description
  Documentation: hyperv: Update spelling and fix typo
  tools: hv: suppress the invalid warning for packed member alignment
  hv_balloon: Enable hot-add for memblock sizes > 128 MiB
  hv_balloon: Use kernel macros to simplify open coded sequences
2 parents 6ba59ff + 831bcbc commit 6226e74

6 files changed

+208
-206
lines changed

Documentation/virt/hyperv/clocks.rst

Lines changed: 15 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -62,12 +62,21 @@ shared page with scale and offset values into user space. User
6262
space code performs the same algorithm of reading the TSC and
6363
applying the scale and offset to get the constant 10 MHz clock.
6464

65-
Linux clockevents are based on Hyper-V synthetic timer 0. While
66-
Hyper-V offers 4 synthetic timers for each CPU, Linux only uses
67-
timer 0. Interrupts from stimer0 are recorded on the "HVS" line in
68-
/proc/interrupts. Clockevents based on the virtualized PIT and
69-
local APIC timer also work, but the Hyper-V synthetic timer is
70-
preferred.
65+
Linux clockevents are based on Hyper-V synthetic timer 0 (stimer0).
66+
While Hyper-V offers 4 synthetic timers for each CPU, Linux only uses
67+
timer 0. In older versions of Hyper-V, an interrupt from stimer0
68+
results in a VMBus control message that is demultiplexed by
69+
vmbus_isr() as described in the Documentation/virt/hyperv/vmbus.rst
70+
documentation. In newer versions of Hyper-V, stimer0 interrupts can
71+
be mapped to an architectural interrupt, which is referred to as
72+
"Direct Mode". Linux prefers to use Direct Mode when available. Since
73+
x86/x64 doesn't support per-CPU interrupts, Direct Mode statically
74+
allocates an x86 interrupt vector (HYPERV_STIMER0_VECTOR) across all CPUs
75+
and explicitly codes it to call the stimer0 interrupt handler. Hence
76+
interrupts from stimer0 are recorded on the "HVS" line in /proc/interrupts
77+
rather than being associated with a Linux IRQ. Clockevents based on the
78+
virtualized PIT and local APIC timer also work, but Hyper-V stimer0
79+
is preferred.
7180

7281
The driver for the Hyper-V synthetic system clock and timers is
7382
drivers/clocksource/hyperv_timer.c.
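
For reference, the scale-and-offset step mentioned in the clocks.rst context lines can be sketched in C. This is a minimal sketch, assuming the formula given in the Hyper-V TLFS for the reference TSC page: take the upper 64 bits of the 128-bit product of the TSC value and the scale, then add the offset. The function name ref_time_from_tsc is hypothetical and is not the kernel's implementation. The sketch omits the sequence-number check that a real reader of the shared page needs, and unsigned __int128 is a GCC/Clang extension.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): convert a raw TSC reading
 * into the constant 10 MHz reference time (100 ns units), using the scale
 * and offset values published in the shared page.
 */
static uint64_t ref_time_from_tsc(uint64_t tsc, uint64_t scale, int64_t offset)
{
    /* 64 x 64 -> 128-bit multiply; only the upper 64 bits are kept. */
    unsigned __int128 product = (unsigned __int128)tsc * scale;

    /* The offset is signed; unsigned wraparound gives the right result. */
    return (uint64_t)(product >> 64) + (uint64_t)offset;
}
```

With tsc and scale both equal to 2^32, the product is 2^64, so the upper 64 bits are 1 and the result is 1 plus the offset.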

Documentation/virt/hyperv/overview.rst

Lines changed: 11 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -40,7 +40,7 @@ Linux guests communicate with Hyper-V in four different ways:
4040
arm64, these synthetic registers must be accessed using explicit
4141
hypercalls.
4242

43-
* VMbus: VMbus is a higher-level software construct that is built on
43+
* VMBus: VMBus is a higher-level software construct that is built on
4444
the other 3 mechanisms. It is a message passing interface between
4545
the Hyper-V host and the Linux guest. It uses memory that is shared
4646
between Hyper-V and the guest, along with various signaling
@@ -54,8 +54,8 @@ x86/x64 architecture only.
5454

5555
.. _Hyper-V Top Level Functional Spec (TLFS): https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
5656

57-
VMbus is not documented. This documentation provides a high-level
58-
overview of VMbus and how it works, but the details can be discerned
57+
VMBus is not documented. This documentation provides a high-level
58+
overview of VMBus and how it works, but the details can be discerned
5959
only from the code.
6060

6161
Sharing Memory
@@ -74,7 +74,7 @@ follows:
7474
physical address space. How Hyper-V is told about the GPA or list
7575
of GPAs varies. In some cases, a single GPA is written to a
7676
synthetic register. In other cases, a GPA or list of GPAs is sent
77-
in a VMbus message.
77+
in a VMBus message.
7878

7979
* Hyper-V translates the GPAs into "real" physical memory addresses,
8080
and creates a virtual mapping that it can use to access the memory.
@@ -133,9 +133,9 @@ only the CPUs actually present in the VM, so Linux does not report
133133
any hot-add CPUs.
134134

135135
A Linux guest CPU may be taken offline using the normal Linux
136-
mechanisms, provided no VMbus channel interrupts are assigned to
137-
the CPU. See the section on VMbus Interrupts for more details
138-
on how VMbus channel interrupts can be re-assigned to permit
136+
mechanisms, provided no VMBus channel interrupts are assigned to
137+
the CPU. See the section on VMBus Interrupts for more details
138+
on how VMBus channel interrupts can be re-assigned to permit
139139
taking a CPU offline.
140140

141141
32-bit and 64-bit
@@ -169,14 +169,14 @@ and functionality. Hyper-V indicates feature/function availability
169169
via flags in synthetic MSRs that Hyper-V provides to the guest,
170170
and the guest code tests these flags.
171171

172-
VMbus has its own protocol version that is negotiated during the
173-
initial VMbus connection from the guest to Hyper-V. This version
172+
VMBus has its own protocol version that is negotiated during the
173+
initial VMBus connection from the guest to Hyper-V. This version
174174
number is also output to dmesg during boot. This version number
175175
is checked in a few places in the code to determine if specific
176176
functionality is present.
177177

178-
Furthermore, each synthetic device on VMbus also has a protocol
179-
version that is separate from the VMbus protocol version. Device
178+
Furthermore, each synthetic device on VMBus also has a protocol
179+
version that is separate from the VMBus protocol version. Device
180180
drivers for these synthetic devices typically negotiate the device
181181
protocol version, and may test that protocol version to determine
182182
if specific device functionality is present.

Documentation/virt/hyperv/vmbus.rst

Lines changed: 83 additions & 60 deletions
Original file line number | Diff line number | Diff line change
@@ -1,8 +1,8 @@
11
.. SPDX-License-Identifier: GPL-2.0
22
3-
VMbus
3+
VMBus
44
=====
5-
VMbus is a software construct provided by Hyper-V to guest VMs. It
5+
VMBus is a software construct provided by Hyper-V to guest VMs. It
66
consists of a control path and common facilities used by synthetic
77
devices that Hyper-V presents to guest VMs. The control path is
88
used to offer synthetic devices to the guest VM and, in some cases,
@@ -12,9 +12,9 @@ and the synthetic device implementation that is part of Hyper-V, and
1212
signaling primitives to allow Hyper-V and the guest to interrupt
1313
each other.
1414

15-
VMbus is modeled in Linux as a bus, with the expected /sys/bus/vmbus
16-
entry in a running Linux guest. The VMbus driver (drivers/hv/vmbus_drv.c)
17-
establishes the VMbus control path with the Hyper-V host, then
15+
VMBus is modeled in Linux as a bus, with the expected /sys/bus/vmbus
16+
entry in a running Linux guest. The VMBus driver (drivers/hv/vmbus_drv.c)
17+
establishes the VMBus control path with the Hyper-V host, then
1818
registers itself as a Linux bus driver. It implements the standard
1919
bus functions for adding and removing devices to/from the bus.
2020

@@ -49,9 +49,9 @@ synthetic NIC is referred to as "netvsc" and the Linux driver for
4949
the synthetic SCSI controller is "storvsc". These drivers contain
5050
functions with names like "storvsc_connect_to_vsp".
5151

52-
VMbus channels
52+
VMBus channels
5353
--------------
54-
An instance of a synthetic device uses VMbus channels to communicate
54+
An instance of a synthetic device uses VMBus channels to communicate
5555
between the VSP and the VSC. Channels are bi-directional and used
5656
for passing messages. Most synthetic devices use a single channel,
5757
but the synthetic SCSI controller and synthetic NIC may use multiple
@@ -73,7 +73,7 @@ write indices and some control flags, followed by the memory for the
7373
actual ring. The size of the ring is determined by the VSC in the
7474
guest and is specific to each synthetic device. The list of GPAs
7575
making up the ring is communicated to the Hyper-V host over the
76-
VMbus control path as a GPA Descriptor List (GPADL). See function
76+
VMBus control path as a GPA Descriptor List (GPADL). See function
7777
vmbus_establish_gpadl().
7878

7979
Each ring buffer is mapped into contiguous Linux kernel virtual
@@ -102,10 +102,10 @@ resources. For Windows Server 2019 and later, this limit is
102102
approximately 1280 Mbytes. For versions prior to Windows Server
103103
2019, the limit is approximately 384 Mbytes.
104104

105-
VMbus messages
106-
--------------
107-
All VMbus messages have a standard header that includes the message
108-
length, the offset of the message payload, some flags, and a
105+
VMBus channel messages
106+
----------------------
107+
All messages sent in a VMBus channel have a standard header that includes
108+
the message length, the offset of the message payload, some flags, and a
109109
transactionID. The portion of the message after the header is
110110
unique to each VSP/VSC pair.
111111

@@ -137,7 +137,7 @@ control message contains a list of GPAs that describe the data
137137
buffer. For example, the storvsc driver uses this approach to
138138
specify the data buffers to/from which disk I/O is done.
139139

140-
Three functions exist to send VMbus messages:
140+
Three functions exist to send VMBus channel messages:
141141

142142
1. vmbus_sendpacket(): Control-only messages and messages with
143143
embedded data -- no GPAs
@@ -154,20 +154,51 @@ Historically, Linux guests have trusted Hyper-V to send well-formed
154154
and valid messages, and Linux drivers for synthetic devices did not
155155
fully validate messages. With the introduction of processor
156156
technologies that fully encrypt guest memory and that allow the
157-
guest to not trust the hypervisor (AMD SNP-SEV, Intel TDX), trusting
157+
guest to not trust the hypervisor (AMD SEV-SNP, Intel TDX), trusting
158158
the Hyper-V host is no longer a valid assumption. The drivers for
159-
VMbus synthetic devices are being updated to fully validate any
159+
VMBus synthetic devices are being updated to fully validate any
160160
values read from memory that is shared with Hyper-V, which includes
161-
messages from VMbus devices. To facilitate such validation,
161+
messages from VMBus devices. To facilitate such validation,
162162
messages read by the guest from the "in" ring buffer are copied to a
163163
temporary buffer that is not shared with Hyper-V. Validation is
164164
performed in this temporary buffer without the risk of Hyper-V
165165
maliciously modifying the message after it is validated but before
166166
it is used.
167167

168-
VMbus interrupts
168+
Synthetic Interrupt Controller (synic)
169+
--------------------------------------
170+
Hyper-V provides each guest CPU with a synthetic interrupt controller
171+
that is used by VMBus for host-guest communication. While each synic
172+
defines 16 synthetic interrupts (SINT), Linux uses only one of the 16
173+
(VMBUS_MESSAGE_SINT). All interrupts related to communication between
174+
the Hyper-V host and a guest CPU use that SINT.
175+
176+
The SINT is mapped to a single per-CPU architectural interrupt (i.e,
177+
an 8-bit x86/x64 interrupt vector, or an arm64 PPI INTID). Because
178+
each CPU in the guest has a synic and may receive VMBus interrupts,
179+
they are best modeled in Linux as per-CPU interrupts. This model works
180+
well on arm64 where a single per-CPU Linux IRQ is allocated for
181+
VMBUS_MESSAGE_SINT. This IRQ appears in /proc/interrupts as an IRQ labelled
182+
"Hyper-V VMbus". Since x86/x64 lacks support for per-CPU IRQs, an x86
183+
interrupt vector is statically allocated (HYPERVISOR_CALLBACK_VECTOR)
184+
across all CPUs and explicitly coded to call vmbus_isr(). In this case,
185+
there's no Linux IRQ, and the interrupts are visible in aggregate in
186+
/proc/interrupts on the "HYP" line.
187+
188+
The synic provides the means to demultiplex the architectural interrupt into
189+
one or more logical interrupts and route the logical interrupt to the proper
190+
VMBus handler in Linux. This demultiplexing is done by vmbus_isr() and
191+
related functions that access synic data structures.
192+
193+
The synic is not modeled in Linux as an irq chip or irq domain,
194+
and the demultiplexed logical interrupts are not Linux IRQs. As such,
195+
they don't appear in /proc/interrupts or /proc/irq. The CPU
196+
affinity for one of these logical interrupts is controlled via an
197+
entry under /sys/bus/vmbus as described below.
198+
199+
VMBus interrupts
169200
----------------
170-
VMbus provides a mechanism for the guest to interrupt the host when
201+
VMBus provides a mechanism for the guest to interrupt the host when
171202
the guest has queued new messages in a ring buffer. The host
172203
expects that the guest will send an interrupt only when an "out"
173204
ring buffer transitions from empty to non-empty. If the guest sends
@@ -176,63 +207,55 @@ unnecessary. If a guest sends an excessive number of unnecessary
176207
interrupts, the host may throttle that guest by suspending its
177208
execution for a few seconds to prevent a denial-of-service attack.
178209

179-
Similarly, the host will interrupt the guest when it sends a new
180-
message on the VMbus control path, or when a VMbus channel "in" ring
181-
buffer transitions from empty to non-empty. Each CPU in the guest
182-
may receive VMbus interrupts, so they are best modeled as per-CPU
183-
interrupts in Linux. This model works well on arm64 where a single
184-
per-CPU IRQ is allocated for VMbus. Since x86/x64 lacks support for
185-
per-CPU IRQs, an x86 interrupt vector is statically allocated (see
186-
HYPERVISOR_CALLBACK_VECTOR) across all CPUs and explicitly coded to
187-
call the VMbus interrupt service routine. These interrupts are
188-
visible in /proc/interrupts on the "HYP" line.
189-
190-
The guest CPU that a VMbus channel will interrupt is selected by the
210+
Similarly, the host will interrupt the guest via the synic when
211+
it sends a new message on the VMBus control path, or when a VMBus
212+
channel "in" ring buffer transitions from empty to non-empty due to
213+
the host inserting a new VMBus channel message. The control message stream
214+
and each VMBus channel "in" ring buffer are separate logical interrupts
215+
that are demultiplexed by vmbus_isr(). It demultiplexes by first checking
216+
for channel interrupts by calling vmbus_chan_sched(), which looks at a synic
217+
bitmap to determine which channels have pending interrupts on this CPU.
218+
If multiple channels have pending interrupts for this CPU, they are
219+
processed sequentially. When all channel interrupts have been processed,
220+
vmbus_isr() checks for and processes any messages received on the VMBus
221+
control path.
222+
223+
The guest CPU that a VMBus channel will interrupt is selected by the
191224
guest when the channel is created, and the host is informed of that
192-
selection. VMbus devices are broadly grouped into two categories:
225+
selection. VMBus devices are broadly grouped into two categories:
193226

194-
1. "Slow" devices that need only one VMbus channel. The devices
227+
1. "Slow" devices that need only one VMBus channel. The devices
195228
(such as keyboard, mouse, heartbeat, and timesync) generate
196-
relatively few interrupts. Their VMbus channels are all
229+
relatively few interrupts. Their VMBus channels are all
197230
assigned to interrupt the VMBUS_CONNECT_CPU, which is always
198231
CPU 0.
199232

200-
2. "High speed" devices that may use multiple VMbus channels for
233+
2. "High speed" devices that may use multiple VMBus channels for
201234
higher parallelism and performance. These devices include the
202-
synthetic SCSI controller and synthetic NIC. Their VMbus
235+
synthetic SCSI controller and synthetic NIC. Their VMBus
203236
channels interrupts are assigned to CPUs that are spread out
204237
among the available CPUs in the VM so that interrupts on
205238
multiple channels can be processed in parallel.
206239

207-
The assignment of VMbus channel interrupts to CPUs is done in the
240+
The assignment of VMBus channel interrupts to CPUs is done in the
208241
function init_vp_index(). This assignment is done outside of the
209242
normal Linux interrupt affinity mechanism, so the interrupts are
210243
neither "unmanaged" nor "managed" interrupts.
211244

212-
The CPU that a VMbus channel will interrupt can be seen in
245+
The CPU that a VMBus channel will interrupt can be seen in
213246
/sys/bus/vmbus/devices/<deviceGUID>/ channels/<channelRelID>/cpu.
214247
When running on later versions of Hyper-V, the CPU can be changed
215-
by writing a new value to this sysfs entry. Because the interrupt
216-
assignment is done outside of the normal Linux affinity mechanism,
217-
there are no entries in /proc/irq corresponding to individual
218-
VMbus channel interrupts.
248+
by writing a new value to this sysfs entry. Because VMBus channel
249+
interrupts are not Linux IRQs, there are no entries in /proc/interrupts
250+
or /proc/irq corresponding to individual VMBus channel interrupts.
219251

220252
An online CPU in a Linux guest may not be taken offline if it has
221-
VMbus channel interrupts assigned to it. Any such channel
253+
VMBus channel interrupts assigned to it. Any such channel
222254
interrupts must first be manually reassigned to another CPU as
223255
described above. When no channel interrupts are assigned to the
224256
CPU, it can be taken offline.
225257

226-
When a guest CPU receives a VMbus interrupt from the host, the
227-
function vmbus_isr() handles the interrupt. It first checks for
228-
channel interrupts by calling vmbus_chan_sched(), which looks at a
229-
bitmap setup by the host to determine which channels have pending
230-
interrupts on this CPU. If multiple channels have pending
231-
interrupts for this CPU, they are processed sequentially. When all
232-
channel interrupts have been processed, vmbus_isr() checks for and
233-
processes any message received on the VMbus control path.
234-
235-
The VMbus channel interrupt handling code is designed to work
258+
The VMBus channel interrupt handling code is designed to work
236259
correctly even if an interrupt is received on a CPU other than the
237260
CPU assigned to the channel. Specifically, the code does not use
238261
CPU-based exclusion for correctness. In normal operation, Hyper-V
@@ -242,23 +265,23 @@ when Hyper-V will make the transition. The code must work correctly
242265
even if there is a time lag before Hyper-V starts interrupting the
243266
new CPU. See comments in target_cpu_store().
244267

245-
VMbus device creation/deletion
268+
VMBus device creation/deletion
246269
------------------------------
247270
Hyper-V and the Linux guest have a separate message-passing path
248271
that is used for synthetic device creation and deletion. This
249-
path does not use a VMbus channel. See vmbus_post_msg() and
272+
path does not use a VMBus channel. See vmbus_post_msg() and
250273
vmbus_on_msg_dpc().
251274

252275
The first step is for the guest to connect to the generic
253-
Hyper-V VMbus mechanism. As part of establishing this connection,
254-
the guest and Hyper-V agree on a VMbus protocol version they will
276+
Hyper-V VMBus mechanism. As part of establishing this connection,
277+
the guest and Hyper-V agree on a VMBus protocol version they will
255278
use. This negotiation allows newer Linux kernels to run on older
256279
Hyper-V versions, and vice versa.
257280

258281
The guest then tells Hyper-V to "send offers". Hyper-V sends an
259282
offer message to the guest for each synthetic device that the VM
260-
is configured to have. Each VMbus device type has a fixed GUID
261-
known as the "class ID", and each VMbus device instance is also
283+
is configured to have. Each VMBus device type has a fixed GUID
284+
known as the "class ID", and each VMBus device instance is also
262285
identified by a GUID. The offer message from Hyper-V contains
263286
both GUIDs to uniquely (within the VM) identify the device.
264287
There is one offer message for each device instance, so a VM with
@@ -275,7 +298,7 @@ type based on the class ID, and invokes the correct driver to set up
275298
the device. Driver/device matching is performed using the standard
276299
Linux mechanism.
277300

278-
The device driver probe function opens the primary VMbus channel to
301+
The device driver probe function opens the primary VMBus channel to
279302
the corresponding VSP. It allocates guest memory for the channel
280303
ring buffers and shares the ring buffer with the Hyper-V host by
281304
giving the host a list of GPAs for the ring buffer memory. See
@@ -285,7 +308,7 @@ Once the ring buffer is set up, the device driver and VSP exchange
285308
setup messages via the primary channel. These messages may include
286309
negotiating the device protocol version to be used between the Linux
287310
VSC and the VSP on the Hyper-V host. The setup messages may also
288-
include creating additional VMbus channels, which are somewhat
311+
include creating additional VMBus channels, which are somewhat
289312
mis-named as "sub-channels" since they are functionally
290313
equivalent to the primary channel once they are created.
291314
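
The demultiplexing order that the updated vmbus.rst text attributes to vmbus_isr() (pending channel interrupts first, processed sequentially, then control-path messages) can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: struct model_synic, model_isr(), the 64-channel bitmap width, and the use of -1 to mark a control message are all hypothetical and do not reflect the kernel's actual data structures or code.

```c
#include <stdint.h>

#define MODEL_MAX_CHANNELS 64   /* illustrative width, not a Hyper-V limit */

/* Hypothetical per-CPU state standing in for the synic data structures. */
struct model_synic {
    uint64_t chan_pending;      /* one bit per channel with a pending interrupt */
    int ctrl_msgs_pending;      /* number of queued control-path messages */
};

/*
 * Model of the documented order: handle every pending channel interrupt
 * in sequence, then any control-path messages. Each handled logical
 * interrupt is appended to order[] (channel index, or -1 for a control
 * message). Returns the number of entries written.
 */
static int model_isr(struct model_synic *s, int *order, int max)
{
    int n = 0;

    /* Step 1: channel interrupts, found by scanning the pending bitmap. */
    for (int ch = 0; ch < MODEL_MAX_CHANNELS && n < max; ch++) {
        if (s->chan_pending & (1ULL << ch)) {
            s->chan_pending &= ~(1ULL << ch);   /* acknowledge */
            order[n++] = ch;
        }
    }

    /* Step 2: messages on the control path. */
    while (s->ctrl_msgs_pending > 0 && n < max) {
        s->ctrl_msgs_pending--;
        order[n++] = -1;
    }

    return n;
}
```

In this model a single architectural interrupt can yield several logical interrupts, which matches the text's point that the demultiplexed interrupts are not Linux IRQs.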
