.. SPDX-License-Identifier: GPL-2.0

- VMbus
+ VMBus
=====
- VMbus is a software construct provided by Hyper-V to guest VMs. It
+ VMBus is a software construct provided by Hyper-V to guest VMs. It
consists of a control path and common facilities used by synthetic
devices that Hyper-V presents to guest VMs. The control path is
used to offer synthetic devices to the guest VM and, in some cases,
@@ -12,9 +12,9 @@ and the synthetic device implementation that is part of Hyper-V, and
signaling primitives to allow Hyper-V and the guest to interrupt
each other.

- VMbus is modeled in Linux as a bus, with the expected /sys/bus/vmbus
- entry in a running Linux guest. The VMbus driver (drivers/hv/vmbus_drv.c)
- establishes the VMbus control path with the Hyper-V host, then
+ VMBus is modeled in Linux as a bus, with the expected /sys/bus/vmbus
+ entry in a running Linux guest. The VMBus driver (drivers/hv/vmbus_drv.c)
+ establishes the VMBus control path with the Hyper-V host, then
registers itself as a Linux bus driver. It implements the standard
bus functions for adding and removing devices to/from the bus.

@@ -49,9 +49,9 @@ synthetic NIC is referred to as "netvsc" and the Linux driver for
the synthetic SCSI controller is "storvsc". These drivers contain
functions with names like "storvsc_connect_to_vsp".

- VMbus channels
+ VMBus channels
--------------
- An instance of a synthetic device uses VMbus channels to communicate
+ An instance of a synthetic device uses VMBus channels to communicate
between the VSP and the VSC. Channels are bi-directional and used
for passing messages. Most synthetic devices use a single channel,
but the synthetic SCSI controller and synthetic NIC may use multiple
@@ -73,7 +73,7 @@ write indices and some control flags, followed by the memory for the
actual ring. The size of the ring is determined by the VSC in the
guest and is specific to each synthetic device. The list of GPAs
making up the ring is communicated to the Hyper-V host over the
- VMbus control path as a GPA Descriptor List (GPADL). See function
+ VMBus control path as a GPA Descriptor List (GPADL). See function
vmbus_establish_gpadl().

Each ring buffer is mapped into contiguous Linux kernel virtual
@@ -102,10 +102,10 @@ resources. For Windows Server 2019 and later, this limit is
approximately 1280 Mbytes. For versions prior to Windows Server
2019, the limit is approximately 384 Mbytes.

- VMbus messages
- --------------
- All VMbus messages have a standard header that includes the message
- length, the offset of the message payload, some flags, and a
+ VMBus channel messages
+ ----------------------
+ All messages sent in a VMBus channel have a standard header that includes
+ the message length, the offset of the message payload, some flags, and a
transactionID. The portion of the message after the header is
unique to each VSP/VSC pair.

@@ -137,7 +137,7 @@ control message contains a list of GPAs that describe the data
buffer. For example, the storvsc driver uses this approach to
specify the data buffers to/from which disk I/O is done.

- Three functions exist to send VMbus messages:
+ Three functions exist to send VMBus channel messages:

1. vmbus_sendpacket(): Control-only messages and messages with
   embedded data -- no GPAs
@@ -154,20 +154,51 @@ Historically, Linux guests have trusted Hyper-V to send well-formed
and valid messages, and Linux drivers for synthetic devices did not
fully validate messages. With the introduction of processor
technologies that fully encrypt guest memory and that allow the
- guest to not trust the hypervisor (AMD SNP- SEV, Intel TDX), trusting
+ guest to not trust the hypervisor (AMD SEV-SNP, Intel TDX), trusting
the Hyper-V host is no longer a valid assumption. The drivers for
- VMbus synthetic devices are being updated to fully validate any
+ VMBus synthetic devices are being updated to fully validate any
values read from memory that is shared with Hyper-V, which includes
- messages from VMbus devices. To facilitate such validation,
+ messages from VMBus devices. To facilitate such validation,
messages read by the guest from the "in" ring buffer are copied to a
temporary buffer that is not shared with Hyper-V. Validation is
performed in this temporary buffer without the risk of Hyper-V
maliciously modifying the message after it is validated but before
it is used.

- VMbus interrupts
+ Synthetic Interrupt Controller (synic)
+ --------------------------------------
+ Hyper-V provides each guest CPU with a synthetic interrupt controller
+ that is used by VMBus for host-guest communication. While each synic
+ defines 16 synthetic interrupts (SINT), Linux uses only one of the 16
+ (VMBUS_MESSAGE_SINT). All interrupts related to communication between
+ the Hyper-V host and a guest CPU use that SINT.
+
+ The SINT is mapped to a single per-CPU architectural interrupt (i.e.,
+ an 8-bit x86/x64 interrupt vector, or an arm64 PPI INTID). Because
+ each CPU in the guest has a synic and may receive VMBus interrupts,
+ they are best modeled in Linux as per-CPU interrupts. This model works
+ well on arm64, where a single per-CPU Linux IRQ is allocated for
+ VMBUS_MESSAGE_SINT. This IRQ appears in /proc/interrupts as an IRQ labelled
+ "Hyper-V VMbus". Since x86/x64 lacks support for per-CPU IRQs, an x86
+ interrupt vector is statically allocated (HYPERVISOR_CALLBACK_VECTOR)
+ across all CPUs and explicitly coded to call vmbus_isr(). In this case,
+ there's no Linux IRQ, and the interrupts are visible in aggregate in
+ /proc/interrupts on the "HYP" line.
+
+ The synic provides the means to demultiplex the architectural interrupt into
+ one or more logical interrupts and route each logical interrupt to the proper
+ VMBus handler in Linux. This demultiplexing is done by vmbus_isr() and
+ related functions that access synic data structures.
+
+ The synic is not modeled in Linux as an irq chip or irq domain,
+ and the demultiplexed logical interrupts are not Linux IRQs. As such,
+ they don't appear in /proc/interrupts or /proc/irq. The CPU
+ affinity for one of these logical interrupts is controlled via an
+ entry under /sys/bus/vmbus as described below.
+
+ VMBus interrupts
----------------
- VMbus provides a mechanism for the guest to interrupt the host when
+ VMBus provides a mechanism for the guest to interrupt the host when
the guest has queued new messages in a ring buffer. The host
expects that the guest will send an interrupt only when an "out"
ring buffer transitions from empty to non-empty. If the guest sends
@@ -176,63 +207,55 @@ unnecessary. If a guest sends an excessive number of unnecessary
interrupts, the host may throttle that guest by suspending its
execution for a few seconds to prevent a denial-of-service attack.

- Similarly, the host will interrupt the guest when it sends a new
- message on the VMbus control path, or when a VMbus channel "in" ring
- buffer transitions from empty to non-empty. Each CPU in the guest
- may receive VMbus interrupts, so they are best modeled as per-CPU
- interrupts in Linux. This model works well on arm64 where a single
- per-CPU IRQ is allocated for VMbus. Since x86/x64 lacks support for
- per-CPU IRQs, an x86 interrupt vector is statically allocated (see
- HYPERVISOR_CALLBACK_VECTOR) across all CPUs and explicitly coded to
- call the VMbus interrupt service routine. These interrupts are
- visible in /proc/interrupts on the "HYP" line.
-
- The guest CPU that a VMbus channel will interrupt is selected by the
+ Similarly, the host will interrupt the guest via the synic when
+ it sends a new message on the VMBus control path, or when a VMBus
+ channel "in" ring buffer transitions from empty to non-empty due to
+ the host inserting a new VMBus channel message. The control message stream
+ and each VMBus channel "in" ring buffer are separate logical interrupts
+ that are demultiplexed by vmbus_isr(), which first checks for channel
+ interrupts by calling vmbus_chan_sched(). That function looks at a synic
+ bitmap to determine which channels have pending interrupts on this CPU.
+ If multiple channels have pending interrupts for this CPU, they are
+ processed sequentially. When all channel interrupts have been processed,
+ vmbus_isr() checks for and processes any messages received on the VMBus
+ control path.
+
+ The guest CPU that a VMBus channel will interrupt is selected by the
guest when the channel is created, and the host is informed of that
- selection. VMbus devices are broadly grouped into two categories:
+ selection. VMBus devices are broadly grouped into two categories:

- 1. "Slow" devices that need only one VMbus channel. The devices
+ 1. "Slow" devices that need only one VMBus channel. The devices
   (such as keyboard, mouse, heartbeat, and timesync) generate
- relatively few interrupts. Their VMbus channels are all
+ relatively few interrupts. Their VMBus channels are all
   assigned to interrupt the VMBUS_CONNECT_CPU, which is always
   CPU 0.

- 2. "High speed" devices that may use multiple VMbus channels for
+ 2. "High speed" devices that may use multiple VMBus channels for
   higher parallelism and performance. These devices include the
- synthetic SCSI controller and synthetic NIC. Their VMbus
+ synthetic SCSI controller and synthetic NIC. Their VMBus
   channel interrupts are assigned to CPUs that are spread out
   among the available CPUs in the VM so that interrupts on
   multiple channels can be processed in parallel.

- The assignment of VMbus channel interrupts to CPUs is done in the
+ The assignment of VMBus channel interrupts to CPUs is done in the
function init_vp_index(). This assignment is done outside of the
normal Linux interrupt affinity mechanism, so the interrupts are
neither "unmanaged" nor "managed" interrupts.

- The CPU that a VMbus channel will interrupt can be seen in
245
+ The CPU that a VMBus channel will interrupt can be seen in
213
246
/sys/bus/vmbus/devices/<deviceGUID>/ channels/<channelRelID>/cpu.
214
247
When running on later versions of Hyper-V, the CPU can be changed
215
- by writing a new value to this sysfs entry. Because the interrupt
216
- assignment is done outside of the normal Linux affinity mechanism,
217
- there are no entries in /proc/irq corresponding to individual
218
- VMbus channel interrupts.
248
+ by writing a new value to this sysfs entry. Because VMBus channel
249
+ interrupts are not Linux IRQs, there are no entries in /proc/interrupts
250
+ or /proc/irq corresponding to individual VMBus channel interrupts.
219
251
220
252
An online CPU in a Linux guest may not be taken offline if it has
221
- VMbus channel interrupts assigned to it. Any such channel
253
+ VMBus channel interrupts assigned to it. Any such channel
222
254
interrupts must first be manually reassigned to another CPU as
223
255
described above. When no channel interrupts are assigned to the
224
256
CPU, it can be taken offline.
225
257
226
- When a guest CPU receives a VMbus interrupt from the host, the
227
- function vmbus_isr() handles the interrupt. It first checks for
228
- channel interrupts by calling vmbus_chan_sched(), which looks at a
229
- bitmap setup by the host to determine which channels have pending
230
- interrupts on this CPU. If multiple channels have pending
231
- interrupts for this CPU, they are processed sequentially. When all
232
- channel interrupts have been processed, vmbus_isr() checks for and
233
- processes any message received on the VMbus control path.
234
-
235
- The VMbus channel interrupt handling code is designed to work
258
+ The VMBus channel interrupt handling code is designed to work
236
259
correctly even if an interrupt is received on a CPU other than the
237
260
CPU assigned to the channel. Specifically, the code does not use
238
261
CPU-based exclusion for correctness. In normal operation, Hyper-V
@@ -242,23 +265,23 @@ when Hyper-V will make the transition. The code must work correctly
242
265
even if there is a time lag before Hyper-V starts interrupting the
243
266
new CPU. See comments in target_cpu_store().
244
267
245
- VMbus device creation/deletion
268
+ VMBus device creation/deletion
246
269
------------------------------
247
270
Hyper-V and the Linux guest have a separate message-passing path
248
271
that is used for synthetic device creation and deletion. This
249
- path does not use a VMbus channel. See vmbus_post_msg() and
272
+ path does not use a VMBus channel. See vmbus_post_msg() and
250
273
vmbus_on_msg_dpc().
251
274
252
275
The first step is for the guest to connect to the generic
253
- Hyper-V VMbus mechanism. As part of establishing this connection,
254
- the guest and Hyper-V agree on a VMbus protocol version they will
276
+ Hyper-V VMBus mechanism. As part of establishing this connection,
277
+ the guest and Hyper-V agree on a VMBus protocol version they will
255
278
use. This negotiation allows newer Linux kernels to run on older
256
279
Hyper-V versions, and vice versa.
257
280
258
281
The guest then tells Hyper-V to "send offers". Hyper-V sends an
259
282
offer message to the guest for each synthetic device that the VM
260
- is configured to have. Each VMbus device type has a fixed GUID
261
- known as the "class ID", and each VMbus device instance is also
283
+ is configured to have. Each VMBus device type has a fixed GUID
284
+ known as the "class ID", and each VMBus device instance is also
262
285
identified by a GUID. The offer message from Hyper-V contains
263
286
both GUIDs to uniquely (within the VM) identify the device.
264
287
There is one offer message for each device instance, so a VM with
@@ -275,7 +298,7 @@ type based on the class ID, and invokes the correct driver to set up
275
298
the device. Driver/device matching is performed using the standard
276
299
Linux mechanism.
277
300
278
- The device driver probe function opens the primary VMbus channel to
301
+ The device driver probe function opens the primary VMBus channel to
279
302
the corresponding VSP. It allocates guest memory for the channel
280
303
ring buffers and shares the ring buffer with the Hyper-V host by
281
304
giving the host a list of GPAs for the ring buffer memory. See
@@ -285,7 +308,7 @@ Once the ring buffer is set up, the device driver and VSP exchange
285
308
setup messages via the primary channel. These messages may include
286
309
negotiating the device protocol version to be used between the Linux
287
310
VSC and the VSP on the Hyper-V host. The setup messages may also
288
- include creating additional VMbus channels, which are somewhat
311
+ include creating additional VMBus channels, which are somewhat
289
312
mis-named as "sub-channels" since they are functionally
290
313
equivalent to the primary channel once they are created.
291
314