Commit a0b1340

mhklinux authored and liuw committed
Documentation: hyperv: Improve synic and interrupt handling description
Current documentation does not describe how Linux handles the synthetic
interrupt controller (synic) that Hyper-V provides to guest VMs, nor how
VMBus or timer interrupts are handled. Add text describing the synic and
reorganize existing text to make this more clear.

Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Easwar Hariharan <[email protected]>
Reviewed-by: Bagas Sanjaya <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
Message-ID: <[email protected]>
1 parent 4c5a65f commit a0b1340

2 files changed: 66 additions, 34 deletions

Documentation/virt/hyperv/clocks.rst

Lines changed: 15 additions & 6 deletions
@@ -62,12 +62,21 @@ shared page with scale and offset values into user space. User
 space code performs the same algorithm of reading the TSC and
 applying the scale and offset to get the constant 10 MHz clock.
 
-Linux clockevents are based on Hyper-V synthetic timer 0. While
-Hyper-V offers 4 synthetic timers for each CPU, Linux only uses
-timer 0. Interrupts from stimer0 are recorded on the "HVS" line in
-/proc/interrupts. Clockevents based on the virtualized PIT and
-local APIC timer also work, but the Hyper-V synthetic timer is
-preferred.
+Linux clockevents are based on Hyper-V synthetic timer 0 (stimer0).
+While Hyper-V offers 4 synthetic timers for each CPU, Linux only uses
+timer 0. In older versions of Hyper-V, an interrupt from stimer0
+results in a VMBus control message that is demultiplexed by
+vmbus_isr() as described in the Documentation/virt/hyperv/vmbus.rst
+documentation. In newer versions of Hyper-V, stimer0 interrupts can
+be mapped to an architectural interrupt, which is referred to as
+"Direct Mode". Linux prefers to use Direct Mode when available. Since
+x86/x64 doesn't support per-CPU interrupts, Direct Mode statically
+allocates an x86 interrupt vector (HYPERV_STIMER0_VECTOR) across all CPUs
+and explicitly codes it to call the stimer0 interrupt handler. Hence
+interrupts from stimer0 are recorded on the "HVS" line in /proc/interrupts
+rather than being associated with a Linux IRQ. Clockevents based on the
+virtualized PIT and local APIC timer also work, but Hyper-V stimer0
+is preferred.
 
 The driver for the Hyper-V synthetic system clock and timers is
 drivers/clocksource/hyperv_timer.c.
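The clocks.rst change above says that stimer0 interrupts are counted on the "HVS" line of /proc/interrupts and not under a Linux IRQ. As a rough illustration of what that means for tooling, the sketch below sums the per-CPU counters on one labelled line of /proc/interrupts-style text. The function name `sketch_sum_interrupt_line` is hypothetical, and the layout it assumes (a label, a colon, one decimal counter per CPU, then a free-form description) is an assumption about the usual format. It is not code from this commit.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: sum the per-CPU counters on the line of
 * /proc/interrupts-style text whose label is `label` (e.g. "HVS").
 * Returns -1 if no line carries that label.
 */
static long long sketch_sum_interrupt_line(const char *text, const char *label)
{
    assert(text != NULL && label != NULL);

    size_t label_len = strlen(label);
    const char *line = text;

    while (line && *line) {
        const char *p = line;

        /* Labels are right-aligned, so skip leading blanks. */
        while (*p == ' ' || *p == '\t')
            p++;

        if (strncmp(p, label, label_len) == 0 && p[label_len] == ':') {
            long long total = 0;

            p += label_len + 1;
            for (;;) {
                char *end;

                while (*p == ' ' || *p == '\t')
                    p++;
                /* The first non-numeric token starts the description. */
                if (!isdigit((unsigned char)*p))
                    break;
                total += strtoll(p, &end, 10);
                p = end;
            }
            return total;
        }

        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

Passing the contents of /proc/interrupts and the label "HVS" would give the aggregate stimer0 count across CPUs on an x86/x64 guest using Direct Mode. The same call with "HYP" would give the aggregate for the VMBus callback vector.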

Documentation/virt/hyperv/vmbus.rst

Lines changed: 51 additions & 28 deletions
@@ -102,10 +102,10 @@ resources. For Windows Server 2019 and later, this limit is
 approximately 1280 Mbytes. For versions prior to Windows Server
 2019, the limit is approximately 384 Mbytes.
 
-VMBus messages
---------------
-All VMBus messages have a standard header that includes the message
-length, the offset of the message payload, some flags, and a
+VMBus channel messages
+----------------------
+All messages sent in a VMBus channel have a standard header that includes
+the message length, the offset of the message payload, some flags, and a
 transactionID. The portion of the message after the header is
 unique to each VSP/VSC pair.
 
@@ -137,7 +137,7 @@ control message contains a list of GPAs that describe the data
 buffer. For example, the storvsc driver uses this approach to
 specify the data buffers to/from which disk I/O is done.
 
-Three functions exist to send VMBus messages:
+Three functions exist to send VMBus channel messages:
 
 1. vmbus_sendpacket(): Control-only messages and messages with
    embedded data -- no GPAs
@@ -165,6 +165,37 @@ performed in this temporary buffer without the risk of Hyper-V
 maliciously modifying the message after it is validated but before
 it is used.
 
+Synthetic Interrupt Controller (synic)
+--------------------------------------
+Hyper-V provides each guest CPU with a synthetic interrupt controller
+that is used by VMBus for host-guest communication. While each synic
+defines 16 synthetic interrupts (SINT), Linux uses only one of the 16
+(VMBUS_MESSAGE_SINT). All interrupts related to communication between
+the Hyper-V host and a guest CPU use that SINT.
+
+The SINT is mapped to a single per-CPU architectural interrupt (i.e,
+an 8-bit x86/x64 interrupt vector, or an arm64 PPI INTID). Because
+each CPU in the guest has a synic and may receive VMBus interrupts,
+they are best modeled in Linux as per-CPU interrupts. This model works
+well on arm64 where a single per-CPU Linux IRQ is allocated for
+VMBUS_MESSAGE_SINT. This IRQ appears in /proc/interrupts as an IRQ labelled
+"Hyper-V VMbus". Since x86/x64 lacks support for per-CPU IRQs, an x86
+interrupt vector is statically allocated (HYPERVISOR_CALLBACK_VECTOR)
+across all CPUs and explicitly coded to call vmbus_isr(). In this case,
+there's no Linux IRQ, and the interrupts are visible in aggregate in
+/proc/interrupts on the "HYP" line.
+
+The synic provides the means to demultiplex the architectural interrupt into
+one or more logical interrupts and route the logical interrupt to the proper
+VMBus handler in Linux. This demultiplexing is done by vmbus_isr() and
+related functions that access synic data structures.
+
+The synic is not modeled in Linux as an irq chip or irq domain,
+and the demultiplexed logical interrupts are not Linux IRQs. As such,
+they don't appear in /proc/interrupts or /proc/irq. The CPU
+affinity for one of these logical interrupts is controlled via an
+entry under /sys/bus/vmbus as described below.
+
 VMBus interrupts
 ----------------
 VMBus provides a mechanism for the guest to interrupt the host when
@@ -176,16 +207,18 @@ unnecessary. If a guest sends an excessive number of unnecessary
 interrupts, the host may throttle that guest by suspending its
 execution for a few seconds to prevent a denial-of-service attack.
 
-Similarly, the host will interrupt the guest when it sends a new
-message on the VMBus control path, or when a VMBus channel "in" ring
-buffer transitions from empty to non-empty. Each CPU in the guest
-may receive VMBus interrupts, so they are best modeled as per-CPU
-interrupts in Linux. This model works well on arm64 where a single
-per-CPU IRQ is allocated for VMBus. Since x86/x64 lacks support for
-per-CPU IRQs, an x86 interrupt vector is statically allocated (see
-HYPERVISOR_CALLBACK_VECTOR) across all CPUs and explicitly coded to
-call the VMBus interrupt service routine. These interrupts are
-visible in /proc/interrupts on the "HYP" line.
+Similarly, the host will interrupt the guest via the synic when
+it sends a new message on the VMBus control path, or when a VMBus
+channel "in" ring buffer transitions from empty to non-empty due to
+the host inserting a new VMBus channel message. The control message stream
+and each VMBus channel "in" ring buffer are separate logical interrupts
+that are demultiplexed by vmbus_isr(). It demultiplexes by first checking
+for channel interrupts by calling vmbus_chan_sched(), which looks at a synic
+bitmap to determine which channels have pending interrupts on this CPU.
+If multiple channels have pending interrupts for this CPU, they are
+processed sequentially. When all channel interrupts have been processed,
+vmbus_isr() checks for and processes any messages received on the VMBus
+control path.
 
 The guest CPU that a VMBus channel will interrupt is selected by the
 guest when the channel is created, and the host is informed of that
@@ -212,26 +245,16 @@ neither "unmanaged" nor "managed" interrupts.
 The CPU that a VMBus channel will interrupt can be seen in
 /sys/bus/vmbus/devices/<deviceGUID>/channels/<channelRelID>/cpu.
 When running on later versions of Hyper-V, the CPU can be changed
-by writing a new value to this sysfs entry. Because the interrupt
-assignment is done outside of the normal Linux affinity mechanism,
-there are no entries in /proc/irq corresponding to individual
-VMBus channel interrupts.
+by writing a new value to this sysfs entry. Because VMBus channel
+interrupts are not Linux IRQs, there are no entries in /proc/interrupts
+or /proc/irq corresponding to individual VMBus channel interrupts.
 
 An online CPU in a Linux guest may not be taken offline if it has
 VMBus channel interrupts assigned to it. Any such channel
 interrupts must first be manually reassigned to another CPU as
 described above. When no channel interrupts are assigned to the
 CPU, it can be taken offline.
 
-When a guest CPU receives a VMBus interrupt from the host, the
-function vmbus_isr() handles the interrupt. It first checks for
-channel interrupts by calling vmbus_chan_sched(), which looks at a
-bitmap setup by the host to determine which channels have pending
-interrupts on this CPU. If multiple channels have pending
-interrupts for this CPU, they are processed sequentially. When all
-channel interrupts have been processed, vmbus_isr() checks for and
-processes any message received on the VMBus control path.
-
 The VMBus channel interrupt handling code is designed to work
 correctly even if an interrupt is received on a CPU other than the
 CPU assigned to the channel. Specifically, the code does not use
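
The demultiplexing order described in the vmbus.rst changes above (pending channel interrupts first, processed sequentially, then any control-path messages) can be sketched in plain C as follows. This is a simplified user-space model, not the kernel code. Every `sketch_` name, the 64-channel limit, the single-word pending bitmap, and the `msg_pending` flag are assumptions made for illustration and do not reflect the real synic data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_CHANNELS 64   /* assumption: one 64-bit word of channel bits */
#define SKETCH_CONTROL_MSG  (-1) /* marker for "control-path message handled" */

/* Hypothetical stand-in for the per-CPU synic state the ISR consults. */
struct sketch_synic {
    uint64_t chan_pending; /* bit N set: channel N has a pending interrupt */
    int msg_pending;       /* nonzero: a control-path message is waiting */
};

/* Records the order in which logical interrupts were handled. */
struct sketch_log {
    int order[SKETCH_MAX_CHANNELS + 1];
    size_t count;
};

static void sketch_record(struct sketch_log *log, int what)
{
    assert(log->count < sizeof(log->order) / sizeof(log->order[0]));
    log->order[log->count++] = what;
}

/* Models the vmbus_chan_sched() step: walk the bitmap, handle each channel. */
static void sketch_chan_sched(struct sketch_synic *s, struct sketch_log *log)
{
    for (int chan = 0; chan < SKETCH_MAX_CHANNELS; chan++) {
        uint64_t bit = (uint64_t)1 << chan;

        if (s->chan_pending & bit) {
            s->chan_pending &= ~bit;  /* acknowledge before handling */
            sketch_record(log, chan); /* stands in for the channel callback */
        }
    }
}

/* Models the vmbus_isr() ordering: channels first, then control messages. */
static void sketch_isr(struct sketch_synic *s, struct sketch_log *log)
{
    sketch_chan_sched(s, log);

    if (s->msg_pending) {
        s->msg_pending = 0;
        sketch_record(log, SKETCH_CONTROL_MSG);
    }
}
```

With channels 3 and 10 pending plus one control message, a single call to `sketch_isr()` records 3, then 10, then the control-message marker. That mirrors the point of the patch text: one architectural interrupt fans out into several logical interrupts, none of which is a Linux IRQ.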
