Commit 3ecf671

ashok-raj authored and suryasaimadhu committed
x86/microcode: Document the whole late loading problem
Commit d23d33e ("x86/microcode: Taint and warn on late loading") started
tainting the kernel after microcode late loading.

There is some history behind why x86 microcode started doing the late
loading stop_machine() rendezvous. Document the whole situation.

No functional changes.

[ bp: Fix typos, heavily massage. ]

Signed-off-by: Ashok Raj <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
1 parent 568035b commit 3ecf671

2 files changed: +113 -9 lines changed

Documentation/admin-guide/tainted-kernels.rst

Lines changed: 6 additions & 0 deletions
@@ -134,6 +134,12 @@ More detailed explanation for tainting
 scsi/snic on something else than x86_64, scsi/ips on non
 x86/x86_64/itanium, have broken firmware settings for the
 irqchip/irq-gic on arm64 ...).
+- x86/x86_64: Microcode late loading is dangerous and will result in
+tainting the kernel. It requires that all CPUs rendezvous to make sure
+the update happens when the system is as quiescent as possible. However,
+a higher priority MCE/SMI/NMI can move control flow away from that
+rendezvous and interrupt the update, which can be detrimental to the
+machine.

 3) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all
 modules were unloaded normally.
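
The taint described in this hunk can be checked from user space. The following is a minimal sketch, assuming that late loading is reported through the ``S`` (out-of-specification) flag, which is bit 2 (value 4) of the taint bitmask and the list entry this hunk extends. The helper names are hypothetical; only the ``/proc/sys/kernel/tainted`` file comes from the kernel itself.

```python
from pathlib import Path

# Assumption: microcode late loading sets the "S" (out-of-spec) taint flag,
# which is bit 2 of the taint bitmask (value 4).
OUT_OF_SPEC_BIT = 2


def is_out_of_spec(taint_mask: int) -> bool:
    """Return True if bit 2 ("S") is set in the given taint bitmask."""
    return bool(taint_mask & (1 << OUT_OF_SPEC_BIT))


def read_taint_mask(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the current taint bitmask from procfs (Linux only)."""
    return int(Path(path).read_text().strip())
```

On a running Linux system, ``is_out_of_spec(read_taint_mask())`` reports whether the flag is currently set. Other out-of-specification conditions set the same bit, so a set flag does not by itself prove that a late load happened.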

Documentation/x86/microcode.rst

Lines changed: 107 additions & 9 deletions
@@ -6,6 +6,7 @@ The Linux Microcode Loader

 :Authors: - Fenghua Yu <[email protected]>
 - Borislav Petkov <[email protected]>
+- Ashok Raj <[email protected]>

 The kernel has a x86 microcode loading facility which is supposed to
 provide microcode loading methods in the OS. Potential use cases are
@@ -92,15 +93,8 @@ vendor's site.
 Late loading
 ============

-There are two legacy user space interfaces to load microcode, either through
-/dev/cpu/microcode or through /sys/devices/system/cpu/microcode/reload file
-in sysfs.
-
-The /dev/cpu/microcode method is deprecated because it needs a special
-userspace tool for that.
-
-The easier method is simply installing the microcode packages your distro
-supplies and running::
+You simply install the microcode packages your distro supplies and
+run::

 # echo 1 > /sys/devices/system/cpu/microcode/reload

@@ -110,6 +104,110 @@ The loading mechanism looks for microcode blobs in
 /lib/firmware/{intel-ucode,amd-ucode}. The default distro installation
 packages already put them there.

+Since kernel 5.19, late loading is not enabled by default.
+
+The /dev/cpu/microcode method has been removed in 5.19.
+
+Why is late loading dangerous?
+==============================
+
+Synchronizing all CPUs
+----------------------
+
+The microcode engine which receives the microcode update is shared
+between the two logical threads in an SMT system. Therefore, when
+the update is executed on one SMT thread of the core, the sibling
+"automatically" gets the update.
+
+Since the microcode can "simulate" MSRs too, while the microcode update
+is in progress, those simulated MSRs transiently cease to exist. This
+can lead to unpredictable results if the SMT sibling thread happens to
+be in the middle of an access to such an MSR. The usual observation is
+that such MSR accesses cause #GPs to be raised to signal that the former
+are not present.
+
+The disappearing MSRs are just one common issue which is being observed.
+Any other instruction that's being patched and gets concurrently
+executed by the other SMT sibling can also result in similar,
+unpredictable behavior.
+
+To eliminate this case, a stop_machine()-based CPU synchronization was
+introduced as a way to guarantee that all logical CPUs will not execute
+any code but just wait in a spin loop, polling an atomic variable.
+
+While this took care of device or external interrupts, IPIs including
+LVT ones, such as CMCI etc, it cannot address other special interrupts
+that can't be shut off. Those are Machine Check (#MC), System Management
+(#SMI) and Non-Maskable interrupts (#NMI).
+
+Machine Checks
+--------------
+
+Machine Checks (#MC) are non-maskable. There are two kinds of MCEs:
+fatal un-recoverable MCEs and recoverable MCEs. While un-recoverable
+errors are fatal, recoverable errors which happen in kernel context
+are also treated as fatal by the kernel.
+
+On certain Intel machines, MCEs are also broadcast to all threads in a
+system. If one thread is in the middle of executing WRMSR, an MCE will be
+taken at the end of the flow. Either way, they will wait for the thread
+performing the wrmsr(0x79) to rendezvous in the MCE handler and shut down
+eventually if any of the threads in the system fail to check in to the
+MCE rendezvous.
+
+To be paranoid and get predictable behavior, the OS can choose to set
+MCG_STATUS.MCIP. Since at most one MCE can be in progress in a system,
+if an MCE was signaled, the above condition will promote to a system
+reset automatically. The OS can turn off MCIP at the end of the update
+for that core.
+
+System Management Interrupt
+---------------------------
+
+SMIs are also broadcast to all CPUs in the platform. Microcode update
+requests exclusive access to the core before writing to MSR 0x79. So if
+it happens that one thread is in the WRMSR flow and the second gets an
+SMI, that thread will be stopped in the first instruction in the SMI
+handler.
+
+Since the secondary thread is stopped in the first instruction in SMI,
+there is very little chance that it would be in the middle of executing
+an instruction being patched. Plus, the OS has no way to stop SMIs from
+happening.
+
+Non-Maskable Interrupts
+-----------------------
+
+When thread0 of a core is doing the microcode update, if thread1 is
+pulled into NMI, that can cause unpredictable behavior due to the
+reasons above.
+
+The OS can choose a variety of methods to avoid running into this situation.
+
+
+Is the microcode suitable for late loading?
+-------------------------------------------
+
+Late loading is done when the system is fully operational and running
+real workloads. Late loading behavior depends on what the base patch on
+the CPU is before upgrading to the new patch.
+
+This is true for Intel CPUs.
+
+Consider, for example, a CPU which has patch level 1 and the update is
+to patch level 3.
+
+Between patch1 and patch3, patch2 might have deprecated a software-visible
+feature.
+
+This is unacceptable if software is even potentially using that feature.
+For instance, if MSR_X is no longer available after an update,
+accessing that MSR will cause a #GP fault.
+
+Basically there is no way to declare a new microcode update suitable
+for late-loading. This is another one of the problems that caused late
+loading to not be enabled by default.
+
 Builtin microcode
 =================
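
The rendezvous described under "Synchronizing all CPUs" can be illustrated with a small user-space model. This is only a sketch under stated assumptions: threads stand in for logical CPUs, a lock-protected counter stands in for the atomic variable, and a single "CPU 0" applies the update for the whole system. It is not the kernel's stop_machine() code, and the names ``Rendezvous`` and ``run_rendezvous`` are hypothetical.

```python
import threading
import time


class Rendezvous:
    """Toy model: every 'CPU' checks in, then spins until the update is done."""

    def __init__(self, num_cpus):
        self.num_cpus = num_cpus
        self.arrived = 0            # stands in for the atomic check-in counter
        self.update_done = False    # set by CPU 0 once the update is applied
        self.lock = threading.Lock()

    def cpu(self, cpu_id, apply_update):
        with self.lock:
            self.arrived += 1       # check in to the rendezvous
        while self.arrived < self.num_cpus:
            time.sleep(0)           # spin until all CPUs have arrived
        if cpu_id == 0:
            apply_update()          # only one CPU performs the update
            self.update_done = True
        else:
            while not self.update_done:
                time.sleep(0)       # siblings execute nothing else meanwhile


def run_rendezvous(num_cpus=4):
    """Run the model and return the events recorded during the update."""
    events = []
    rv = Rendezvous(num_cpus)

    def apply_update():
        # Record how many CPUs were parked when the update ran.
        events.append(("update", rv.arrived))

    threads = [threading.Thread(target=rv.cpu, args=(i, apply_update))
               for i in range(num_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

In this model the update can only run once every thread has checked in, which is the guarantee the spin loop is meant to provide. The model has no counterpart for #MC, SMI or NMI, which is precisely the gap the documentation describes.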