[docs,qemu] Add introductory QEMU documentation

AlexJones0 · AlexJones0 · commit 8e539921b688 · 2025-10-29T14:25:04.000Z
Add some introductory documentation to explain QEMU, its integration
into OpenTitan itself, and caveats that users should be aware of.

The focus of these docs is primarily on introducing the basic concepts
of QEMU, explaining the integration with OpenTitan, and noting any
caveats that are useful to be aware of when testing or developing
software stacks. Where possible, detailed explanations of QEMU's
internals or explanations of how OpenTitan emulation works are delegated
to the existing QEMU / QEMU fork documentation, to avoid replication and
docs getting out of date.

The one exception is the list of unemulated features - a short
non-comprehensive list is reproduced as it is probably useful that
anyone who wants to use QEMU is aware of some of the major caveats, even
if the list is not complete / comprehensive.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
diff --git a/third_party/qemu/README.md b/third_party/qemu/README.md
@@ -2,8 +2,103 @@
 
 For instructions on setting up QEMU for local development and troubleshooting steps, see the [setup guide](./setup.md).
 
+## Introduction
+
+[QEMU](https://www.qemu.org/) is an open-source project which primarily aims to support virtualization, user-mode emulation and full-system (machine) emulation flows.
+Its primary goal is to provide fast functional emulation in a generic manner that supports many host devices, at the cost of losing the accuracy offered by simulation.
+OpenTitan is an example of full-system emulation, where we emulate an OpenTitan RISC-V 32-bit machine to support running OpenTitan programs on any supported host, such as an x86-64 PC.
+In this context, a QEMU OpenTitan "machine" is a system consisting of an Ibex core, optionally including other cores and attached peripherals (equivalent to a "top").
+
+The predominant advantage of QEMU is its speed: internally, QEMU uses a dynamic translation backend called [TCG](https://www.qemu.org/docs/master/devel/tcg.html) (_Tiny Code Generator_) which translates blocks of guest code (RISC-V for the Ibex core used in OpenTitan) into equivalent optimized host code (e.g. x86-64) for execution.
+By using QEMU, tests and custom software stacks for OpenTitan can be executed and tested by integrators, all running on a single host device without requiring any specialized hardware.
+This can often allow parallel execution at faster speeds than hardware simulation, at the cost of completely accurate behavior.
+
+OpenTitan's emulated machines are currently developed on a [fork](https://github.com/lowRISC/qemu) of upstream QEMU to better facilitate the implementation of non-standard RISC-V extensions implemented in the Ibex CPU, and other extensions required to support OpenTitan functionality.
+There are currently two main emulation targets: Earlgrey 1.0.0, and Darjeeling (targeting the moving `master` branch).
+You can refer to the QEMU repository's documentation for more information on the development and usage of the QEMU OpenTitan machines - this page instead focuses on providing details for testing software with QEMU.
+
+## Scope
+
+Currently only the Earlgrey machine (`ot-earlgrey`) has any support within OpenTitanLib for test integration, primarily on the `earlgrey_1.0.0` branch.
+It is important to note that emulation is **ongoing** and **incomplete** - there are many devices and features which are not yet implemented or are infeasible to emulate within QEMU.
+Before using QEMU, it is strongly advised to read relevant QEMU [Earlgrey](https://github.com/lowRISC/qemu/blob/ot-9.2.0/docs/opentitan/earlgrey.md) and [Darjeeling](https://github.com/lowRISC/qemu/blob/ot-9.2.0/docs/opentitan/darjeeling.md) documentation pages to better understand whether the devices and features that you need are supported.
+Note that these also may not be up-to-date compared to the implementation, and as such the source files should be treated as the root source of truth.
+Furthermore, the QEMU release is pinned in Bazel, so the features available in the pinned release may differ from those in the latest QEMU fork sources and documentation (see the [setup guide](./setup.md)).
+
+As of the time of writing, there are several unimplemented devices and many devices with partial emulation / missing features.
+The most important such examples to be aware of are:
+* The Power manager slow FSM is bypassed, with no support for low power entry / deep sleeps.
+* A pinmux block is implemented, but all connections currently go via a CharDev and do not interact with pinmux.
+* Non-Maskable Interrupts (NMIs) are not implemented.
+* Many other features are unsupported; see the QEMU documentation.
+
+Furthermore, note that synchronous (e.g. I2C) or timing-based (e.g. UART) communication peripherals are often limited in support due to the inherent inaccuracy of emulation with QEMU.
+As such, there may be limitations on what you can do using these peripherals.
+
+## Considerations & Constraints
+
+### Timing and Cycle Accuracy
+
+QEMU is _not_ a cycle-accurate emulator, as its main goal is emulation speed:
+* The instruction count reported in `mcycle` will not accurately reflect the number of instructions that have been executed.
+* When using a hardware timer (e.g. the `aon_timer` or `rv_timer`), the timer will expire after _approximately_ the correct amount of time, but will not be exactly accurate.
+
+QEMU has mechanisms for pacing the virtual CPU to match hardware timers, but in general it should be expected that both hardware timers and instruction counts are not correct in QEMU.
+This can be a particular challenge for security countermeasures and software mechanisms that opt to set tight timing thresholds.
+One potential solution is modifying the `icount` shift parameter, which informs the TCG's instruction counting feature for system emulation.
+When providing an icount shift of `N`, the virtual CPU will execute one instruction every `2^N` nanoseconds of virtual time, meaning that changing this parameter may allow emulation to "pass" tighter timing checks at the cost of less accurate timing overall.
+By default, an `icount` shift of 6 is used to align execution time with wall-clock time.
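
As a rough illustration of the `icount` shift arithmetic described above, the following Python sketch computes the virtual time per instruction and the resulting virtual instruction rate for a few shift values. The function names are hypothetical and exist only for this example; the sketch assumes nothing beyond the "one instruction every `2^N` nanoseconds" rule stated above.

```python
def ns_per_instruction(shift: int) -> int:
    """Virtual nanoseconds charged per guest instruction for an icount shift."""
    return 1 << shift  # 2^shift


def instructions_per_second(shift: int) -> float:
    """Virtual instruction rate implied by the shift."""
    return 1e9 / ns_per_instruction(shift)


def virtual_time_ns(instruction_count: int, shift: int) -> int:
    """Virtual time consumed by a block of guest instructions."""
    return instruction_count * ns_per_instruction(shift)


if __name__ == "__main__":
    for shift in (4, 6, 8):
        rate_mhz = instructions_per_second(shift) / 1e6
        print(f"shift={shift}: {ns_per_instruction(shift)} ns/instruction, "
              f"about {rate_mhz:.2f} M instructions per virtual second")
```

Under this model, a smaller shift lets more guest instructions run within the same window of virtual time, which is why lowering it can help code meet a tight timer-based threshold.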
+
+### The TLB and PMP
+
+Hardware will often use a Translation Lookaside Buffer (TLB) as a high-speed cache to accelerate translation from virtual addresses to physical addresses.
+QEMU mimics this concept with its own address translation cache that it calls the [TLB](https://www.qemu.org/docs/master/devel/tcg.html#mmu-emulation), which it uses to speed up translation of guest virtual addresses to guest RAM or to the operation handlers for MMIO regions defined by emulated devices.
+
+It is worth being aware that partially or entirely invalidating the TLB (via mechanisms such as self-modifying flash code, or writes to PMP CSRs) can therefore decrease performance, as QEMU will temporarily lose the speedup from cached address translations.
+
+One particularly important note is that for RISC-V target emulation, 4 KiB pages are used for caching.
+When a PMP configuration is used that is not aligned to this page size, any address translation that falls within the misaligned region will not be cached nor make use of the TLB, due to the possibility of more granular PMP regions splitting a page with different permissions.
+As a result, any code that falls in a misaligned PMP region will be executed with a _significant_ slowdown compared to regular execution.
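
To make the alignment rule concrete, here is a minimal Python sketch that reports which 4 KiB pages a PMP region only partly covers. It is a simplified model of the behavior described above, not QEMU's actual implementation, and the function name and example addresses are hypothetical.

```python
PAGE_SIZE = 4096  # 4 KiB pages, as described above


def partially_covered_pages(start: int, end: int,
                            page_size: int = PAGE_SIZE) -> list[int]:
    """Return base addresses of pages that the region [start, end) only
    partly covers. In this simplified model, these are the pages whose
    translations cannot be cached."""
    if end <= start:
        return []
    first_page = start - start % page_size
    last_page = (end - 1) - (end - 1) % page_size
    pages = []
    if start % page_size != 0:
        pages.append(first_page)
    # Avoid listing the same page twice when both ends fall in one page.
    if end % page_size != 0 and (last_page != first_page or not pages):
        pages.append(last_page)
    return pages


if __name__ == "__main__":
    # Hypothetical region: page-aligned start, misaligned end.
    for page in partially_covered_pages(0x8000, 0x9A10):
        print(f"page at {page:#x} is split by the PMP region boundary")
```

A region with both boundaries on page boundaries yields an empty list, while a region ending mid-page flags only its final page, which matches the observation that code at the end of a text section is the part most likely to be slow.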
+
+Earlgrey's current PMP configurations tend to make use of fine-grained ranges for permissions to increase security posture, which means more potential for QEMU emulation to be slowed.
+In particular, Earlgrey's ROM & ROM_EXT text sections (containing the executable code) and owner firmware are all delineated using fine-grained PMP ranges, meaning code at the end (and start, if starting at a misaligned offset) will see performance decreases.
+The 4-byte stack guard placed at the top of the stack may also cause issues when using all available stack.
+For any hot loops in the code, this can result in a substantial increase in emulation time.
+For example, the `sec_mmio` checks often cause large slowdowns in the ROM.
+
+### Host Performance and Parallelism
+
+As QEMU emulates by translating guest code to host instructions, the performance of QEMU will depend on the performance characteristics and processing load of the host device.
+Although QEMU has mechanisms for pacing, this can mean that different behaviors or outcomes are observed when executing QEMU on different host devices - particularly with regard to timing.
+One major advantage of QEMU is that tests can be executed in parallel, but doing so will also increase host processing load and may exacerbate timing differences.
+Hence, if the most reliable performance is required, instances of QEMU should be serially executed (e.g. passing `--test_output=streamed` or using `-j 1` for Bazel testing).
+
+Currently, large amounts of OpenTitan code are designed to only execute in FPGA or silicon environments, or sometimes in cycle-accurate simulation.
+This means that there may be segments of code which make implicit timing assumptions that no longer hold true when emulating in QEMU.
+As such, it may be required to conditionally change these timing assumptions for QEMU environments, or to modify the [`icount` parameter](#timing-and-cycle-accuracy) for a test to be able to meet timing checks.
+Where possible, the host and device should ideally synchronize without making any implicit timing assumptions.
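
One way to avoid implicit timing assumptions on the host side is to wait for an explicit marker from the device instead of sleeping for a fixed duration. The Python sketch below illustrates the idea with an in-memory stream standing in for a device console; the function name, the marker text, and the stream are all assumptions made for illustration and are not part of OpenTitanLib.

```python
import io
import time


def wait_for_marker(stream, marker: bytes, timeout_s: float = 5.0) -> bytes:
    """Read from `stream` until `marker` is seen and return everything read.

    Raises TimeoutError if the marker does not arrive within `timeout_s`.
    The timeout is only a safety net; success never depends on how long
    the device took to respond.
    """
    deadline = time.monotonic() + timeout_s
    seen = bytearray()
    while time.monotonic() < deadline:
        chunk = stream.read(64)
        if chunk:
            seen += chunk
            if marker in seen:
                return bytes(seen)
        else:
            time.sleep(0.01)  # nothing available yet, poll again shortly
    raise TimeoutError(f"marker {marker!r} not seen within {timeout_s} s")


if __name__ == "__main__":
    # Stand-in for a device console; a real harness would read from a UART.
    console = io.BytesIO(b"booting...\nready\n")
    print(wait_for_marker(console, b"ready").decode())
```

The same pattern applies regardless of whether the device runs on silicon, an FPGA, or QEMU, since only the generous upper bound depends on the environment.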
+
+### Host Communication
+
+For communication between host and device (e.g. via UART, SPI, GPIO, etc.) QEMU allows for the creation of [CharDevs](https://www.qemu.org/docs/master/system/device-emulation.html).
+CharDevs are generic and flexible interfaces for communication, introducing an abstraction layer that supports a variety of protocols.
+Host-device communication in OpenTitanTool mainly uses PTYs (pseudo serial ports) and TCP sockets.
+
+Importantly, QEMU's lack of accurate cycle & timing emulation can limit the features that it is possible to emulate for these interfaces.
+For example, all UART communication goes via the CharDev and is not based on sampling at timing intervals, and so it is not possible to support the [UART oversampling](https://opentitan.org/book/hw/ip/uart/doc/theory_of_operation.html#reception) mechanism.
+
+### Flash Images and Bootstrapping
+
+There is sufficient Earlgrey emulation and support for QEMU in OpenTitanLib to run flows that require bootstrapping firmware.
+In spite of this, [slow emulation of the ROM](#the-tlb-and-pmp) means that we generally prefer to splice firmware into the initial flash image and avoid bootstrapping entirely.
+This avoids the delay of bootstrapping and additional boot times from resetting, which can compound to a large increase in execution time across multiple test runs.
+
+For software flows where it is genuinely desired to bootstrap firmware, appropriate [test parameters](#test-parameters) can be configured to use the bootstrapping flow instead, or a custom host harness or test command can be used.
+
 ## Testing
 
+From the perspective of OpenTitan testing, QEMU is treated as a separate set of execution environments (e.g. `sim_qemu_rom_ext`) which use the QEMU transport interface in OpenTitanLib, allowing communication between the host Rust code and the emulated QEMU OpenTitan device.
+
 ### Test parameters
 
 Tests with a `sim_qemu_*` execution environment can be further configured by adding `qemu_params` to the test target.