Description
This is an extension of the issue already introduced in #36471
An implementation of the RFC can be found in #57602
The problem
In memory bus architectures, the CPU running software is one master accessing the bus, while DMA engines are additional masters accessing the same (or another, connected) bus.
Basic MCUs have a simple bus architecture: the CPU and the DMA engines can access the whole memory range, no caches are present, and memory addressing is consistent across all bus masters.
Memory access in these devices can be summarized with the following rules (illustrated in the sketch after the list):
- the CPU and DMA have access to the whole memory space
- the address the CPU uses to point to any memory location A is the same address DMA uses to point to A
- any data stored in memory by DMA is immediately visible to the CPU
- any data stored in memory by the CPU is immediately visible to DMA
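To make the simple model concrete, here is a minimal sketch of what a driver can do on such an MCU; `dma_start_transfer()` is a hypothetical helper standing in for a concrete DMA driver call:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical DMA helper: programs the engine with a bus address. */
int dma_start_transfer(uintptr_t addr, size_t len);

/* On a simple MCU the user buffer is handed to the DMA engine as-is:
 * no address translation, no bounce buffer, no cache maintenance. */
int send_buffer(const uint8_t *buf, size_t len)
{
	/* The CPU address of 'buf' is directly valid for DMA. */
	return dma_start_transfer((uintptr_t)buf, len);
}
```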
In systems with a more complex memory architecture, the interaction between the CPU and DMA, or between multiple CPUs, is complicated by several factors:
- DMA can be restricted to only part of the system memory (or can access only part of it efficiently)
- the CPU may need to translate its buffer address (virtual or physical) into an address usable by DMA (or a set of addresses, if the buffer is not contiguous in the bus memory space) before passing it to a DMA engine, taking into account cache line alignment, memory region allocation, etc.
- the CPU cache can contain stale data, so the CPU may need to invalidate cache lines to read data that DMA or another CPU updated in memory
- data a CPU intends to store in memory can get stuck in its write-back cache and remain invisible to DMA or other CPUs, so the CPU has to flush caches before announcing that the data is available in memory (both cache-maintenance cases are sketched below)
All of the discussed challenges must be addressed by the software running in a system with a complex memory architecture.
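For illustration, the two cache-maintenance cases can already be handled with Zephyr's existing cache API, but every driver has to remember to do it, and the alignment caveat in the comment is exactly the kind of detail that is easy to get wrong:

```c
#include <zephyr/cache.h>

/* Memory-to-peripheral: flush 'buf' out of the write-back d-cache so
 * the data is actually in memory before DMA reads it. */
void dma_tx_prepare(void *buf, size_t len)
{
	sys_cache_data_flush_range(buf, len);
	/* ...now hand 'buf' to the DMA engine... */
}

/* Peripheral-to-memory: invalidate the cached lines so the CPU does
 * not read stale data after DMA has written 'buf'. For this to be
 * safe, 'buf' must be cache-line aligned and padded. */
void dma_rx_complete(void *buf, size_t len)
{
	sys_cache_data_invd_range(buf, len);
	/* ...the CPU may now safely read 'buf'... */
}
```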
What Zephyr is doing about this
Zephyr has no solution yet for these kinds of complex platforms. There are several scattered attempts to work around the limitations:
- Arm Cortex-M7 SoCs (especially STM32H7) try to make DMA and the d-cache co-exist peacefully by disabling the d-cache entirely or by allocating DMA buffers in non-cacheable memory (see [RFC] DMA and Data cache coherency on arm M7 devices #36471).
- The nrfx UARTE driver (and probably others) statically allocates bounce buffers in a DMA-able memory region defined via DT (see dts: bindings: Add memory-region property #45142).
- The DMA subsystem (in https://github.com/zephyrproject-rtos/zephyr/tree/main/drivers/dma) lacks any support for this; it is aimed at supporting DMA hardware devices rather than at buffer and cache management.
- The Memory Management drivers subsystem (in https://github.com/zephyrproject-rtos/zephyr/tree/main/drivers/mm) is a niche Intel-owned subsystem that mainly deals with page management.
Proposal
I'm proposing to add a new sub-system called DMM - DMA Memory Management - with the following responsibilities (see the API sketch after this list):
- allocating and freeing "bounce buffers" when the buffer provided by a driver user cannot be used by the DMA engine as-is (memory accessible by DMA, aligned to DMA requirements, and, if cacheable, aligned and padded to cache lines)
- copying data to and from the bounce buffers
- translating CPU buffer addresses (virtual or physical) into DMA-usable addresses
- managing caches when buffers are allocated in cacheable memory
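As a discussion starter, the API surface could look roughly like the sketch below. All names and signatures here are hypothetical placeholders, not an implemented interface; the point is the prepare/release pairing that hides bounce buffering, address translation, and cache maintenance from the driver:

```c
#include <stddef.h>

/* Hypothetical DMM API sketch -- names and signatures are placeholders
 * for discussion only. 'region' identifies the DMA-able memory region
 * (e.g. taken from DT) that the target DMA engine can access. */

/* CPU-to-device: make 'user_buf' usable by DMA. May allocate a bounce
 * buffer and copy into it, flush caches, and translate the address;
 * '*dma_buf' receives the address to program into the DMA engine. */
int dmm_buffer_out_prepare(void *region, const void *user_buf, size_t len,
			   void **dma_buf);

/* Release the bounce buffer (if any) once the transfer has completed. */
int dmm_buffer_out_release(void *region, void *dma_buf);

/* Device-to-CPU: obtain a DMA-writable buffer backing 'user_buf'. */
int dmm_buffer_in_prepare(void *region, void *user_buf, size_t len,
			  void **dma_buf);

/* Copy back from the bounce buffer (if used) and invalidate caches so
 * 'user_buf' reflects what the device wrote. */
int dmm_buffer_in_release(void *region, void *user_buf, size_t len,
			  void *dma_buf);
```

A TX path would then call dmm_buffer_out_prepare() before starting the transfer and dmm_buffer_out_release() from the completion handler, with the no-bounce-buffer case collapsing to a cheap address pass-through.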
Why this RFC?
Because before starting to write the code I want to gather opinions and discuss whether introducing a new subsystem makes sense versus expanding the current APIs (DMA or MM).