gpdma: add GPDMA driver #64
Conversation
This adds support for one-shot DMA transfers for all supported transfer types with linear addressing. It does not yet support linked-list buffer loading or 2D addressing.
Memory-to-memory transfers were quite thoroughly tested by the provided example; memory-to-peripheral and peripheral-to-memory transfers were tested with the SPI peripheral (working on a test branch). Peripheral-to-peripheral transfers are assumed to work given their similarity to memory-to-peripheral/peripheral-to-memory transfers, but will be properly tested at a later stage.
The driver includes helper structs for peripheral transfers in the gpdma::periph module which handle the common operations for setting up one-directional and full-duplex transfers using one or two channels, respectively. Transfer configuration is performed via DmaConfig. Additionally, the transform module includes support for configuring the data transformation pipeline supported by the GPDMA peripheral, only allowing the options that are relevant for the given source and destination word sizes.
- Use stack allocated buffers for dma example
- Fix wait_for_transfer_complete/wait_for_half_transfer_complete
- Remove clock sync delays
- Add defmt derives
- Improve consistency in return values for DataTransformBuilder
- Remove side effects from check_transfer_event and add a separate function to clear events
- Add Drop implementation to abort a transfer
- Modify wait_for_transfer_complete and wait_for_half_transfer_complete to:
  - explicitly clear event flags
  - consume, but forget, self, so the Drop implementation is not called
- Misc documentation improvements
examples/dma.rs
let src =
    singleton!(: [u8; 40] = core::array::from_fn(|i| i as u8)).unwrap();
let dest = singleton!(: [u8; 40] = [0u8; 40]).unwrap();
let source_copy = unsafe { &*(src.as_ptr() as *const [u8; 40]) };
Why this? You have a src: &mut [u8; 40], take a pointer to it (*const [u8; 40]), deref that ([u8; 40]) and then take a reference to that (&[u8; 40])? Is the point to get another reference to the buffer?
I think that since you now have both a shared and an exclusive reference to the buffer, we have UB. (Assuming I am not missing something)
Yeah, the point is to verify the contents of the buffer after the DMA transfer is complete. Requiring a static reference to the buffer in the DmaTransfer API, and only getting a reference from the singleton macro instead of a value, means I can't access the src reference to the buffer after passing it to the memory_to_memory function (the reference gets moved). It's UB, but IMO it is safe because the reference is not used until the DMA transfer is stopped. Is that incorrect?
This does reflect the pain of using this API now. It's really tricky to use without some sort of unsafe usage, and is why I initially avoided requiring static references to buffers in the first place. The alternative I've seen (e.g. stm32f4xx-hal) is to return the source and destination values (whether they be 'static references or otherwise) when the transfer completes (from a free method or when wait_for_transfer_complete returns), but I generally don't like that API because it forces users to use Options to store these references in user code, or use unsafe themselves. Maybe that reflects the challenges of using DMA in general?
I've also been figuring out the peripheral API and building Futures on top of this API, and both become more painful with this change too. But if you think that is a more sound API, I can try out the alternative above and see how it interacts with Future/peripheral usage.
You can see the WIP Futures and SPI DMA in #65 and #66. Would love feedback if you have ideas for improving/settling on an API!
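For reference, the "return the buffers on completion" shape mentioned above might look roughly like this. This is a sketch only; the names are placeholders and not this driver's (or stm32f4xx-hal's) actual API:

struct MemToMemTransfer<SRC, DST> {
    src: SRC,
    dest: DST,
    // channel and peripheral handles would live here too
}

impl<SRC, DST> MemToMemTransfer<SRC, DST> {
    // Block until the transfer-complete flag is set, then hand the buffers
    // back to the caller so they can be inspected or reused without a second
    // reference ever existing during the transfer.
    fn wait_for_transfer_complete(self) -> (SRC, DST) {
        // while !self.transfer_complete() {}   // poll the hardware flag here
        (self.src, self.dest)
    }
}

The downside is still what's described above: the caller has to stash the returned buffers somewhere (typically an Option) before starting the next transfer.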
While requiring 'static source/destination buffers, and putting the onus on the user to ensure that their buffer use is sound, seems like the appropriate approach to making the API sound, this example also demonstrates how a user can easily get around the requirement to have static buffers by doing this exact operation.
So perhaps the requirement for 'static lifetimes is actually counterproductive. The user can introduce UB by leaking DmaTransfer with the old API, or by copying references like I've done here. Dealer's choice. I felt encouraged to use this hack to meet the requirement for a 'static lifetime, which feels like bad API design. Again, happy to learn how to do this in a safer way!
Leaking DmaTransfer seems like a much more deliberately degenerate thing to do, so perhaps we could instead take a similar approach to embassy and mark the transfer initialization functions as unsafe. We could then include a warning not to leak the DmaTransfer struct in a safety note.
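Roughly something like this (placeholder names and signatures, not what's actually in this PR):

pub struct DmaTransfer { /* channel + buffer pointers */ }

impl DmaTransfer {
    /// # Safety
    ///
    /// The returned `DmaTransfer` must not be leaked (e.g. via
    /// `core::mem::forget`) while the hardware may still access the buffers,
    /// and `src`/`dest` must not be accessed through other references until
    /// the transfer has completed or been aborted.
    pub unsafe fn memory_to_memory(src: &'static [u8], dest: &'static mut [u8]) -> Self {
        let _ = (src, dest);   // placeholder: program the channel registers and start the transfer
        Self {}
    }
}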
@AdinAck have you had the opportunity to think about it any more since we spoke last in the G4 issue about DMA? Do you have any input here or just in general?
Also @astapleton, as to
It's UB, but IMO it is safe because the reference is not used until the DMA transfer is stopped. Is that incorrect?
Please correct me if I am wrong, but I am pretty sure that strictly speaking it is instant UB the moment the second reference is created. Due to the UB, the resulting binary is just something that may or may not happen to seem to do the right thing, or something completely different, which may or may not change depending on compiler version/options/alignment of stars/nasal demons etc. Yet I am not at all sure what we are allowed to do instead for that case... Perhaps a raw pointer that is only turned into a reference after the DMA is done?
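If it helps, the raw-pointer idea might look roughly like this. Purely illustrative (the names are made up), and whether it is actually sound is exactly the open question here:

struct InFlight {
    dest: *mut [u8; 40],
}

impl InFlight {
    fn wait(self) -> &'static mut [u8; 40] {
        // while !transfer_complete() {}   // poll the channel's TC flag first
        // SAFETY (assumed): the hardware has stopped writing and no other
        // reference to the buffer exists at this point.
        unsafe { &mut *self.dest }
    }
}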
I have tried (with not so great results) to read up on some recent discussion on the subject of DMA, volatile read/write etc. Here is some info, with, I think, this being the most relevant ('2' being the DMA case):
For (2), one has to put a suitable fence between the DMA accesses and the final MMIO access. This will, at least for now, require inline assembly: RISC-V, for instance, has separate fences for communicating with regular cached memory and with MMIO memory. The logical justification of the inline assembly block will be that it forks off a thread that asynchronously copies the DMA memory, modeling what the hardware does. So there's no new API required here from our side.
Then there is this stating
[...] asm block does not act as a compiler fence [...]
Then there is this, stating
[...] So, if your program contains no atomic accesses, but some atomic fences, those fences do nothing. [...]
And this comment
aliasing analysis with &mut can cause reordering of code because it knows there can't be any aliasing pointers.
That is true with all fences, none of them allows you to violate the aliasing assumptions.
If you are using volatile because that memory is shared with some other party (e.g. via MMIO), then you must also inform the type system of that sharing, by using shared references or raw pointers.
I am getting more confused the more I read :/
@AdinAck have you had the opportunity to think about it any more since we spoke last in the G4 issue about DMA? Do you have any input here or just in general?
I have! My plans are to begin implementing safe DMA in the next few weeks.
I don't have any code to share yet, nor any concrete decisions, but here are some thoughts...
I believe a DMA transfer can simply be split into two sides: the source and the destination. Either side can hold either a contiguous region of software memory or a handle to a specific peripheral field.
The contract of these transfer endpoint types is that the existence of the type implies an active transfer, so there is no need to deal with 'static lifetimes: as long as the transfer endpoint type exists, it holds the necessary resources and performing a DMA transfer is sound.
A simple example of pseudo-code:
// I haven't decided yet whether the field handle(s) accepted should be moved or exclusively borrowed.
// Conceivably, either works and it's merely a question of ergonomics, but I need to think more.
let (src, dest) = Transfer::new(/* DMA/MUX required field entitlements */).split(p.cordic.rdata.res, [0u32; 2]);
Since the source endpoint owns the result field of the CORDIC, it rightfully has exclusive access to that MMIO. Similarly, since the destination endpoint owns the arbitrary buffer we gave it, it has exclusive access to the buffer.
Since DMA is a highly dynamic process, I don't think we should even try to represent the stage of the transfer with the type-system, as it is likely not static enough. Maybe as I learn more my opinion on this will change, but I doubt it.
So then, each endpoint has methods like .start(), .cancel(), and .with(FnOnce(BufferAccess)). The latter provides conditional access to the provided buffer only when the transfer is not active, and holds access to the field handles which control transfer activity, so a transfer cannot be started while the closure is running. BufferAccess would be a &mut ... if the endpoint was "source" and a & ... if the endpoint was "destination".
Related to what you guys were talking about with compiler fencing, I think the most reasonable solution would be to provide the buffer reference to this closure by read_volatile-ing the buffer, even though we could just reference it directly, because that forces the compiler to properly order the reads/writes. Otherwise, it would have no reason to make sure that "read status flag and make sure transfer is inactive" precedes "read src/dest buffer".
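A rough sketch of that closure-based access (all names and details here are assumptions, not a real API):

use core::ptr;

struct DestEndpoint {
    buf: [u32; 2],
    // handles to the channel's enable/status fields would live here
}

impl DestEndpoint {
    // Only hand out the buffer while the transfer is inactive, and read it
    // volatilely so the compiler cannot hoist the read above the status check.
    fn with<R>(&mut self, f: impl FnOnce(&[u32; 2]) -> R) -> Option<R> {
        // if self.transfer_active() { return None; }   // guard on the status flag
        let snapshot = unsafe { ptr::read_volatile(&self.buf) };
        Some(f(&snapshot))
    }
}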
For a oneshot transfer, the transfer endpoints would be deconstructed:
let TransferStage::Complete(res) = src.release() else { ... };
let TransferStage::Complete(buf) = dest.release() else { ... };
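where TransferStage/release might be shaped roughly like this (just a sketch of the idea, not a committed design):

enum TransferStage<T> {
    // Transfer finished: the buffer / field handle is handed back.
    Complete(T),
    // Transfer still running (or aborted): nothing to hand back yet.
    InProgress,
}

struct Endpoint<T> {
    resource: T,
}

impl<T> Endpoint<T> {
    fn release(self) -> TransferStage<T> {
        // if self.transfer_active() { return TransferStage::InProgress; }
        TransferStage::Complete(self.resource)
    }
}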
So I'll be working on this in the coming weeks.
I am getting more confused the more I read :/
Just to reiterate, I believe the simplest solution is to make sure we use volatile reads/writes for any in-memory buffers involved in a DMA transfer, because then we know the MMIO volatile access and the buffer volatile access will be in the correct order. Even though we have the buffer and could just reference it, according to all those resources you sent the compiler could reorder volatile and non-volatile accesses. But I would consider that, since DMA is touching our memory, access to our in-memory buffers immediately becomes volatile access.
core::mem::forget 💀
Well, that's quite the thorn in my side, huh. I'll keep thinking, but I'm pretty sure you're right, and that destroys any and all invariants I would want to establish with the DMA transfer type.
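For anyone following along, the problem in miniature (illustrative names only):

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        // abort_transfer();   // the invariant we were relying on
    }
}

fn oops() {
    let g = Guard;
    core::mem::forget(g);   // safe code, and Drop::drop never runs
    // the stack frame (and any borrowed buffer) can now be reused while the
    // DMA may still be writing
}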
Yeah, really hope they find some solution to the leak thing
On a lighter note, this section of the embedonomicon brought me some peace of mind. The usage of the compiler fence and memory barrier instructions (or lack thereof) is very clear. I was especially happy to read:
in the current implementation, volatiles happen to work just like relaxed atomic operations. There's work going on to actually guarantee this behavior for future versions of Rust.
So I will feel comfortable using compiler fences (and maybe a dsb just to be safe) to ensure the ordering of DMA operations is enforced.
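Concretely, the ordering pattern would look roughly like this. A sketch only: start_transfer/transfer_complete are placeholders for the real register accesses, and dsb comes from the cortex-m crate:

use core::sync::atomic::{compiler_fence, Ordering};

fn run_transfer(buf: &mut [u8; 40]) {
    // make sure the writes that filled `buf` are not reordered past the DMA start
    compiler_fence(Ordering::SeqCst);
    cortex_m::asm::dsb();

    // start_transfer(buf.as_mut_ptr(), buf.len());   // program + enable the channel
    // while !transfer_complete() {}                  // wait for the TC flag

    // make sure reads of `buf` are not hoisted above the completion check
    cortex_m::asm::dsb();
    compiler_fence(Ordering::SeqCst);

    let _ = buf;   // buffer may now be read normally
}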
I very much hope you are right, but I think I read somewhere suggestions that it was not enough and that an asm block might be needed (again, I did not really get it, so I won't go into it much further). Perhaps somewhere here, where there is also discussion about DMA writing to a &mut.
core::mem::forget 💀
Well, that's quite the thorn in my side, huh. I'll keep thinking, but I'm pretty sure you're right, and that destroys any and all invariants I would want to establish with the DMA transfer type.
Yeah. I had the same realization :)
Co-authored-by: Albin Hedman <[email protected]>
BTW, I can see that you have really put a lot of work into this, thank you @astapleton! :)