This repository was archived by the owner on Jan 18, 2026. It is now read-only.

Support running DMA from a user-provided buffer #2697

@yuyichao

Description


ARTIQ Feature Request

Problem this request addresses

The host has all the information needed to generate the content of a DMA trace. This lets the host do all the heavy lifting while achieving maximum output throughput on the kernel with zero (re)compilation time. However, there are no built-in tools to run such a trace in a kernel.

Describe the solution you'd like

I think it is acceptable to keep this as an advanced feature that requires the user to call internal syscalls and keep in sync with the internal DMA buffer format. However, a few issues currently make this difficult. The items below would make it significantly easier to support.
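To illustrate what "the host does the heavy lifting" means in practice, here is a minimal host-side sketch that packs events into one flat byte string ready to be shipped to the kernel. The record layout (field order, field widths, and the zero-byte terminator) is an assumption made for this example only; the real ARTIQ DMA buffer format is internal and version-dependent, which is exactly what a user of this feature would have to track. The function names are hypothetical.

```python
import struct

# ASSUMED record layout, for illustration only (not the documented ARTIQ format):
#   u8       total record length in bytes (header + data)
#   3 bytes  channel number, little-endian
#   u64      timestamp in machine units, little-endian
#   u8       RTIO address
#   N x u32  data words, little-endian
HEADER_LEN = 1 + 3 + 8 + 1


def encode_record(channel, timestamp, address, data):
    """Pack one output event into a record using the assumed layout."""
    length = HEADER_LEN + 4 * len(data)
    if length > 0xFF:
        raise ValueError("record too long for a one-byte length field")
    rec = bytearray([length])
    rec += channel.to_bytes(3, "little")
    rec += struct.pack("<Q", timestamp)
    rec += bytes([address])
    for word in data:
        rec += struct.pack("<I", word)
    return bytes(rec)


def encode_trace(events):
    """events: iterable of (channel, timestamp, address, data_words)."""
    buf = bytearray()
    for channel, timestamp, address, data in events:
        buf += encode_record(channel, timestamp, address, data)
    buf.append(0)  # assumed: a zero length byte marks the end of the trace
    return bytes(buf)
```

With these assumed field widths every record is 13 + 4n bytes long, so its length is always odd.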

  1. Provide a function to flush the relevant cache.

    Currently I do this with an LLVM function that performs volatile reads in a loop, mimicking the behavior of flush_l2_cache. This should be as simple as exposing an API from the firmware.

  2. Provide a way to get a pointer from a list/ndarray/bytes/bytearray.

    I don't know if there's a way to add a generic version, but a type-specific version is really easy to codegen in LLVM. This should be doable as a syscall as well.

  3. Provide a way for the user to allocate a properly aligned buffer.

    FWIW, I don't really understand why the DMA buffer's starting address needs to be 64-byte aligned. The entries in the buffer are not multiples of 64 bytes long (in fact their length is always odd), so the DMA engine is clearly able to pick out an instruction starting at an unaligned address (though I guess the reads may be done in 64-byte blocks?). In any case, if the alignment requirement cannot be removed, this can probably be solved with a data type that has a larger alignment. (In my code I currently deal with this by flipping max_target_alignment when emitting the RPC call I use to get the data from the host.) Fixing it this way would probably need the biggest change to the compiler. OTOH, manually realigning the data is also possible with a syscall function.
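Item 1 relies on the fact that reading enough unrelated memory pushes dirty lines out of a write-back cache, which is why a loop of volatile reads (volatile so the compiler cannot drop them) can stand in for a flush. A toy Python model can show the mechanism. It assumes a direct-mapped write-back cache with made-up sizes, which may not match the L2 cache on the actual hardware; the class and function names are hypothetical.

```python
class DirectMappedWriteBackCache:
    """Toy model: direct-mapped, write-back cache over a word-addressed memory."""

    def __init__(self, memory, n_lines, line_words):
        self.memory = memory              # backing store (list of ints), mutated in place
        self.n_lines = n_lines
        self.line_words = line_words
        self.tags = [None] * n_lines      # which memory line each slot currently holds
        self.data = [None] * n_lines      # cached copy of that line
        self.dirty = [False] * n_lines

    def _fill(self, addr):
        """Make sure the line containing addr is cached; return its slot."""
        line = addr // self.line_words
        slot = line % self.n_lines
        if self.tags[slot] != line:
            if self.dirty[slot]:
                # Evict: write the dirty line back to memory before reusing the slot.
                base = self.tags[slot] * self.line_words
                self.memory[base:base + self.line_words] = self.data[slot]
            base = line * self.line_words
            self.tags[slot] = line
            self.data[slot] = self.memory[base:base + self.line_words]  # copy
            self.dirty[slot] = False
        return slot

    def read(self, addr):
        slot = self._fill(addr)
        return self.data[slot][addr % self.line_words]

    def write(self, addr, value):
        slot = self._fill(addr)
        self.data[slot][addr % self.line_words] = value
        self.dirty[slot] = True


def flush_by_reading(cache, region_base, cache_words):
    """Read one word per line over a cache-sized, line-aligned region.

    Each read maps to a different slot, so every slot gets a new line and
    any dirty line it held is written back. The region must not overlap
    the data being flushed: lines that are already resident are hits and
    stay dirty in the cache.
    """
    for addr in range(region_base, region_base + cache_words, cache.line_words):
        cache.read(addr)
```

For example, with a 64-word memory and a cache of 4 lines of 2 words, writing to addresses 40 and 43 leaves memory unchanged until `flush_by_reading(cache, 0, 8)` evicts both lines.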
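The pointer and alignment items (2 and 3) can be illustrated on the host with the standard library's ctypes. This is a minimal sketch, assuming a writable buffer such as a bytearray, of taking a buffer's address and of the manual realignment trick: over-allocate by `align - 1` bytes, then start at the next 64-byte boundary. It is host Python only; inside an ARTIQ kernel the equivalent would have to come from the compiler or a syscall, and the helper names are hypothetical.

```python
import ctypes

ALIGN = 64  # required start alignment of the DMA buffer, in bytes


def buffer_address(buf, offset=0):
    """Return the address of buf[offset] for a writable buffer (e.g. a bytearray)."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf, offset))


def aligned_view(n, align=ALIGN):
    """Return (backing, view, address) for an n-byte (n >= 1) aligned region.

    `view` is a writable memoryview into `backing` whose first byte lies on
    an `align`-byte boundary. Keep `backing` alive while `view` is in use.
    """
    backing = bytearray(n + align - 1)   # worst case needs align - 1 bytes of slack
    base = buffer_address(backing)
    offset = (-base) % align             # bytes to skip to reach the next boundary
    view = memoryview(backing)[offset:offset + n]
    return backing, view, base + offset
```

For example, `backing, view, addr = aligned_view(1000)` gives a 1000-byte writable view whose address is a multiple of 64, and a serialized trace can then be copied in with `view[:len(trace)] = trace`.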

Additional context

I have implemented all the features above in our experiment using very hacky monkey patches of the ARTIQ compiler. It runs the DMA buffer from kernel stack memory just fine and improves output throughput by ~10-20% on Kasli compared to a custom serialization format.

Supporting this for subkernels would be nice too. However, I'm still on version 7, so I don't know much about the available ways to transfer data to subkernels.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
