Support running DMA from user provided buffer #2697

# ARTIQ Feature Request

## Problem this request addresses

The host contains all the information needed to generate the content of a DMA trace. This lets the host do all the heavy lifting while the kernel achieves maximum output throughput with zero (re)compilation time. However, there are no built-in tools to run such a trace in a kernel.

## Describe the solution you'd like

I think it should be acceptable to keep this as an advanced feature that requires the user to use internal syscalls and keep in sync with the internal DMA buffer format. However, a few issues currently make this difficult to do. Below are the items that would make this significantly easier to support.

1. Provide a function to flush the proper cache.

    Currently I do this with an LLVM function that performs volatile reads in a loop, mimicking the behavior of `flush_l2_cache`. This should be as simple as exposing an API from the firmware.

2. Provide a way to get a pointer from a list/ndarray/bytes/bytearray.

    I don't know if there's a way to add a generic version, but a type-specific version is really easy to codegen in LLVM. This should be doable as a syscall as well.

3. Provide a way for the user to allocate a properly aligned buffer.

    FWIW, I don't really understand why the DMA buffer's starting address needs to be 64-byte aligned. The sizes of the entries in the buffer are not multiples of 64 (in fact, the size is always an odd number), so the DMA engine is clearly able to extract an instruction from an unaligned address (though I guess the reads may be done with a 64-byte block size?). In any case, if the alignment requirement cannot be removed, this can probably be fixed with a data type with larger alignment. (In my code I currently deal with this by flipping `max_target_alignment` when emitting the RPC call I use to get the data from the host.) Fixing this properly would probably need the biggest change in the compiler. OTOH, manual realignment of the data is also possible with a syscall function.
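
For reference, the "manual realignment" idea in item 3 boils down to over-allocating and skipping to the next 64-byte boundary. Below is a minimal host-side Python sketch of that arithmetic only. The names `aligned_view`, `address_of`, and `DMA_ALIGNMENT` are hypothetical and not part of ARTIQ, and a kernel-side version would have to be done with a syscall or in generated LLVM code instead.

```python
import ctypes

# Alignment the DMA buffer start address is assumed to need (from the issue text).
DMA_ALIGNMENT = 64


def address_of(buf):
    """Return the memory address of the first byte of a writable buffer."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


def aligned_view(size, alignment=DMA_ALIGNMENT):
    """Return a writable memoryview of `size` bytes starting on an aligned address.

    Hypothetical helper for illustration: it over-allocates by
    `alignment - 1` bytes and skips forward to the next aligned address.
    """
    backing = bytearray(size + alignment - 1)
    base = address_of(backing)
    # Number of bytes to skip so that (base + offset) % alignment == 0.
    offset = (-base) % alignment
    # The memoryview keeps `backing` alive and prevents it from being resized.
    return memoryview(backing)[offset:offset + size]


if __name__ == "__main__":
    view = aligned_view(100)
    print(len(view), address_of(view) % DMA_ALIGNMENT)
```

The cost is at most 63 wasted bytes per buffer, which is why a copy into an over-allocated region is a workable fallback if the compiler cannot provide a type with larger alignment.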

## Additional context

I have implemented all the features above in our experiment using very hacky monkey patches of the ARTIQ compiler. It runs the DMA buffer from kernel stack memory just fine and improves the output throughput by ~10-20% on Kasli compared to a custom serialization format.

Supporting this for subkernels would be nice too. However, I'm still on version 7, so I don't know much about the available ways to transfer data to subkernels.
