-
It's hard to answer this without more details about the way you are implementing this. The backends have full control over how their memory is allocated, so the way you are formulating the question makes me think that you are modifying the CPU backend instead of creating a new backend, which would not be the recommended way to do this.
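For what it's worth, one way a custom backend could back its buffers with DMA-capable memory on Linux is through a contiguous-buffer driver. Below is a minimal sketch assuming the out-of-tree u-dma-buf driver is loaded; the device node, sysfs path, and buffer size are assumptions for illustration, not llama.cpp or ggml API:

```c
// Sketch: obtain a physically contiguous, DMA-capable buffer that a custom
// backend's allocator could hand out. Assumes the u-dma-buf driver is loaded;
// paths and sizes vary by driver version and configuration.
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define DMA_BUF_SIZE (64 * 1024 * 1024)  // must match the driver's configured size

int main(void) {
    // Physical address of the buffer, published by the driver via sysfs.
    // This is the address the FPGA DMA engine would be programmed with.
    char line[32];
    FILE *f = fopen("/sys/class/u-dma-buf/udmabuf0/phys_addr", "r");
    if (!f || !fgets(line, sizeof line, f)) { perror("phys_addr"); return 1; }
    fclose(f);
    uint64_t phys_addr = strtoull(line, NULL, 0);  // e.g. "0x70000000"

    // Map the buffer into user space: the CPU fills it here, the FPGA reads
    // it at phys_addr, so no intermediate copy is needed.
    int fd = open("/dev/udmabuf0", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/udmabuf0"); return 1; }
    void *dma_buf = mmap(NULL, DMA_BUF_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
    if (dma_buf == MAP_FAILED) { perror("mmap"); return 1; }

    printf("virt %p <-> phys 0x%llx\n", dma_buf, (unsigned long long) phys_addr);

    // A custom ggml backend buffer type could return a region like dma_buf
    // from its buffer-allocation callback and sub-allocate tensors inside it.

    munmap(dma_buf, DMA_BUF_SIZE);
    close(fd);
    return 0;
}
```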
-
Hi, I want to do the same thing, but now I wonder whether llama.cpp can be implemented in a bare-metal FPGA environment. I would also like to change the memory-mapped weight-loading path to use DMA instead. Could you please give me some advice?
-
My overall goal is to run LLM inference with FPGA-based accelerators using llama.cpp.
Currently I have managed to achieve this, but there is a bottleneck when moving the weight data from normal memory (llama.cpp-allocated data structures) into mmapped buffers allocated for DMA transfers.
My idea is that if we could implement a way to load the entire model, or parts of it, directly into the DMA buffers, I could potentially make significant performance gains when accelerating with DMA stream-based accelerators.
I know there is currently an option to mmap the model, but there seems to be no way to specify the physical address I want to associate with the mmapped buffers.
Any advice or pointers on where to look in the code would be helpful.
Also, let me know if there are any details I should add to make the discussion more fruitful.
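To make the idea concrete, here is a rough sketch of what skipping the intermediate copy could look like: tensor bytes go from the model file straight into the DMA-capable mapping via pread(). The helper name, file offsets, and the dma_buf mapping are hypothetical placeholders, not llama.cpp API:

```c
// Sketch: stream weight data from the model file directly into a DMA-capable
// buffer, instead of materializing it in ordinary heap memory first and then
// memcpy'ing it across. Offsets and the dma_buf mapping are placeholders.
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: dma_buf is assumed to be a user-space mapping of a
// physically contiguous region (e.g. obtained as in the u-dma-buf sketch
// above), so the FPGA can stream it without another copy.
static int load_tensor_into_dma(const char *model_path, void *dma_buf,
                                off_t file_offset, size_t nbytes) {
    int fd = open(model_path, O_RDONLY);
    if (fd < 0) { perror("open model"); return -1; }

    size_t done = 0;
    while (done < nbytes) {
        ssize_t n = pread(fd, (char *) dma_buf + done,
                          nbytes - done, file_offset + (off_t) done);
        if (n <= 0) { perror("pread"); close(fd); return -1; }
        done += (size_t) n;
    }
    close(fd);
    // On systems without cache-coherent DMA, the region would need a cache
    // flush/sync here before the DMA transfer is kicked off.
    return 0;
}
```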