We're working on the JPEG encoder and we're wondering what the fastest known way is to read a storage buffer from the GPU and bring it into system RAM. If I understand GPUQueue.read_buffer correctly, it involves an extra copy step, which is not ideal, and it also allocates a new buffer in system RAM every time it is called. I'd guess it would be faster to always download into the same, preallocated system memory location whenever we pull new data from the GPU?
For context, this buffer contains the run-length encoded data.
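To make the "fixed destination" idea concrete, here is a minimal sketch of what I have in mind on the host side. It is an illustration only: `make_downloader` and `read_fn` are hypothetical names, and `read_fn` is a stand-in for whatever actually fetches the bytes (for example a wrapper around GPUQueue.read_buffer), assumed to return a bytes-like object. Note that this still copies once into the reused buffer; it only avoids allocating a fresh destination per call and says nothing about what happens inside the library.

```python
def make_downloader(capacity):
    """Create a reusable host buffer and a function that refills it in place.

    `capacity` is the maximum number of bytes a single download may return.
    """
    host_buf = bytearray(capacity)      # allocated once, reused for every download
    host_view = memoryview(host_buf)    # zero-copy window onto host_buf

    def download(read_fn):
        # read_fn is a hypothetical stand-in for the real GPU read; it is
        # assumed to return a contiguous bytes-like object (bytes, memoryview, ...).
        data = memoryview(read_fn()).cast("B")
        n = data.nbytes
        if n > capacity:
            raise ValueError(f"download of {n} bytes exceeds capacity {capacity}")
        host_view[:n] = data            # copy into the fixed location
        return host_view[:n]            # view of the valid bytes, no new allocation

    return host_buf, download
```

Usage would look like `host_buf, download = make_downloader(max_rle_bytes)` once at startup, then `rle = download(lambda: ...)` per frame, where `max_rle_bytes` is an assumed upper bound on the run-length encoded size.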
@apasarkar