Hi, I am very interested in your work on PipeInfer!
However, the current implementation does not seem to support multiple GPUs. Are there any upcoming plans or suggestions for integrating multi-GPU support with pipeline speculative decoding?
I have experimented with various approaches, but so far none of them has worked for me.
Thanks a lot!