Description
It would be nice to have full-fledged GPU support in Turing for cases where parts of the model are embarrassingly parallel. Essentially, this can be achieved by using GPU arrays for the parameters and/or the data. Since we allow arbitrary Julia code in Turing models, this is already largely possible for everything except the `~` lines, i.e. `VarInfo`, `observe` and `assume`. Figuring out how to adapt these components for the GPU is not trivial, but it may not be too hard either.
For a start, GPU parallelism can be enabled in the `~` lines using GPU-aware multivariate distributions. More complex kernels written by the user may also be possible with things like `map`/`do` blocks, but we may run into issues when closing over other arrays. I will need to try this out with `CuArrays` and see how things work. I think data-parallelism may be the easier place to start, since the data is not coupled to the complicated `VarInfo`.
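One way to picture the "GPU multivariate distribution" route is a custom distribution whose `logpdf` reduces over a `CuArray` on the device, so that a single `~` line covers all observations. The `GPUNormalVec` type below is purely hypothetical and has not been exercised against `VarInfo`/`observe`; it is only meant to show the shape of the idea:

```julia
using Turing, Distributions, CuArrays

# Hypothetical: an iid-normal "vector" distribution whose logpdf is one
# fused broadcast + reduction kernel on the GPU.
struct GPUNormalVec{T<:Real} <: ContinuousMultivariateDistribution
    μ::T
    σ::T
    len::Int
end

Base.length(d::GPUNormalVec) = d.len

function Distributions._logpdf(d::GPUNormalVec, x::CuArray)
    sum(@. -abs2(x - d.μ) / (2 * d.σ^2) - log(d.σ) - log(2π) / 2)
end

@model function gpu_obs(y_gpu)
    μ ~ Normal(0, 1)
    σ ~ truncated(Normal(0, 1), 0, Inf)
    # One observe call whose likelihood is evaluated on the GPU.
    y_gpu ~ GPUNormalVec(μ, σ, length(y_gpu))
end
```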
Another mode of GPU parallelism that Turing could exploit is in the sampling itself, where each GPU thread runs its own small MCMC chain. This could also be made possible by adapting `VarInfo` and the data input to work on the GPU. It may be a bigger effort, though, since it would likely require implementing the MCMC algorithms in a GPU-friendly way using GPU kernels.
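For this second mode, here is a toy, Turing-independent sketch of what "one chain per GPU thread" could look like: many independent Metropolis–Hastings chains for a one-dimensional standard-normal target, advanced in lockstep by broadcast kernels. Everything here is an illustration of the kernel-level style, not an existing Turing API:

```julia
using CuArrays

logtarget(x) = -x^2 / 2   # unnormalised standard-normal log-density

function mh_chains_gpu(nchains, nsteps; stepsize = 0.5f0)
    x  = CuArrays.randn(Float32, nchains)   # current state of every chain
    lp = logtarget.(x)                       # current log-density per chain
    for _ in 1:nsteps
        prop   = x .+ stepsize .* CuArrays.randn(Float32, nchains)
        lpprop = logtarget.(prop)
        accept = log.(CuArrays.rand(Float32, nchains)) .< (lpprop .- lp)
        x  = ifelse.(accept, prop, x)        # each step is a handful of kernels
        lp = ifelse.(accept, lpprop, lp)
    end
    return x
end

# 10_000 chains advanced for 1_000 steps, entirely on the GPU.
samples = mh_chains_gpu(10_000, 1_000)
```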
This is a brainstorming issue on GPU support for Turing, so papers, ideas, comments and use cases are all welcome.