Although it did not show up in my tests, it seems that alloca might cause a slowdown, even though it reduces VRAM usage.
One solution is to make alloca optional and generate kernels both with and without it. To do so, we need to turn alloca from a preprocessor define into a template parameter.
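
A minimal host-side sketch of the idea, not the actual kernel code: every name here (`UseAlloca`, `kMaxWidth`, `weighted_sum`, `weighted_sum_dispatch`) is hypothetical, and it assumes C++17 (`if constexpr`) and a GCC/Clang-style `__builtin_alloca`. The real kernels would carry the same boolean as a template parameter on the `__global__` function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical upper bound on the kernel width, used by the fixed-size variant.
constexpr int kMaxWidth = 16;

// Shared body: fills the scratch buffer and accumulates. Illustrative work only.
inline float accumulate(const float* values, float* weights, int width) {
    float sum = 0.0f;
    for (int i = 0; i < width; ++i) {
        weights[i] = static_cast<float>(i + 1);
        sum += values[i] * weights[i];
    }
    return sum;
}

// UseAlloca is a compile-time switch, so both variants are instantiated from
// the same source instead of being selected once by a preprocessor define.
template <bool UseAlloca>
float weighted_sum(const float* values, int width) {
    if constexpr (UseAlloca) {
        // Scratch buffer sized to the actual width (assumed GCC/Clang builtin).
        auto* weights = static_cast<float*>(
            __builtin_alloca(static_cast<std::size_t>(width) * sizeof(float)));
        return accumulate(values, weights, width);
    } else {
        // Scratch buffer sized to the worst case.
        assert(width <= kMaxWidth);
        float weights[kMaxWidth];
        return accumulate(values, weights, width);
    }
}

// Runtime option mapped onto the two compiled variants.
inline float weighted_sum_dispatch(const float* values, int width, bool use_alloca) {
    return use_alloca ? weighted_sum<true>(values, width)
                      : weighted_sum<false>(values, width);
}
```

With this layout the choice becomes a runtime flag at the call site (for example a user-facing option), while each instantiation still compiles down to a single code path, so the two variants can be benchmarked against each other in the same build.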
Details: mind-inria/mri-nufft-benchmark#5
@paquiteau @Lenoush