All the other backends run at the speed I expect, but CUDA is so slow it's like it's running on the oldest Pentium. Maybe wrong compile flags or something?