I'm experiencing a weird bug where the kernel I wrote only gives the expected result when I add a print statement in a particular place:
https://github.com/SouthEndMusic/SplineGrids.jl/blob/08f0e91a70c547e807b0a8bd89058b4cb8c06a43/src/spline_dimension.jl#L48-L70
It was suggested on Slack that it could be a synchronization issue, but the different threads are completely independent of each other.