Hi @psteinb & @StephanPreibisch,
I've been using these CUDA implementations for a while now (thank you!), but I've noticed something really strange about the memory footprint of the implementation.
I'm using a Titan Black X with 12GB of memory (a single card, not a dual-GPU setup or anything like that) and compiled your code for CUDA 7. I see the same behaviour on Linux and Windows.
I found that I can easily convolve a float image of size 400^3 (64,000,000 elements, ~244MB) with a kernel of size 31x31x91 in-place.
However, an image of size 406x400x400 with the same kernel size (31x31x91) already fails. By "fails" I mean that the array I pass to CUDA for the in-place convolution contains almost nothing but zeros afterwards; the remaining values are very close to zero.
I know that the memory footprint of FFT-based convolution is high, but surely not over 24 times the size of my input image?
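For reference, here is a rough back-of-the-envelope estimate of the device memory an FFT-based 3D convolution might need, assuming padding to the "full" size (image + kernel - 1) and a real-to-complex transform. The padding scheme and buffer layout are my assumptions, not necessarily what the actual implementation does:

```python
def fft_conv_bytes(img, ker, real_bytes=4, complex_bytes=8):
    """Crude device-memory estimate for an FFT-based 3D convolution."""
    # pad each dimension to the "full" convolution size
    padded = [i + k - 1 for i, k in zip(img, ker)]
    n_pad = padded[0] * padded[1] * padded[2]
    # a real-to-complex FFT keeps only ~half of the last dimension
    n_cplx = padded[0] * padded[1] * (padded[2] // 2 + 1)
    # one padded real buffer each for image and kernel,
    # plus one complex spectrum each
    return 2 * n_pad * real_bytes + 2 * n_cplx * complex_bytes

img, ker = (406, 400, 400), (31, 31, 91)
total = fft_conv_bytes(img, ker)
print(f"{total / 2**30:.2f} GiB")
```

Under these assumptions the failing case needs only on the order of 1.4 GiB, far below 12GB, which is why the failure at 406x400x400 is so surprising to me.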
Any ideas??
Thanks!
Christian