Description
Often it is important to minimize startup time rather than maximize runtime performance. Right now, descrypt-opencl always(?) uses per-salt kernels, which typically take up to ~2 hours to build from source and up to tens of minutes to "build from binary". It should be possible to request the use of a generic kernel instead: one that would run slower (needing pointer indirection for the salts), but would only be built once. We used to have that, but have since lost it. We should reintroduce it as an option (maybe even as the default), and should print a "Note: ..." to the user indicating how to enable the other behavior.
IIRC, the HARDCODE_SALT setting previously controlled this, but it is now set to 0 and yet we still hard-code salts into the multiple kernels.
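To illustrate the trade-off between hard-coded salts and a generic kernel, here is a minimal sketch in plain C (not OpenCL, and not the actual descrypt-opencl code). It assumes the traditional crypt(3) salt rule, where each set bit k of the 12-bit salt swaps entries k and k+24 of the DES E expansion table. The names `build_salted_E` and `expand_generic` are hypothetical. A per-salt kernel would bake the resulting 48 indices in as compile-time constants, hence one build per salt; a generic kernel would read them through a pointer at run time, so it is built once and only the small index table changes per salt.

```c
#include <stdint.h>

/* Standard DES E expansion, 0-based: output bit i takes input bit des_E[i]. */
static const uint8_t des_E[48] = {
    31,  0,  1,  2,  3,  4,
     3,  4,  5,  6,  7,  8,
     7,  8,  9, 10, 11, 12,
    11, 12, 13, 14, 15, 16,
    15, 16, 17, 18, 19, 20,
    19, 20, 21, 22, 23, 24,
    23, 24, 25, 26, 27, 28,
    27, 28, 29, 30, 31,  0
};

/* Hypothetical helper: build the salted index table on the host.
 * For each set bit k of the 12-bit salt, swap entries k and k+24. */
void build_salted_E(unsigned salt, uint8_t out[48])
{
    for (int i = 0; i < 48; i++)
        out[i] = des_E[i];
    for (int k = 0; k < 12; k++) {
        if ((salt >> k) & 1) {
            uint8_t tmp = out[k];
            out[k] = out[k + 24];
            out[k + 24] = tmp;
        }
    }
}

/* Hypothetical "generic kernel" path: the indices are fetched through a
 * pointer at run time (the extra indirection), so the same compiled code
 * serves every salt. Each uint32_t stands in for one bitslice vector. */
void expand_generic(const uint32_t r[32], const uint8_t *index, uint32_t e[48])
{
    for (int i = 0; i < 48; i++)
        e[i] = r[index[i]];
}
```

With this layout, switching salts means refilling a 48-byte table rather than building another kernel. The cost is one extra memory load per expanded bit, which is the slowdown the generic kernel is expected to have.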
Now there's also the USE_BASIC_KERNEL setting, currently only enabled when the device is a CPU, or on OS X. I thought that maybe this was what we needed. I tried setting "--device" to a CPU to test it, but at first I got unrealistically good speeds for a CPU (200M+ c/s on a machine that only does under 80M with C+intrinsics) with AMD's OpenCL and a segfault with Intel's. On rerunning these, I also got an instant segfault with AMD's. So this is actually unreliable, at least with those (outdated versions of) OpenCL backends. Trying it on an NVIDIA GPU (by forcing "#define USE_BASIC_KERNEL 1" in the source), I got:
OpenCL CL_INVALID_KERNEL_ARGS error in opencl_DES_bs_b_plug.c:677 - Enque kernel DES_bs_25 failed.
I'd rather leave (re)implementing this to someone familiar with the code. ;-)