-
Notifications
You must be signed in to change notification settings - Fork 2.4k
DEScrypt-opencl: Drop support for parallel kernel build #5902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEScrypt-opencl: Drop support for parallel kernel build #5902
Conversation
solardiz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure about dropping this, but I don't mind. Just fix "we'd did" in the comment.
|
Oh, we can keep it. But back then we found that the (then current) runtimes seemed to serialize builds despite this! And currently we get a segfault because (perhaps?) of this (opencl_common.c:1270): * Saving and restoring of the current directory here is incompatible with
* concurrent kernel builds by multiple threads, like we'd do with the
* PARALLEL_BUILD setting in descrypt-opencl (currently disabled and considered
* unsupported). We'd probably need to save and restore the directory
* before/after all kernel builds, not before/after each.I could try to repair it instead. It is a great feature, if it works. |
dde5bd3 to
692eea0
Compare
It was disabled by default but always lead to problems when I experimented with it. Closes openwall#1618
692eea0 to
5e02c6b
Compare
You broke it with 753dd9f, not quite sure how or how to easily fix it. I can get it running for a while and it doesn't seem any faster (on nvidia) than single thread. I do see several threads but they are far from 100% CPU each. OTOH it's not many lines of code so we could let it be. |
I think when we're in the |
|
OK this is from run directory So it builds over 650 kernels and only then fails on one, in a way that looks like corrupted kernel source. This is with 24 threads. I'll rebuild it with |
There's no point, the crash is in the nvidia runtime. Meanwhile, I was lucky enough to build all 4096 kernels in parallel with no error.
We don't get a good RoI from our 24 threads, just 50% faster. Building JtR itself is 21x faster, that's more like it. An interesting data point is a parallel build of cached binaries takes twice as long as with a single thread. And it runs with %CUC at ~100%. So the threading does absolutely no good in that case, but it's surprising it hits performance that bad! On a side note I recall 4096 kernels took four hours or so years ago, with earlier hardware. 32 minutes is almost bearable. And cached binaries makes it even more bearable. On another side note we could want Bottom line is I give up on this for a few years or so again. OTOH it's really not important to get rid of this code (it's not a lot anyway) so I'm closing this PR. BTW since some time now, we can use |
Please add this note to #2666, and can we make this feature more apparent to users? Maybe even the default when there are many salts loaded or many missing per-salt kernels? |
It was disabled by default but always lead to problems when I experimented with it. Closes #1618