Replies: 3 comments 1 reply
-
Indeed it works! Enabled it and went from ~6it/s to 6.4it/s on 512x640 generation, with --xformers turned on. Interesting thing is, I didn't get those pauses you described, or they were too small to notice.
-
I'm impressed that this went unnoticed for so long.
-
3090: first run was slower by 450%, subsequent runs took the same time as before
-
(I originally posted this as an issue, but it probably works better as a discussion; hopefully more people will see it here.)
With my GTX 1080, enabling cuDNN benchmarking seems to reliably give a small speed boost: training goes from ~1.27s/it to ~1.05s/it (saving about 2 hours on the training estimate), while txt2img went from 5.30s/it to 5.07s/it.
I've been hearing good things from others who tested it as well; across pretty much all NVIDIA GPUs it seems to give around a 10-25% improvement.
To enable it I just edited modules/sd_models.py and, underneath def setup_model():, added a single line, like so:
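The line is just PyTorch's cuDNN autotuner flag; a minimal sketch below, assuming torch is already imported at the top of sd_models.py:

```python
def setup_model():
    # let cuDNN benchmark conv algorithms and cache the fastest one for this GPU
    torch.backends.cudnn.benchmark = True
    ...  # rest of setup_model() unchanged
```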
One issue with this is that the first txt2img run seems to take a little while to start up, and then also stays stuck at 100% for a little longer, nullifying the speed boost from the reduced s/it...
That only seems to happen on the first txt2img run, though; any runs after that don't have that issue (and "time taken" becomes ~30 seconds faster than without cuDNN).
I guess this is because cuDNN benchmarks each new operation the first time it's run, but I'm not sure.
Would be happy to hear how it works for others too - though make sure to ignore the first run/generation if you do try it out (since that run is used to benchmark/test different algos for your HW); the runs after that should hopefully be an improvement over the vanilla webui.
(I think changing the image size and other params might also cause it to re-benchmark, but I'm not sure exactly what triggers it...)
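For anyone curious what the flag is doing at the PyTorch level, here's a tiny standalone demo (nothing webui-specific; all of the names below are just my toy example): the first call at each new input shape pays the algorithm-search cost, while repeat calls at an already-seen shape are fast.

```python
import time
import torch

assert torch.cuda.is_available(), "cuDNN benchmarking only applies on CUDA"
torch.backends.cudnn.benchmark = True  # the same flag the webui edit sets

conv = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1).cuda()

# first call at each *new* input shape includes cuDNN's algorithm search;
# repeat calls at a seen shape reuse the cached pick and run fast
for shape in [(1, 4, 64, 80), (1, 4, 64, 80), (1, 4, 96, 96)]:
    x = torch.randn(*shape, device="cuda")
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    conv(x)
    torch.cuda.synchronize()
    print(shape, f"{time.perf_counter() - t0:.4f}s")
```

This would also explain the rebench-on-size-change behavior: a new height/width means new conv input shapes, so cuDNN searches again.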
Wonder if there's some way to work around the first-run benchmarking slowdown - maybe some way to call the txt2img PyTorch operations during startup so cuDNN could benchmark them (or just calling txt2img directly during startup with a single iteration?)
E: you can get it to run txt2img during startup by editing modules/ui.py and, underneath sd_hijack.model_hijack.embedding_db.load_textual_inversion_embeddings() (line ~1245), adding a warm-up call (sketched below). This cuts the first-run time down to slightly faster than vanilla's first run, though for me the second run onwards still seems to have a slight speed boost (but it looks like that happens even with cuDNN disabled...)
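Roughly like the following; this is only a sketch - the processing API and argument names vary between webui versions, so treat every name here as an assumption to check against your copy:

```python
from modules import processing, shared

# hypothetical warm-up: one throwaway 1-step generation at startup so cuDNN
# benchmarks its kernels before the user's first real txt2img run
warmup = processing.StableDiffusionProcessingTxt2Img(
    sd_model=shared.sd_model,
    prompt="warmup",
    steps=1,
    width=512,
    height=512,
)
processing.process_images(warmup)
```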