-
Known issue even for the 3080 10GB; it has something to do with commit 67efee3.
-
On a 4090 there's a noticeable delay between each image generation.
Instrumentation shows the delay occurs just after these calls:
samples_ddim = p.sample(conditioning=c, ...
x_samples_ddim = [decode_first_stage(p.sd_model, ...
in processing.py, where I believe x_samples_ddim is now back on the CPU for the remaining steps, including save_image, until we are done and can start the next image generation.
I see perhaps a 7.5% potential improvement, and that is with a fast image save on a Samsung 990 Pro. Furthermore, as inference gets faster (I'm testing voltaML fast SD now), this ratio will only increase.
I propose having a second thread take the results from the GPU and do all the post-processing there, allowing the main thread to continue with the next batch. Obviously I'd need to be careful with synchronization. If I did this, I'd probably also improve the time reporting to include milliseconds, given the 4090 and even better hardware in the future.
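The producer/consumer split could look roughly like this. This is only a sketch, not the actual processing.py code: `run_inference` and `postprocess_and_save` are hypothetical stand-ins for the sampling/decode step and the CPU-side save path, and the bounded queue keeps results from piling up if saving falls behind.

```python
import queue
import threading
import time

saved = []  # stand-in for images written to disk

def run_inference(batch):
    # Stand-in for p.sample(...) + decode_first_stage(...) (the GPU work).
    time.sleep(0.01)
    return [x * 2 for x in batch]

def postprocess_and_save(index, samples):
    # Stand-in for the CPU-side steps (save_image, metadata, etc.).
    saved.append((index, samples))

def worker(q):
    """Consume finished batches and run CPU-side post-processing."""
    while True:
        item = q.get()
        if item is None:          # sentinel: no more batches coming
            q.task_done()
            break
        index, samples = item
        postprocess_and_save(index, samples)
        q.task_done()

def generate_all(batches):
    q = queue.Queue(maxsize=2)    # bound the backlog of pending saves
    t = threading.Thread(target=worker, args=(q,), daemon=True)
    t.start()
    for i, batch in enumerate(batches):
        samples = run_inference(batch)  # GPU work for batch i
        q.put((i, samples))             # hand off; main thread starts batch i+1
    q.put(None)
    q.join()                            # wait for post-processing to drain
```

The main caveat is the synchronization mentioned above: the tensors must actually be copied off the GPU before being handed to the worker, or both threads end up touching device memory.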
BTW, how the heck do you turn off the darn tqdm output to the console? The closer things get to 1 second per image, the less important it is to watch the console show progress. The GUI progress bar is fine. On the console I just want to see:
image 1: .879
image 2: .834
...
Ave time per image .851
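The millisecond-resolution reporting described above is straightforward with `time.perf_counter`. A minimal sketch, where `generate` is a hypothetical callable standing in for one image generation:

```python
import time

def time_images(batches, generate):
    """Run generate() per batch, timing each, and return report lines."""
    times = []
    lines = []
    for i, batch in enumerate(batches):
        start = time.perf_counter()
        generate(batch)
        times.append(time.perf_counter() - start)
        lines.append(f"image {i + 1}: {times[-1]:.3f}")
    lines.append(f"Avg time per image {sum(times) / len(times):.3f}")
    return lines
```

`perf_counter` is monotonic and has sub-millisecond resolution, so three decimal places of seconds is safe.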