-
Conclusion and solution?
-
Still researching. I posted something about this to reddit r/pytorch. It looks like others want the same thing: https://discuss.pytorch.org/t/choose-a-different-conv-algorithm/27518
-
torch.cuda.set_per_process_memory_fraction() can control how much memory is used; I found it while reading the torch code that selects a convolution algorithm. I tried it, and it reduced the maximum memory used for a batch size of 18. But because I had set a hard limit, I couldn't use a batch size like 60 without OOM'ing, since that might need more memory even when a memory-efficient algorithm is used. So it may be too inflexible.
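For reference, here is a minimal sketch of how that limit can be set; the 0.8 fraction and the device index are arbitrary assumptions, not values from my tests:

```python
import torch

if torch.cuda.is_available():
    # Cap this process at ~80% of the GPU's total memory (the fraction is an
    # arbitrary example). Allocations beyond the cap raise an out-of-memory
    # error instead of letting the caching allocator keep growing.
    torch.cuda.set_per_process_memory_fraction(0.8, device=0)
```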
-
Bug 5409 complains about a performance regression caused by commit https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/67efee33a6c65e58b3f6c788993d0e68a33e4fd0
I've conducted a detailed analysis of the maximum tensor memory used at different batch sizes for both the original code and the new code (which is slower). First, the current state with the fix. I show the batch size, the maximum memory used for p.sample() and decode_first_stage(), and the per-image generation time. After the '=' I also show the times for sample, decode_first_stage, and miscellaneous work; again, these are per-image times. The change that was made affected decode_first_stage().
As can be seen, the memory use for the decode stays low, but it is slower than without the fix. NOTE the anomaly in sample() memory use at batch size 18 and higher. The (apparently) same anomaly occurs for the decode without the fix at a lower batch size, which is why the fix was done.
As can be seen, at batch size 4 the decode memory use skyrockets.
This is why the change was made, although I'm unclear whether it was understood that they could have divided the work up into 3 images at a time and not had a problem. Of course, without testing 768x768 models and some other models, some more investigation is needed to do it right.
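As a rough illustration of the "a few images at a time" idea, here is a minimal sketch of chunking a latent batch through a decoder. decode_batched, first_stage_model, and the chunk size of 3 are illustrative assumptions, not the webui's actual decode_first_stage() code:

```python
import torch

def decode_batched(first_stage_model, latents, chunk_size=3):
    # Decode a batch of latents a few images at a time to cap peak memory.
    # first_stage_model and chunk_size are illustrative assumptions; the real
    # decode_first_stage() in the webui wraps more logic than this.
    outputs = []
    with torch.no_grad():
        for chunk in torch.split(latents, chunk_size, dim=0):
            outputs.append(first_stage_model.decode(chunk))
    return torch.cat(outputs, dim=0)
```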
I did debug down to the (apparent) root cause of the huge increase in memory usage for p.sample(): it occurs in conv.py:_conv_forward, in the F.conv2d() call.
I captured the maximum allocated memory before and after this call, and there was a huge jump for the larger batch sizes (a minimal sketch of that measurement follows after the link). So far I've found:
https://discuss.pytorch.org/t/memory-usage-suddenly-increase-with-specific-input-shape-on-torch-nn-conv2d/99681
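Here is a minimal sketch of the kind of before/after measurement described above, assuming torch.cuda.max_memory_allocated() is the counter being read; the layer shapes and batch size are arbitrary:

```python
import torch
import torch.nn.functional as F

conv = torch.nn.Conv2d(512, 512, kernel_size=3, padding=1).cuda()
x = torch.randn(18, 512, 64, 64, device="cuda")  # shapes/batch are arbitrary assumptions

torch.cuda.synchronize()
torch.cuda.reset_peak_memory_stats()
before = torch.cuda.max_memory_allocated()

with torch.no_grad():
    y = F.conv2d(x, conv.weight, conv.bias, padding=1)

torch.cuda.synchronize()
after = torch.cuda.max_memory_allocated()
print(f"peak memory growth across conv2d: {(after - before) / 2**20:.1f} MiB")
```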
A 6X jump in memory usage is hardly justified for just one more image in the batch. I tried both torch.backends.cudnn.benchmark and torch.backends.cudnn.deterministic without luck. But I don't know what I'm doing, so there may be more to preventing cudnn from switching conv2d algorithms just because it thinks one will be more efficient. OOM'ing is never efficient! :-)
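For completeness, the flags mentioned above are set like this; whether they constrain cudnn's algorithm choice enough to avoid the memory spike is exactly what's still unclear:

```python
import torch

# Don't let cudnn benchmark candidate algorithms per input shape and pick a
# (possibly memory-hungry) winner.
torch.backends.cudnn.benchmark = False

# Restrict cudnn to deterministic algorithms, which also narrows the choice.
torch.backends.cudnn.deterministic = True

# A stronger, framework-wide version of the same idea; warn_only avoids
# errors for ops that have no deterministic implementation.
torch.use_deterministic_algorithms(True, warn_only=True)
```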