On a 3090 (24 GB VRAM) I can run a batch size of 30 for SDXL (1280x720), but only 12 with TGATE enabled (33 steps, gate_step=10). I'm using diffusers and running in a text console without a GUI loaded, so all VRAM is available. Is this high VRAM cost for TGATE expected?
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                Persistence-M  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf          Pwr:Usage/Cap |            Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090       Off  |   00000000:07:00.0  On |                  N/A |
| 54%  64C   P0            298W /  300W  |   23642MiB /  24576MiB  |    100%   E. Process |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      8392      C   python3                                   23624MiB   |
+-----------------------------------------------------------------------------------------+
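For reference, the batch sizes above imply a rough per-image VRAM cost. This is only a back-of-the-envelope sketch: it assumes memory scales linearly with batch size and ignores the fixed footprint of the model weights, so the real per-activation overhead is somewhat higher than the ratio shown.

```python
# Back-of-the-envelope per-image VRAM estimate from the numbers above.
# Assumption: usage scales roughly linearly with batch size (ignores
# the fixed cost of the SDXL weights, which inflates both estimates).
total_mib = 23624        # python3 process usage reported by nvidia-smi
baseline_batch = 30      # max batch size without TGATE
tgate_batch = 12         # max batch size with TGATE (gate_step=10)

per_image_baseline = total_mib / baseline_batch   # ~787 MiB per image
per_image_tgate = total_mib / tgate_batch         # ~1969 MiB per image
overhead = per_image_tgate / per_image_baseline   # = 30/12 = 2.5x

print(f"{per_image_baseline:.0f} MiB -> {per_image_tgate:.0f} MiB "
      f"per image ({overhead:.1f}x)")
```

So under this (crude) model, enabling TGATE costs roughly 2.5x the VRAM per image in the batch.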
I'm seeing a 30% speedup. Quality-wise, there are a few more unusable images. The output has more high-frequency detail, so things like hair are improved, but in a noisy, high-frequency output, problems and noise caused by the model are also highlighted, making it less likely that the output is acceptable. I could try prompting for bokeh or using CLIP skip to reduce sharpness.
I wonder if SDXL or other models could be fine-tuned for TGATE to reduce the VRAM cost?