High VRAM cost #24

@qdacsvx

On an RTX 3090 (24 GB VRAM) I can run a batch size of 30 for SDXL at 1280x720, but only 12 with TGATE enabled (33 steps, gate_step=10). I'm using diffusers and running from a text console with no GUI loaded, so all VRAM is available. Is this high VRAM cost expected for TGATE?

 NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:07:00.0  On |                  N/A |
| 54%   64C    P0            298W /  300W |   23642MiB /  24576MiB |    100%   E. Process |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            8392      C   python3                               23624MiB 
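
For a rough sense of scale, here is a back-of-envelope sketch of what the two batch sizes imply per image. It assumes both runs fill about the same amount of VRAM (the 23642 MiB shown by nvidia-smi above) and that some fixed amount goes to model weights. `FIXED_MIB` is a made-up placeholder, not a measured value, and `per_image_mib` is just a helper name for this illustration.

```python
# Back-of-envelope estimate of per-image VRAM from the maximum batch sizes.
# Assumption: both runs use roughly the same total VRAM, and a fixed part of
# it (weights, CUDA context) does not scale with batch size.

USED_MIB = 23642   # memory usage reported by nvidia-smi above
FIXED_MIB = 7000   # hypothetical fixed overhead, NOT measured


def per_image_mib(used_mib: float, fixed_mib: float, batch_size: int) -> float:
    """Memory left after the fixed overhead, divided evenly across the batch."""
    return (used_mib - fixed_mib) / batch_size


baseline = per_image_mib(USED_MIB, FIXED_MIB, 30)  # SDXL without TGATE
tgate = per_image_mib(USED_MIB, FIXED_MIB, 12)     # SDXL with TGATE
ratio = tgate / baseline

print(f"per-image cost ratio (TGATE / baseline): {ratio:.2f}")
```

Under those assumptions the ratio comes out to 30 / 12 = 2.5 whatever value is picked for `FIXED_MIB`, because the fixed part cancels. So each image appears to need about 2.5x the batch-dependent memory with TGATE enabled.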

I'm seeing a 30% speedup. Quality-wise, there are a few more unusable images. The output has more high-frequency detail, so things like hair are improved, but in a noisy, high-frequency output, problems and noise caused by the model are highlighted, making it less likely that an image is acceptable. I could try prompting for bokeh or using clip skip to reduce sharpness.

I wonder if SDXL or other models could be fine-tuned for TGATE to reduce the VRAM cost?
