# Getting Started: VAE Encode with Hybrid Inference

VAE encode is used for training, image-to-image, and image-to-video: it turns images or videos into latent representations.

## Memory

These tables show the VRAM requirements for VAE encode with SD v1.5 and SDXL on different GPUs.

On most of these GPUs, the memory usage means that other models (text encoders, UNet/Transformer) must be offloaded, or tiled encoding must be used. Tiled encoding increases the time taken and affects quality.

<details><summary>SD v1.5</summary>

| GPU                           | Resolution   |   Time (seconds) |   Memory (%) |   Tiled Time (seconds) |   Tiled Memory (%) |
|:------------------------------|:-------------|-----------------:|-------------:|-----------------------:|-------------------:|
| NVIDIA GeForce RTX 4090       | 512x512      |            0.015 |      3.51901 |                  0.015 |            3.51901 |
| NVIDIA GeForce RTX 4090       | 256x256      |            0.004 |      1.3154  |                  0.005 |            1.3154  |
| NVIDIA GeForce RTX 4090       | 2048x2048    |            0.402 |     47.1852  |                  0.496 |            3.51901 |
| NVIDIA GeForce RTX 4090       | 1024x1024    |            0.078 |     12.2658  |                  0.094 |            3.51901 |
| NVIDIA GeForce RTX 4080 SUPER | 512x512      |            0.023 |      5.30105 |                  0.023 |            5.30105 |
| NVIDIA GeForce RTX 4080 SUPER | 256x256      |            0.006 |      1.98152 |                  0.006 |            1.98152 |
| NVIDIA GeForce RTX 4080 SUPER | 2048x2048    |            0.574 |     71.08    |                  0.656 |            5.30105 |
| NVIDIA GeForce RTX 4080 SUPER | 1024x1024    |            0.111 |     18.4772  |                  0.14  |            5.30105 |
| NVIDIA GeForce RTX 3090       | 512x512      |            0.032 |      3.52782 |                  0.032 |            3.52782 |
| NVIDIA GeForce RTX 3090       | 256x256      |            0.01  |      1.31869 |                  0.009 |            1.31869 |
| NVIDIA GeForce RTX 3090       | 2048x2048    |            0.742 |     47.3033  |                  0.954 |            3.52782 |
| NVIDIA GeForce RTX 3090       | 1024x1024    |            0.136 |     12.2965  |                  0.207 |            3.52782 |
| NVIDIA GeForce RTX 3080       | 512x512      |            0.036 |      8.51761 |                  0.036 |            8.51761 |
| NVIDIA GeForce RTX 3080       | 256x256      |            0.01  |      3.18387 |                  0.01  |            3.18387 |
| NVIDIA GeForce RTX 3080       | 2048x2048    |            0.863 |     86.7424  |                  1.191 |            8.51761 |
| NVIDIA GeForce RTX 3080       | 1024x1024    |            0.157 |     29.6888  |                  0.227 |            8.51761 |
| NVIDIA GeForce RTX 3070       | 512x512      |            0.051 |     10.6941  |                  0.051 |           10.6941  |
| NVIDIA GeForce RTX 3070       | 256x256      |            0.015 |      3.99743 |                  0.015 |            3.99743 |
| NVIDIA GeForce RTX 3070       | 2048x2048    |            1.217 |     96.054   |                  1.482 |           10.6941  |
| NVIDIA GeForce RTX 3070       | 1024x1024    |            0.223 |     37.2751  |                  0.327 |           10.6941  |

</details>

<details><summary>SDXL</summary>

| GPU                           | Resolution   |   Time (seconds) |   Memory Consumed (%) |   Tiled Time (seconds) |   Tiled Memory (%) |
|:------------------------------|:-------------|-----------------:|----------------------:|-----------------------:|-------------------:|
| NVIDIA GeForce RTX 4090       | 512x512      |            0.029 |               4.95707 |                  0.029 |            4.95707 |
| NVIDIA GeForce RTX 4090       | 256x256      |            0.007 |               2.29666 |                  0.007 |            2.29666 |
| NVIDIA GeForce RTX 4090       | 2048x2048    |            0.873 |              66.3452  |                  0.863 |           15.5649  |
| NVIDIA GeForce RTX 4090       | 1024x1024    |            0.142 |              15.5479  |                  0.143 |           15.5479  |
| NVIDIA GeForce RTX 4080 SUPER | 512x512      |            0.044 |               7.46735 |                  0.044 |            7.46735 |
| NVIDIA GeForce RTX 4080 SUPER | 256x256      |            0.01  |               3.4597  |                  0.01  |            3.4597  |
| NVIDIA GeForce RTX 4080 SUPER | 2048x2048    |            1.317 |              87.1615  |                  1.291 |           23.447   |
| NVIDIA GeForce RTX 4080 SUPER | 1024x1024    |            0.213 |              23.4215  |                  0.214 |           23.4215  |
| NVIDIA GeForce RTX 3090       | 512x512      |            0.058 |               5.65638 |                  0.058 |            5.65638 |
| NVIDIA GeForce RTX 3090       | 256x256      |            0.016 |               2.45081 |                  0.016 |            2.45081 |
| NVIDIA GeForce RTX 3090       | 2048x2048    |            1.755 |              77.8239  |                  1.614 |           18.4193  |
| NVIDIA GeForce RTX 3090       | 1024x1024    |            0.265 |              18.4023  |                  0.265 |           18.4023  |
| NVIDIA GeForce RTX 3080       | 512x512      |            0.064 |              13.6568  |                  0.064 |           13.6568  |
| NVIDIA GeForce RTX 3080       | 256x256      |            0.018 |               5.91728 |                  0.018 |            5.91728 |
| NVIDIA GeForce RTX 3080       | 2048x2048    |          OOM     |             OOM       |                  1.866 |           44.4717  |
| NVIDIA GeForce RTX 3080       | 1024x1024    |            0.302 |              44.4308  |                  0.302 |           44.4308  |
| NVIDIA GeForce RTX 3070       | 512x512      |            0.093 |              17.1465  |                  0.093 |           17.1465  |
| NVIDIA GeForce RTX 3070       | 256x256      |            0.025 |               7.42931 |                  0.026 |            7.42931 |
| NVIDIA GeForce RTX 3070       | 2048x2048    |          OOM     |             OOM       |                  2.674 |           55.8355  |
| NVIDIA GeForce RTX 3070       | 1024x1024    |            0.443 |              55.7841  |                  0.443 |           55.7841  |

</details>
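
To put the percentages in absolute terms, the sketch below converts a memory percentage from the tables into approximate gigabytes. The nominal VRAM capacities and the `approx_vram_gb` helper are assumptions for illustration, not part of the benchmark data, and the 10 GB figure assumes the original RTX 3080 variant.

```python
# Nominal VRAM capacities in GB. These are assumptions (not part of the
# benchmark data); the RTX 3080 is assumed to be the 10 GB variant.
NOMINAL_VRAM_GB = {
    "NVIDIA GeForce RTX 4090": 24,
    "NVIDIA GeForce RTX 4080 SUPER": 16,
    "NVIDIA GeForce RTX 3090": 24,
    "NVIDIA GeForce RTX 3080": 10,
    "NVIDIA GeForce RTX 3070": 8,
}


def approx_vram_gb(gpu: str, memory_percent: float) -> float:
    """Convert a memory percentage from the tables into approximate GB."""
    return NOMINAL_VRAM_GB[gpu] * memory_percent / 100.0


# SD v1.5 at 2048x2048 on an RTX 4090, untiled vs. tiled (values from the table).
untiled = approx_vram_gb("NVIDIA GeForce RTX 4090", 47.1852)
tiled = approx_vram_gb("NVIDIA GeForce RTX 4090", 3.51901)
print(f"untiled: {untiled:.2f} GB, tiled: {tiled:.2f} GB")
```

Under these assumptions, untiled encoding of a 2048x2048 image takes roughly 11 GB on the RTX 4090, against under 1 GB when tiled.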

## Available VAEs

|   | **Endpoint** | **Model** |
|:-:|:-----------:|:--------:|
| **Stable Diffusion v1** | [https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud](https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud) | [`stabilityai/sd-vae-ft-mse`](https://hf.co/stabilityai/sd-vae-ft-mse) |
| **Stable Diffusion XL** | [https://xjqqhmyn62rog84g.us-east-1.aws.endpoints.huggingface.cloud](https://xjqqhmyn62rog84g.us-east-1.aws.endpoints.huggingface.cloud) | [`madebyollin/sdxl-vae-fp16-fix`](https://hf.co/madebyollin/sdxl-vae-fp16-fix) |
| **Flux** | [https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud](https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud) | [`black-forest-labs/FLUX.1-schnell`](https://hf.co/black-forest-labs/FLUX.1-schnell) |
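
Each endpoint pairs with the scaling factor (and, for Flux, the shift factor) of its VAE. One way to keep these together is a small lookup table, sketched below. The `VAE_ENDPOINTS` name, its keys, and the `encode_kwargs` helper are hypothetical. The SD v1 and Flux factors are the ones used in the examples in this guide; the SDXL value of 0.13025 is the usual scaling factor for the SDXL VAE and is an assumption here, so check it against the model config.

```python
from typing import Optional, TypedDict


class VaeEndpoint(TypedDict):
    endpoint: str
    scaling_factor: float
    shift_factor: Optional[float]


# Hypothetical lookup table: endpoints come from the table above.
VAE_ENDPOINTS: dict[str, VaeEndpoint] = {
    "sd-v1": {
        "endpoint": "https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud/",
        "scaling_factor": 0.18215,
        "shift_factor": None,
    },
    "sdxl": {
        "endpoint": "https://xjqqhmyn62rog84g.us-east-1.aws.endpoints.huggingface.cloud/",
        "scaling_factor": 0.13025,  # assumption: verify against the VAE config
        "shift_factor": None,
    },
    "flux": {
        "endpoint": "https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud/",
        "scaling_factor": 0.3611,
        "shift_factor": 0.1159,
    },
}


def encode_kwargs(family: str) -> dict:
    """Build keyword arguments for remote_encode, omitting an unset shift factor."""
    cfg = VAE_ENDPOINTS[family]
    kwargs = {"endpoint": cfg["endpoint"], "scaling_factor": cfg["scaling_factor"]}
    if cfg["shift_factor"] is not None:
        kwargs["shift_factor"] = cfg["shift_factor"]
    return kwargs


print(encode_kwargs("flux"))
```

The result could then be passed along with the image, for example `remote_encode(image=image, **encode_kwargs("flux"))`.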

> [!TIP]
> Model support can be requested [here](https://github.com/huggingface/diffusers/issues/new?template=remote-vae-pilot-feedback.yml).

## Code

> [!TIP]
> Install `diffusers` from `main` to run the code: `pip install git+https://github.com/huggingface/diffusers@main`

A helper function simplifies interacting with Hybrid Inference.

```python
from diffusers.utils.remote_utils import remote_encode
```

### Basic example

Let's encode an image, then decode it to demonstrate the round trip.

<figure class="image flex flex-col items-center justify-center text-center m-0 w-full">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"/>
</figure>

<details><summary>Code</summary>

```python
from diffusers.utils import load_image
from diffusers.utils.remote_utils import remote_decode, remote_encode

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg?download=true")

# Encode the image to a latent with the remote Flux VAE.
latent = remote_encode(
    endpoint="https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud/",
    image=image,
    scaling_factor=0.3611,
    shift_factor=0.1159,
)

# Decode the latent back to an image, using the same scaling and shift factors.
decoded = remote_decode(
    endpoint="https://whhx50ex1aryqvw6.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    scaling_factor=0.3611,
    shift_factor=0.1159,
)
```

</details>

<figure class="image flex flex-col items-center justify-center text-center m-0 w-full">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/remote_vae/decoded.png"/>
</figure>

### Generation

Now let's look at a generation example: we'll encode the image remotely, generate, and then decode remotely too.

<details><summary>Code</summary>

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image
from diffusers.utils.remote_utils import remote_decode, remote_encode

# Load the pipeline without a local VAE; encode and decode run remotely.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    variant="fp16",
    vae=None,
).to("cuda")

init_image = load_image(
    "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
)
init_image = init_image.resize((768, 512))

# Encode the initial image with the remote SD v1 VAE.
init_latent = remote_encode(
    endpoint="https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud/",
    image=init_image,
    scaling_factor=0.18215,
)

# Run img2img on the latent and keep the output in latent space.
prompt = "A fantasy landscape, trending on artstation"
latent = pipe(
    prompt=prompt,
    image=init_latent,
    strength=0.75,
    output_type="latent",
).images

# Decode the generated latent with the remote SD v1 decode endpoint.
image = remote_decode(
    endpoint="https://q1bj3bpq6kzilnsu.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    scaling_factor=0.18215,
)
image.save("fantasy_landscape.jpg")
```

</details>

<figure class="image flex flex-col items-center justify-center text-center m-0 w-full">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/remote_vae/fantasy_landscape.png"/>
</figure>
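
To see what the remote encoder is expected to return in this example, the sketch below works out the latent shape for the resized init image. It assumes the SD v1 VAE's 4 latent channels and 8x spatial downsampling, which are properties of that model rather than something stated in this guide; `latent_shape` is a hypothetical helper.

```python
def latent_shape(
    width: int,
    height: int,
    channels: int = 4,
    downsample: int = 8,
    batch: int = 1,
) -> tuple[int, int, int, int]:
    """Return (batch, channels, height // downsample, width // downsample)."""
    if width % downsample or height % downsample:
        raise ValueError("width and height must be multiples of the downsample factor")
    return (batch, channels, height // downsample, width // downsample)


# The init image in the generation example is resized to 768x512 (width x height).
print(latent_shape(768, 512))  # (1, 4, 64, 96)
```

For the Flux VAE, the same helper would be called with `channels=16`, again assuming 8x downsampling.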

## Integrations

* **[SD.Next](https://github.com/vladmandic/sdnext):** All-in-one UI with direct support for Hybrid Inference.
* **[ComfyUI-HFRemoteVae](https://github.com/kijai/ComfyUI-HFRemoteVae):** ComfyUI node for Hybrid Inference.