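# Getting Started: VAE Encode with Hybrid Inference

VAE encoding is used for training, image-to-image, and image-to-video: it converts images or videos into latent representations.

## Memory

These tables show the VRAM required for VAE encoding with SD v1 and SD XL on different GPUs.

On most of these GPUs, the share of memory consumed means that the other models (text encoders, UNet/Transformer) must be offloaded, or tiled encoding must be used, which increases encode time and affects quality.
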
<details><summary>SD v1.5</summary>

| GPU                           | Resolution   |   Time (seconds) |   Memory (%) |   Tiled Time (seconds) |   Tiled Memory (%) |
|:------------------------------|:-------------|-----------------:|-------------:|-----------------------:|-------------------:|
| NVIDIA GeForce RTX 4090       | 512x512      |            0.015 |      3.51901 |                  0.015 |            3.51901 |
| NVIDIA GeForce RTX 4090       | 256x256      |            0.004 |      1.3154  |                  0.005 |            1.3154  |
| NVIDIA GeForce RTX 4090       | 2048x2048    |            0.402 |     47.1852  |                  0.496 |            3.51901 |
| NVIDIA GeForce RTX 4090       | 1024x1024    |            0.078 |     12.2658  |                  0.094 |            3.51901 |
| NVIDIA GeForce RTX 4080 SUPER | 512x512      |            0.023 |      5.30105 |                  0.023 |            5.30105 |
| NVIDIA GeForce RTX 4080 SUPER | 256x256      |            0.006 |      1.98152 |                  0.006 |            1.98152 |
| NVIDIA GeForce RTX 4080 SUPER | 2048x2048    |            0.574 |     71.08    |                  0.656 |            5.30105 |
| NVIDIA GeForce RTX 4080 SUPER | 1024x1024    |            0.111 |     18.4772  |                  0.14  |            5.30105 |
| NVIDIA GeForce RTX 3090       | 512x512      |            0.032 |      3.52782 |                  0.032 |            3.52782 |
| NVIDIA GeForce RTX 3090       | 256x256      |            0.01  |      1.31869 |                  0.009 |            1.31869 |
| NVIDIA GeForce RTX 3090       | 2048x2048    |            0.742 |     47.3033  |                  0.954 |            3.52782 |
| NVIDIA GeForce RTX 3090       | 1024x1024    |            0.136 |     12.2965  |                  0.207 |            3.52782 |
| NVIDIA GeForce RTX 3080       | 512x512      |            0.036 |      8.51761 |                  0.036 |            8.51761 |
| NVIDIA GeForce RTX 3080       | 256x256      |            0.01  |      3.18387 |                  0.01  |            3.18387 |
| NVIDIA GeForce RTX 3080       | 2048x2048    |            0.863 |     86.7424  |                  1.191 |            8.51761 |
| NVIDIA GeForce RTX 3080       | 1024x1024    |            0.157 |     29.6888  |                  0.227 |            8.51761 |
| NVIDIA GeForce RTX 3070       | 512x512      |            0.051 |     10.6941  |                  0.051 |           10.6941  |
| NVIDIA GeForce RTX 3070       | 256x256      |            0.015 |      3.99743 |                  0.015 |            3.99743 |
| NVIDIA GeForce RTX 3070       | 2048x2048    |            1.217 |     96.054   |                  1.482 |           10.6941  |
| NVIDIA GeForce RTX 3070       | 1024x1024    |            0.223 |     37.2751  |                  0.327 |           10.6941  |

</details>

<details><summary>SDXL</summary>

| GPU                           | Resolution   |   Time (seconds) |   Memory Consumed (%) |   Tiled Time (seconds) |   Tiled Memory (%) |
|:------------------------------|:-------------|-----------------:|----------------------:|-----------------------:|-------------------:|
| NVIDIA GeForce RTX 4090       | 512x512      |            0.029 |               4.95707 |                  0.029 |            4.95707 |
| NVIDIA GeForce RTX 4090       | 256x256      |            0.007 |               2.29666 |                  0.007 |            2.29666 |
| NVIDIA GeForce RTX 4090       | 2048x2048    |            0.873 |              66.3452  |                  0.863 |           15.5649  |
| NVIDIA GeForce RTX 4090       | 1024x1024    |            0.142 |              15.5479  |                  0.143 |           15.5479  |
| NVIDIA GeForce RTX 4080 SUPER | 512x512      |            0.044 |               7.46735 |                  0.044 |            7.46735 |
| NVIDIA GeForce RTX 4080 SUPER | 256x256      |            0.01  |               3.4597  |                  0.01  |            3.4597  |
| NVIDIA GeForce RTX 4080 SUPER | 2048x2048    |            1.317 |              87.1615  |                  1.291 |           23.447   |
| NVIDIA GeForce RTX 4080 SUPER | 1024x1024    |            0.213 |              23.4215  |                  0.214 |           23.4215  |
| NVIDIA GeForce RTX 3090       | 512x512      |            0.058 |               5.65638 |                  0.058 |            5.65638 |
| NVIDIA GeForce RTX 3090       | 256x256      |            0.016 |               2.45081 |                  0.016 |            2.45081 |
| NVIDIA GeForce RTX 3090       | 2048x2048    |            1.755 |              77.8239  |                  1.614 |           18.4193  |
| NVIDIA GeForce RTX 3090       | 1024x1024    |            0.265 |              18.4023  |                  0.265 |           18.4023  |
| NVIDIA GeForce RTX 3080       | 512x512      |            0.064 |              13.6568  |                  0.064 |           13.6568  |
| NVIDIA GeForce RTX 3080       | 256x256      |            0.018 |               5.91728 |                  0.018 |            5.91728 |
| NVIDIA GeForce RTX 3080       | 2048x2048    |          OOM     |             OOM       |                  1.866 |           44.4717  |
| NVIDIA GeForce RTX 3080       | 1024x1024    |            0.302 |              44.4308  |                  0.302 |           44.4308  |
| NVIDIA GeForce RTX 3070       | 512x512      |            0.093 |              17.1465  |                  0.093 |           17.1465  |
| NVIDIA GeForce RTX 3070       | 256x256      |            0.025 |               7.42931 |                  0.026 |            7.42931 |
| NVIDIA GeForce RTX 3070       | 2048x2048    |          OOM     |             OOM       |                  2.674 |           55.8355  |
| NVIDIA GeForce RTX 3070       | 1024x1024    |            0.443 |              55.7841  |                  0.443 |           55.7841  |

</details>
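
To read the percentages in the tables as absolute numbers, they can be multiplied by each card's VRAM. The sketch below does this for one row. The capacities in `NOMINAL_VRAM_GB` are assumptions (nominal marketing figures, with the 10 GB variant assumed for the RTX 3080); they are not taken from the benchmark, and the benchmark may have measured against a slightly different total, so treat the results as rough estimates.

```python
# Nominal VRAM per card in GB. These capacities are assumptions for
# illustration; they do not come from the benchmark tables.
NOMINAL_VRAM_GB = {
    "NVIDIA GeForce RTX 4090": 24,
    "NVIDIA GeForce RTX 4080 SUPER": 16,
    "NVIDIA GeForce RTX 3090": 24,
    "NVIDIA GeForce RTX 3080": 10,
    "NVIDIA GeForce RTX 3070": 8,
}


def percent_to_gb(gpu: str, memory_percent: float) -> float:
    """Convert a memory percentage from the tables into approximate GB."""
    return NOMINAL_VRAM_GB[gpu] * memory_percent / 100.0


# SD v1.5 at 2048x2048 on an RTX 4090: full encode vs. tiled encode.
full = percent_to_gb("NVIDIA GeForce RTX 4090", 47.1852)
tiled = percent_to_gb("NVIDIA GeForce RTX 4090", 3.51901)
print(f"full: {full:.2f} GB, tiled: {tiled:.2f} GB")
```

Under these assumptions, the gap between the two values is the memory that tiling frees up for the other models on that card.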

## Available VAEs

|   | **Endpoint** | **Model** |
|:-:|:-----------:|:--------:|
| **Stable Diffusion v1** | [https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud](https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud) | [`stabilityai/sd-vae-ft-mse`](https://hf.co/stabilityai/sd-vae-ft-mse) |
| **Stable Diffusion XL** | [https://xjqqhmyn62rog84g.us-east-1.aws.endpoints.huggingface.cloud](https://xjqqhmyn62rog84g.us-east-1.aws.endpoints.huggingface.cloud) | [`madebyollin/sdxl-vae-fp16-fix`](https://hf.co/madebyollin/sdxl-vae-fp16-fix) |
| **Flux** | [https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud](https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud) | [`black-forest-labs/FLUX.1-schnell`](https://hf.co/black-forest-labs/FLUX.1-schnell) |

> [!TIP]
> Model support can be requested [here](https://github.com/huggingface/diffusers/issues/new?template=remote-vae-pilot-feedback.yml).

## Code

> [!TIP]
> Install `diffusers` from `main` to run the code: `pip install git+https://github.com/huggingface/diffusers@main`

A helper method simplifies interaction with Hybrid Inference.

```python
from diffusers.utils.remote_utils import remote_encode
```

### Basic example

Let's encode an image, then decode it to demonstrate.

<figure class="image flex flex-col items-center justify-center text-center m-0 w-full">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"/>
</figure>

<details><summary>Code</summary>

```python
from diffusers.utils import load_image
from diffusers.utils.remote_utils import remote_decode, remote_encode

# Load the input image.
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg?download=true")

# Encode the image into a latent with the remote Flux VAE endpoint.
latent = remote_encode(
    endpoint="https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud/",
    image=image,
    scaling_factor=0.3611,
    shift_factor=0.1159,
)

# Decode the latent back into an image with the remote decode endpoint.
decoded = remote_decode(
    endpoint="https://whhx50ex1aryqvw6.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    scaling_factor=0.3611,
    shift_factor=0.1159,
)
```

</details>

<figure class="image flex flex-col items-center justify-center text-center m-0 w-full">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/remote_vae/decoded.png"/>
</figure>

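
The `scaling_factor` and `shift_factor` arguments describe how raw VAE latents are mapped into the range the diffusion model expects. The sketch below illustrates the convention commonly used in `diffusers` pipelines, on plain floats: subtract the shift and multiply by the scale when encoding, and invert that when decoding. The function names are hypothetical, and where exactly `remote_encode` applies these factors is not stated here, so this only illustrates the arithmetic.

```python
def normalize_latent(z: float, scaling_factor: float, shift_factor: float = 0.0) -> float:
    """Map a raw VAE latent value into the model's latent space."""
    return (z - shift_factor) * scaling_factor


def denormalize_latent(z: float, scaling_factor: float, shift_factor: float = 0.0) -> float:
    """Invert normalize_latent before passing the value to the VAE decoder."""
    return z / scaling_factor + shift_factor


# Round trip with the Flux factors used in the example above.
raw = [-1.25, 0.0, 0.5, 2.0]
normalized = [normalize_latent(v, 0.3611, 0.1159) for v in raw]
restored = [denormalize_latent(v, 0.3611, 0.1159) for v in normalized]
print(normalized)
print(restored)
```

Because the two operations are inverses, encode and decode calls need matching factors; mixing the factors of different models would shift and rescale the decoded image.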
### Generation

Now let's look at a generation example: we encode the image, generate, and then decode remotely.

<details><summary>Code</summary>

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image
from diffusers.utils.remote_utils import remote_decode, remote_encode

# Load the pipeline without a local VAE; encoding and decoding happen remotely.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    variant="fp16",
    vae=None,
).to("cuda")

init_image = load_image(
    "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
)
init_image = init_image.resize((768, 512))

# Encode the initial image into a latent with the remote SD v1 VAE endpoint.
init_latent = remote_encode(
    endpoint="https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud/",
    image=init_image,
    scaling_factor=0.18215,
)

# Run image-to-image on the latent and keep the output as a latent.
prompt = "A fantasy landscape, trending on artstation"
latent = pipe(
    prompt=prompt,
    image=init_latent,
    strength=0.75,
    output_type="latent",
).images

# Decode the generated latent remotely and save the result.
image = remote_decode(
    endpoint="https://q1bj3bpq6kzilnsu.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    scaling_factor=0.18215,
)
image.save("fantasy_landscape.jpg")
```

</details>

<figure class="image flex flex-col items-center justify-center text-center m-0 w-full">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/remote_vae/fantasy_landscape.png"/>
</figure>

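
When wiring a remotely encoded latent into a pipeline, a quick shape check can catch a mismatched image size early. The helper below is a hypothetical sketch that assumes the usual Stable Diffusion v1 VAE layout of 8x spatial downsampling and 4 latent channels; those values are assumptions and are not stated on this page, and other VAEs may differ.

```python
def expected_latent_shape(
    width: int,
    height: int,
    latent_channels: int = 4,  # assumed SD v1 value
    downscale: int = 8,        # assumed SD v1 spatial downsampling factor
    batch: int = 1,
) -> tuple:
    """Return the (batch, channels, height, width) shape expected for a latent."""
    if width % downscale or height % downscale:
        raise ValueError("width and height should be multiples of the downscale factor")
    return (batch, latent_channels, height // downscale, width // downscale)


# The init image above is resized to 768x512 (width x height).
shape = expected_latent_shape(768, 512)
print(shape)
```

Comparing this tuple with `tuple(init_latent.shape)` before calling the pipeline is one way to confirm the endpoint returned what the UNet expects.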
## Integrations

* **[SD.Next](https://github.com/vladmandic/sdnext):** All-in-one UI with direct support for Hybrid Inference.
* **[ComfyUI-HFRemoteVae](https://github.com/kijai/ComfyUI-HFRemoteVae):** ComfyUI node for Hybrid Inference.