Therefore, we want to pilot an idea with the community — delegating the decoding step to a remote endpoint.

No data is stored or tracked, and the code is open source. We made some changes to [huggingface-inference-toolkit](https://github.com/hlky/huggingface-inference-toolkit/tree/fix-text-support-binary) and use [custom handlers](https://huggingface.co/hlky/sd-vae-ft-mse/blob/main/handler.py).

This experimental feature is developed by [Diffusers 🧨](https://huggingface.co/docs/diffusers/hybrid_inference/overview).

**Table of contents**:

- [Getting started](#getting-started)

Below, we cover three use cases where we think this remote VAE inference would be beneficial.

First, we have created a helper method for interacting with Remote VAEs.

> [!NOTE]
> We recommend installing `diffusers` from `main` to run the code.
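
Here is a minimal sketch of calling the helper. The import path and parameter names are assumptions based on the options described below (check them against your `diffusers` version), and `ENDPOINT` is a placeholder, not a real URL:

```python
import torch
from diffusers.utils.remote_utils import remote_decode  # assumed import path

# Placeholder: substitute the URL of a deployed Remote VAE endpoint.
ENDPOINT = "https://<your-remote-vae-endpoint>/"

# Dummy SD-style latent; in practice this comes from a pipeline's denoising loop.
latent = torch.randn(1, 4, 64, 64, dtype=torch.float16)

# Decode remotely and get back a ready-to-use PIL image.
image = remote_decode(endpoint=ENDPOINT, tensor=latent, output_type="pil")
image.save("decoded.png")
```
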

There are three parts to decoding in a pipeline: `scaling` -> `decode` -> `postprocess`.

The following options make Remote VAE compatible with each of these stages. For reference, the sketch below shows the equivalent local stages.

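Here is roughly what the three stages look like when run locally with a standard Stable Diffusion pipeline (a sketch; the model id and attribute names follow common diffusers conventions):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Run the denoising loop but stop before the VAE: `images` holds raw latents.
latents = pipe("a photo of a cat", output_type="latent").images

latents = latents / pipe.vae.config.scaling_factor                     # scaling
image = pipe.vae.decode(latents, return_dict=False)[0]                 # decode
image = pipe.image_processor.postprocess(image, output_type="pil")[0]  # postprocess
```
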
#### `processor`

With `output_type="pt"` the endpoint returns a `torch.Tensor` before `postprocess` is applied. The final postprocessing and image creation are done locally.

With `output_type="pil"` on Video models, `processor=VideoProcessor()` is required for some local postprocessing.

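A sketch of the video case, assuming `VideoProcessor` is importable from `diffusers.video_processor`; `VIDEO_ENDPOINT` and `video_latent` stand in for a real video-model endpoint and a latent from its denoising loop:

```python
import torch
from diffusers.utils.remote_utils import remote_decode  # assumed import path
from diffusers.video_processor import VideoProcessor    # assumed import path

VIDEO_ENDPOINT = "https://<your-remote-video-vae-endpoint>/"  # placeholder

# Dummy video latent; the real shape depends on the model.
video_latent = torch.randn(1, 16, 9, 60, 90, dtype=torch.float16)

# The processor handles the local share of postprocessing for PIL output.
frames = remote_decode(
    endpoint=VIDEO_ENDPOINT,
    tensor=video_latent,
    processor=VideoProcessor(),
    output_type="pil",
)
```
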
#### `do_scaling`

With `do_scaling=False`, Remote VAE works as a drop-in replacement for `pipe.vae.decode`; scaling should be applied to the input before calling `remote_decode`.

With `do_scaling=True`, scaling is applied by Remote VAE.

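Continuing the earlier sketches (`ENDPOINT`, `pipe`, and `latents` as defined above), the drop-in pattern might look like this:

```python
# Drop-in for `pipe.vae.decode`: apply the scaling stage yourself,
# then tell the endpoint not to repeat it.
latents = latents / pipe.vae.config.scaling_factor
image = remote_decode(
    endpoint=ENDPOINT,
    tensor=latents,
    do_scaling=False,
    output_type="pt",
)
```
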
#### `output_type`

Image models support: `pil`, `pt`.

Video models support: `mp4`, `pil`, `pt`.

`output_type="pil"` returns an image (encoded according to `image_format`) for Image models. For Video models it returns a tensor equivalent to `postprocess_video(frames, output_type="pt")`, which has final postprocessing applied to create the frame images.

`output_type="pt"` with `partial_postprocess=False` returns a `torch.Tensor` before `postprocess` is applied. The final postprocessing and image creation are done locally.

`output_type="pt"` with `partial_postprocess=True` returns a `torch.Tensor` with `postprocess` applied. The final image creation (`PIL.Image.fromarray`) is done locally. This reduces data transfer compared to `partial_postprocess=False`.

`output_type="mp4"` applies `postprocess_video(frames, output_type="pil")` followed by `export_to_video`, and returns the `bytes` of the `mp4`.

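As a sketch, here is how two of these modes might be used, reusing the placeholder names from earlier (`ENDPOINT`, `latent`, `VIDEO_ENDPOINT`, `video_latent`):

```python
# Image model: the endpoint applies `postprocess`, so only the final
# `PIL.Image.fromarray` happens locally, which means less data on the wire.
image = remote_decode(
    endpoint=ENDPOINT,
    tensor=latent,
    output_type="pt",
    partial_postprocess=True,
)

# Video model: receive an encoded mp4 directly as bytes.
video_bytes = remote_decode(
    endpoint=VIDEO_ENDPOINT,
    tensor=video_latent,
    output_type="mp4",
)
with open("output.mp4", "wb") as f:
    f.write(video_bytes)
```
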
#### `input_tensor_type`/`output_tensor_type`

Choices: `base64`, `binary`.

Using `binary` reduces data transfer.

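Since base64 encoding inflates payloads by roughly a third, requesting raw bytes in both directions is cheaper. A sketch:

```python
image = remote_decode(
    endpoint=ENDPOINT,
    tensor=latent,
    input_tensor_type="binary",   # send raw tensor bytes instead of base64 text
    output_tensor_type="binary",  # receive raw tensor bytes back
    output_type="pt",
)
```
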
#### `image_format`

Choices: `jpg`, `png`.

`jpg` is faster but lower quality.

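For example, to trade a little quality for speed on an image endpoint:

```python
image = remote_decode(
    endpoint=ENDPOINT,
    tensor=latent,
    output_type="pil",
    image_format="jpg",  # faster to encode and smaller than "png"
)
```
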
#### `height`/`width`

Required for packed latents in Flux. Not required with `do_scaling=False`, as `unpack` occurs before scaling.

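A sketch for the Flux case. Flux packs its 2x2 latent patches into a sequence, so the endpoint needs the target resolution to unpack them before decoding (the packed shape below is illustrative):

```python
import torch
from diffusers.utils.remote_utils import remote_decode  # assumed import path

FLUX_ENDPOINT = "https://<your-remote-flux-vae-endpoint>/"  # placeholder

# Packed Flux latent for a 1024x1024 image: (batch, 64*64 patches, 64 channels).
packed_latent = torch.randn(1, 4096, 64, dtype=torch.bfloat16)

image = remote_decode(
    endpoint=FLUX_ENDPOINT,
    tensor=packed_latent,
    height=1024,
    width=1024,
    output_type="pil",
)
```
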