huggingface
diff --git a/.github/workflows/nightly_tests.yml b/.github/workflows/nightly_tests.yml
Lines changed: 1 addition & 1 deletion
diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
Lines changed: 2 additions & 0 deletions
diff --git a/docs/source/en/api/models/auto_model.md b/docs/source/en/api/models/auto_model.md
Lines changed: 29 additions & 0 deletions
diff --git a/docs/source/en/api/pipelines/sana_sprint.md b/docs/source/en/api/pipelines/sana_sprint.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/source/en/optimization/memory.md b/docs/source/en/optimization/memory.md
Lines changed: 4 additions & 0 deletions
diff --git a/docs/source/en/using-diffusers/loading.md b/docs/source/en/using-diffusers/loading.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/source/en/using-diffusers/loading_adapters.md b/docs/source/en/using-diffusers/loading_adapters.md
Lines changed: 53 additions & 0 deletions
diff --git a/examples/advanced_diffusion_training/requirements.txt b/examples/advanced_diffusion_training/requirements.txt
Lines changed: 4 additions & 3 deletions
@@ -417,7 +417,7 @@ jobs:
             additional_deps: ["peft"]
           - backend: "gguf"
             test_location: "gguf"
-            additional_deps: []
+            additional_deps: ["peft"]
           - backend: "torchao"
             test_location: "torchao"
             additional_deps: []
 
@@ -265,6 +265,8 @@
     sections:
     - local: api/models/overview
       title: Overview
+    - local: api/models/auto_model
+      title: AutoModel
     - sections:
       - local: api/models/controlnet
         title: ControlNetModel
 
@@ -0,0 +1,29 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# AutoModel
+
+`AutoModel` makes it easy to load a checkpoint without knowing the specific model class. It automatically retrieves the correct model class from the checkpoint's `config.json` file.
+
+```python
+from diffusers import AutoModel, AutoPipelineForText2Image
+
+unet = AutoModel.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet")
+pipe = AutoPipelineForText2Image.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", unet=unet)
+```
+
+
+## AutoModel
+
+[[autodoc]] AutoModel
+	- all
+	- from_pretrained
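The lookup that the new `auto_model.md` page describes can be illustrated with a minimal, library-free sketch. It assumes the checkpoint's `config.json` stores the class name under a `_class_name` key, as Diffusers model configs do. The `MODEL_REGISTRY` mapping, the stand-in classes, and `resolve_model_class` are hypothetical names used for illustration only; this is not the actual `AutoModel` implementation.

```python
import json


# Hypothetical stand-ins for real model classes (illustration only).
class UNet2DConditionModel:
    pass


class AutoencoderKL:
    pass


# Hypothetical registry mapping class names to classes.
MODEL_REGISTRY = {cls.__name__: cls for cls in (UNet2DConditionModel, AutoencoderKL)}


def resolve_model_class(config_text: str) -> type:
    """Return the class named by the `_class_name` entry of a config.json string."""
    config = json.loads(config_text)
    class_name = config.get("_class_name")
    if class_name is None:
        raise ValueError("config has no `_class_name` entry")
    try:
        return MODEL_REGISTRY[class_name]
    except KeyError:
        raise ValueError(f"unknown model class: {class_name!r}") from None


# Assumed, simplified config contents; a real config.json has many more keys.
example_config = '{"_class_name": "UNet2DConditionModel", "sample_size": 64}'
model_cls = resolve_model_class(example_config)
print(model_cls.__name__)
```

The point of the design is that the caller only supplies a repository and subfolder; the class name travels with the checkpoint, so the calling code does not need to hard-code it.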
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License. -->
 
-# SanaSprintPipeline
+# SANA-Sprint
 
 <div class="flex flex-wrap space-x-1">
   <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
 
@@ -178,6 +178,9 @@ pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch
 # We can utilize the enable_group_offload method for Diffusers model implementations
 pipe.transformer.enable_group_offload(onload_device=onload_device, offload_device=offload_device, offload_type="leaf_level", use_stream=True)
 
+# Uncomment the following to also allow recording the current streams.
+# pipe.transformer.enable_group_offload(onload_device=onload_device, offload_device=offload_device, offload_type="leaf_level", use_stream=True, record_stream=True)
+
 # For any other model implementations, the apply_group_offloading function can be used
 apply_group_offloading(pipe.text_encoder, onload_device=onload_device, offload_type="block_level", num_blocks_per_group=2)
 apply_group_offloading(pipe.vae, onload_device=onload_device, offload_type="leaf_level")
@@ -205,6 +208,7 @@ Group offloading (for CUDA devices with support for asynchronous data transfer s
 - The `use_stream` parameter can be used with CUDA devices to enable prefetching layers for onload. It defaults to `False`. Layer prefetching allows overlapping computation and data transfer of model weights, which drastically reduces the overall execution time compared to other offloading methods. However, it can increase the CPU RAM usage significantly. Ensure that available CPU RAM that is at least twice the size of the model when setting `use_stream=True`. You can find more information about CUDA streams [here](https://pytorch.org/docs/stable/generated/torch.cuda.Stream.html)
 - If specifying `use_stream=True` on VAEs with tiling enabled, make sure to do a dummy forward pass (possibly with dummy inputs) before the actual inference to avoid device-mismatch errors. This may not work on all implementations. Please open an issue if you encounter any problems.
 - The parameter `low_cpu_mem_usage` can be set to `True` to reduce CPU memory usage when using streams for group offloading. This is useful when the CPU memory is the bottleneck, but it may counteract the benefits of using streams and increase the overall execution time. The CPU memory savings come from creating pinned-tensors on-the-fly instead of pre-pinning them. This parameter is better suited for using `leaf_level` offloading.
+- When using `use_stream=True`, you can additionally specify `record_stream=True` to get better speedups at the expense of slightly increased memory usage. Refer to the [official PyTorch docs](https://pytorch.org/docs/stable/generated/torch.Tensor.record_stream.html) to learn more.
 
 For more information about available parameters and an explanation of how group offloading works, refer to [`~hooks.group_offloading.apply_group_offloading`].
 
 
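The guidance in `memory.md` that `use_stream=True` needs CPU RAM of at least twice the model size can be turned into a quick back-of-the-envelope check. The sketch below is only arithmetic; the 5-billion parameter count and the function name `min_cpu_ram_bytes` are illustrative assumptions, not values or helpers taken from the library.

```python
def min_cpu_ram_bytes(num_parameters: int, bytes_per_parameter: int) -> int:
    """Rule of thumb from the docs: CPU RAM of at least 2x the model size."""
    model_bytes = num_parameters * bytes_per_parameter
    return 2 * model_bytes


# Hypothetical example: 5 billion parameters stored in bfloat16 (2 bytes each).
required = min_cpu_ram_bytes(5_000_000_000, 2)
print(f"{required / 1e9:.0f} GB")  # → 20 GB
```

If the estimate exceeds the RAM you have available, the `low_cpu_mem_usage=True` option mentioned in the same list trades some speed for lower CPU memory usage.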
@@ -105,7 +105,7 @@ import torch
 
 pipe = HunyuanVideoPipeline.from_pretrained(
     "hunyuanvideo-community/HunyuanVideo",
-    torch_dtype={'transformer': torch.bfloat16, 'default': torch.float16},
+    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
 )
 print(pipe.transformer.dtype, pipe.vae.dtype)  # (torch.bfloat16, torch.float16)
 ```
 
@@ -194,6 +194,59 @@ Currently, [`~loaders.StableDiffusionLoraLoaderMixin.set_adapters`] only support
 
 </Tip>
 
+### Hotswapping LoRA adapters
+
+A common use case when serving multiple adapters is to load one adapter first, generate images, load another adapter, generate more images, load another adapter, etc. This workflow normally requires calling [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`], [`~loaders.StableDiffusionLoraLoaderMixin.set_adapters`], and possibly [`~loaders.peft.PeftAdapterMixin.delete_adapters`] to save memory. Moreover, if the model is compiled using `torch.compile`, performing these steps requires recompilation, which takes time.
+
+To better support this common workflow, you can "hotswap" a LoRA adapter to avoid accumulating memory and, in some cases, recompilation. Hotswapping requires an adapter to already be loaded; the new adapter weights are swapped in-place for the existing adapter.
+
+Pass `hotswap=True` when loading a LoRA adapter to enable this feature. You must also pass the name of the existing adapter to be swapped (`default_0` is the default adapter name). If you loaded the first adapter with a different name, use that name instead.
+
+```python
+pipeline = ...
+# load adapter 1 as normal
+pipeline.load_lora_weights(file_name_adapter_1)
+# generate some images with adapter 1
+...
+# now hot swap the 2nd adapter
+pipeline.load_lora_weights(file_name_adapter_2, hotswap=True, adapter_name="default_0")
+# generate images with adapter 2
+```
+
+
+<Tip warning={true}>
+
+Hotswapping is not currently supported for LoRA adapters that target the text encoder.
+
+</Tip>
+
+For compiled models, it is often necessary to call [`~loaders.lora_base.LoraBaseMixin.enable_lora_hotswap`] to avoid recompilation (it may be unnecessary if the second adapter targets identical LoRA ranks and scales). Call [`~loaders.lora_base.LoraBaseMixin.enable_lora_hotswap`] _before_ loading the first adapter, and call `torch.compile` _after_ loading the first adapter.
+
+```python
+pipe = ...
+# call this extra method
+pipe.enable_lora_hotswap(target_rank=max_rank)
+# now load adapter 1
+pipe.load_lora_weights(file_name_adapter_1)
+# now compile the unet of the pipeline
+pipe.unet = torch.compile(pipe.unet, ...)
+# generate some images with adapter 1
+...
+# now hot swap adapter 2
+pipe.load_lora_weights(file_name_adapter_2, hotswap=True, adapter_name="default_0")
+# generate images with adapter 2
+```
+
+The `target_rank=max_rank` argument sets the maximum rank among all LoRA adapters that will be loaded. If you have one adapter with rank 8 and another with rank 16, pass `target_rank=16`. If in doubt, use a higher value. By default, this value is 128.
+
+However, there can be situations where recompilation is unavoidable. For example, if the hotswapped adapter targets more layers than the initial adapter, then recompilation is triggered. Try to load the adapter that targets the most layers first. Refer to the PEFT docs on [hotswapping](https://huggingface.co/docs/peft/main/en/package_reference/hotswap#peft.utils.hotswap.hotswap_adapter) for more details about the limitations of this feature.
+
+<Tip>
+
+Move your code inside the `with torch._dynamo.config.patch(error_on_recompile=True)` context manager to detect if a model was recompiled. If you detect recompilation despite following all the steps above, please open an issue with [Diffusers](https://github.com/huggingface/diffusers/issues) with a reproducible example.
+
+</Tip>
+
 ### Kohya and TheLastBen
 
 Other popular LoRA trainers from the community include those by [Kohya](https://github.com/kohya-ss/sd-scripts/) and [TheLastBen](https://github.com/TheLastBen/fast-stable-diffusion). These trainers create different LoRA checkpoints than those trained by 🤗 Diffusers, but they can still be loaded in the same way.
 
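One way to choose `target_rank` for `enable_lora_hotswap`, as described in the hotswapping section added to `loading_adapters.md`, is to take the maximum rank over every adapter you plan to load. The sketch below assumes you already have each adapter's weight shapes as a plain dict, and that LoRA down-projection weights carry `lora_A` in their key name with the rank as the first dimension. That follows the PEFT naming convention; other checkpoint formats may name keys differently. `adapter_rank`, `max_target_rank`, and the example keys are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Shapes = Dict[str, Tuple[int, ...]]


def adapter_rank(shapes: Shapes) -> int:
    """Largest rank in one adapter, read from the first dim of each lora_A weight."""
    ranks = [shape[0] for key, shape in shapes.items() if "lora_A" in key]
    if not ranks:
        raise ValueError("no lora_A weights found")
    return max(ranks)


def max_target_rank(adapters: Iterable[Shapes]) -> int:
    """Maximum rank across all adapters that will be hotswapped."""
    return max(adapter_rank(shapes) for shapes in adapters)


# Hypothetical weight shapes for a rank-8 and a rank-16 adapter.
adapter_1 = {
    "unet.attn.to_q.lora_A.weight": (8, 320),
    "unet.attn.to_q.lora_B.weight": (320, 8),
}
adapter_2 = {
    "unet.attn.to_q.lora_A.weight": (16, 320),
    "unet.attn.to_q.lora_B.weight": (320, 16),
}
print(max_target_rank([adapter_1, adapter_2]))  # → 16
```

The result would then be passed as `target_rank` to `enable_lora_hotswap` before the first adapter is loaded.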
@@ -1,7 +1,8 @@
-accelerate>=0.16.0
+accelerate>=0.31.0
 torchvision
-transformers>=4.25.1
+transformers>=4.41.2
 ftfy
 tensorboard
 Jinja2
-peft==0.7.0
+peft>=0.11.1
+sentencepiece