# Caching methods

Cache methods speed up diffusion transformers by storing and reusing the intermediate outputs of specific layers, such as attention and feedforward layers, instead of recalculating them at each inference step.

## Pyramid Attention Broadcast

[Pyramid Attention Broadcast](https://huggingface.co/papers/2408.12588) from Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang You.
Pyramid Attention Broadcast (PAB) speeds up inference in diffusion models by systematically skipping attention computations between successive inference steps and reusing cached attention states, which change little from one step to the next. The differences are largest in the spatial attention blocks, smaller in the temporal attention blocks, and smallest in the cross-attention blocks. Cross-attention computations can therefore be skipped most often, followed by temporal and then spatial attention. Combined with techniques such as sequence parallelism and classifier-free guidance parallelism, PAB achieves near real-time video generation.
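
Conceptually, broadcasting amounts to an attention block that recomputes its output only every few steps and returns the cached result in between. The following is a minimal sketch of that idea under simplified assumptions, not the Diffusers implementation; `BroadcastAttention` and `skip_range` are hypothetical names.

```python
import torch
import torch.nn.functional as F


class BroadcastAttention(torch.nn.Module):
    """Self-attention that recomputes its output only every `skip_range` steps."""

    def __init__(self, dim: int, skip_range: int = 2):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, 3 * dim)
        self.skip_range = skip_range
        self.step = 0
        self.cache = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.cache is None or self.step % self.skip_range == 0:
            # Full computation on the first step and every `skip_range`-th step.
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            self.cache = F.scaled_dot_product_attention(q, k, v)
        # Otherwise, broadcast (reuse) the cached attention state.
        self.step += 1
        return self.cache


attn = BroadcastAttention(dim=64)
x = torch.randn(1, 16, 64)
for _ in range(4):  # steps 0 and 2 compute attention; steps 1 and 3 reuse it
    out = attn(x)
```

In the actual method, separate skip schedules apply to the spatial, temporal, and cross-attention blocks, with cross-attention skipped most aggressively.
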
Enable PAB with [`~PyramidAttentionBroadcastConfig`] on any pipeline. For some benchmarks, refer to [this](https://github.com/huggingface/diffusers/pull/9562) pull request.
```python
import torch
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig

pipeline = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipeline.to("cuda")

# Illustrative values; tune the skip range and timestep window per model.
config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(100, 800),
    current_timestep_callback=lambda: pipeline.current_timestep,
)
pipeline.transformer.enable_cache(config)
```
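
Here, `spatial_attention_block_skip_range=2` reuses each freshly computed spatial attention state for one extra step, and `spatial_attention_timestep_skip_range` confines skipping to intermediate timesteps, where successive attention states are most similar; both values are illustrative and worth tuning per model.
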
## FasterCache

[FasterCache](https://huggingface.co/papers/2410.19355) from Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong.

FasterCache is a method that speeds up inference in diffusion transformers by:
- Reusing attention states between successive inference steps, due to high similarity between them
- Skipping the unconditional branch of classifier-free guidance by exploiting the redundancy between unconditional and conditional branch outputs at the same timestep, approximating the unconditional output from the conditional one (see the sketch below)
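
The unconditional-branch skip can be pictured as a classifier-free guidance loop in which the unconditional forward pass runs only every few steps and is otherwise approximated from the conditional output plus a cached residual. This is a rough, self-contained sketch of the idea, not the actual FasterCache algorithm; `denoise`, `cfg_step`, and `skip_range` are hypothetical names.

```python
import torch


def denoise(x, emb):
    # Stand-in for a diffusion transformer forward pass.
    return x * 0.9 + emb


def cfg_step(x, cond, uncond, step, skip_range, cache, guidance_scale=6.0):
    noise_cond = denoise(x, cond)
    if step % skip_range == 0:
        # Full step: evaluate both branches and cache their difference.
        noise_uncond = denoise(x, uncond)
        cache["delta"] = noise_uncond - noise_cond
    else:
        # Skipped step: approximate the unconditional branch from the
        # conditional output, saving one forward pass.
        noise_uncond = noise_cond + cache["delta"]
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)


cache = {}
x = torch.randn(1, 4, 8, 8)
cond, uncond = torch.randn_like(x), torch.zeros_like(x)
for step in range(4):  # steps 0 and 2 run both branches; steps 1 and 3 skip one
    x = cfg_step(x, cond, uncond, step, skip_range=2, cache=cache)
```

As with PAB, enable FasterCache with [`~FasterCacheConfig`] on a supported pipeline:
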
```python
import torch
from diffusers import CogVideoXPipeline, FasterCacheConfig

pipeline = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipeline.to("cuda")

# Illustrative values; tune the skip ranges, windows, and weights per model.
config = FasterCacheConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(-1, 681),
    current_timestep_callback=lambda: pipeline.current_timestep,
    attention_weight_callback=lambda _: 0.3,
    unconditional_batch_skip_range=5,
    unconditional_batch_timestep_skip_range=(-1, 781),
    tensor_format="BFCHW",
)
pipeline.transformer.enable_cache(config)
```
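
The `unconditional_batch_skip_range` and `unconditional_batch_timestep_skip_range` options control how often the unconditional branch is actually evaluated, mirroring the attention skip options; the values above are plausible starting points rather than universal defaults, so benchmark them for your model.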