Commit 18d3f60

Merge branch 'main' into Add-AnyText

2 parents: be4a319 + cee7c1b

7 files changed: +1284, -73 lines

README.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -202,6 +202,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
 
 - https://github.com/microsoft/TaskMatrix
 - https://github.com/invoke-ai/InvokeAI
+- https://github.com/InstantID/InstantID
 - https://github.com/apple/ml-stable-diffusion
 - https://github.com/Sanster/lama-cleaner
 - https://github.com/IDEA-Research/Grounded-Segment-Anything
```

docs/source/en/_toctree.yml

Lines changed: 70 additions & 62 deletions

```diff
@@ -223,68 +223,76 @@
   sections:
   - local: api/models/overview
     title: Overview
-  - local: api/models/unet
-    title: UNet1DModel
-  - local: api/models/unet2d
-    title: UNet2DModel
-  - local: api/models/unet2d-cond
-    title: UNet2DConditionModel
-  - local: api/models/unet3d-cond
-    title: UNet3DConditionModel
-  - local: api/models/unet-motion
-    title: UNetMotionModel
-  - local: api/models/uvit2d
-    title: UViT2DModel
-  - local: api/models/vq
-    title: VQModel
-  - local: api/models/autoencoderkl
-    title: AutoencoderKL
-  - local: api/models/autoencoderkl_cogvideox
-    title: AutoencoderKLCogVideoX
-  - local: api/models/asymmetricautoencoderkl
-    title: AsymmetricAutoencoderKL
-  - local: api/models/stable_cascade_unet
-    title: StableCascadeUNet
-  - local: api/models/autoencoder_tiny
-    title: Tiny AutoEncoder
-  - local: api/models/autoencoder_oobleck
-    title: Oobleck AutoEncoder
-  - local: api/models/consistency_decoder_vae
-    title: ConsistencyDecoderVAE
-  - local: api/models/transformer2d
-    title: Transformer2DModel
-  - local: api/models/pixart_transformer2d
-    title: PixArtTransformer2DModel
-  - local: api/models/dit_transformer2d
-    title: DiTTransformer2DModel
-  - local: api/models/hunyuan_transformer2d
-    title: HunyuanDiT2DModel
-  - local: api/models/aura_flow_transformer2d
-    title: AuraFlowTransformer2DModel
-  - local: api/models/flux_transformer
-    title: FluxTransformer2DModel
-  - local: api/models/latte_transformer3d
-    title: LatteTransformer3DModel
-  - local: api/models/cogvideox_transformer3d
-    title: CogVideoXTransformer3DModel
-  - local: api/models/lumina_nextdit2d
-    title: LuminaNextDiT2DModel
-  - local: api/models/transformer_temporal
-    title: TransformerTemporalModel
-  - local: api/models/sd3_transformer2d
-    title: SD3Transformer2DModel
-  - local: api/models/stable_audio_transformer
-    title: StableAudioDiTModel
-  - local: api/models/prior_transformer
-    title: PriorTransformer
-  - local: api/models/controlnet
-    title: ControlNetModel
-  - local: api/models/controlnet_hunyuandit
-    title: HunyuanDiT2DControlNetModel
-  - local: api/models/controlnet_sd3
-    title: SD3ControlNetModel
-  - local: api/models/controlnet_sparsectrl
-    title: SparseControlNetModel
+  - sections:
+    - local: api/models/controlnet
+      title: ControlNetModel
+    - local: api/models/controlnet_hunyuandit
+      title: HunyuanDiT2DControlNetModel
+    - local: api/models/controlnet_sd3
+      title: SD3ControlNetModel
+    - local: api/models/controlnet_sparsectrl
+      title: SparseControlNetModel
+    title: ControlNets
+  - sections:
+    - local: api/models/aura_flow_transformer2d
+      title: AuraFlowTransformer2DModel
+    - local: api/models/cogvideox_transformer3d
+      title: CogVideoXTransformer3DModel
+    - local: api/models/dit_transformer2d
+      title: DiTTransformer2DModel
+    - local: api/models/flux_transformer
+      title: FluxTransformer2DModel
+    - local: api/models/hunyuan_transformer2d
+      title: HunyuanDiT2DModel
+    - local: api/models/latte_transformer3d
+      title: LatteTransformer3DModel
+    - local: api/models/lumina_nextdit2d
+      title: LuminaNextDiT2DModel
+    - local: api/models/pixart_transformer2d
+      title: PixArtTransformer2DModel
+    - local: api/models/prior_transformer
+      title: PriorTransformer
+    - local: api/models/sd3_transformer2d
+      title: SD3Transformer2DModel
+    - local: api/models/stable_audio_transformer
+      title: StableAudioDiTModel
+    - local: api/models/transformer2d
+      title: Transformer2DModel
+    - local: api/models/transformer_temporal
+      title: TransformerTemporalModel
+    title: Transformers
+  - sections:
+    - local: api/models/stable_cascade_unet
+      title: StableCascadeUNet
+    - local: api/models/unet
+      title: UNet1DModel
+    - local: api/models/unet2d
+      title: UNet2DModel
+    - local: api/models/unet2d-cond
+      title: UNet2DConditionModel
+    - local: api/models/unet3d-cond
+      title: UNet3DConditionModel
+    - local: api/models/unet-motion
+      title: UNetMotionModel
+    - local: api/models/uvit2d
+      title: UViT2DModel
+    title: UNets
+  - sections:
+    - local: api/models/autoencoderkl
+      title: AutoencoderKL
+    - local: api/models/autoencoderkl_cogvideox
+      title: AutoencoderKLCogVideoX
+    - local: api/models/asymmetricautoencoderkl
+      title: AsymmetricAutoencoderKL
+    - local: api/models/consistency_decoder_vae
+      title: ConsistencyDecoderVAE
+    - local: api/models/autoencoder_oobleck
+      title: Oobleck AutoEncoder
+    - local: api/models/autoencoder_tiny
+      title: Tiny AutoEncoder
+    - local: api/models/vq
+      title: VQModel
+    title: VAEs
   title: Models
 - isExpanded: false
   sections:
```

docs/source/en/api/pipelines/stable_audio.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -21,7 +21,7 @@ Stable Audio is trained on a corpus of around 48k audio recordings, where around
 The abstract of the paper is the following:
 *Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.*
 
-This pipeline was contributed by [Yoach Lacombe](https://huggingface.co/ylacombe). The original codebase can be found at [Stability-AI/stable-audio-tool](https://github.com/Stability-AI/stable-audio-tool).
+This pipeline was contributed by [Yoach Lacombe](https://huggingface.co/ylacombe). The original codebase can be found at [Stability-AI/stable-audio-tools](https://github.com/Stability-AI/stable-audio-tools).
 
 ## Tips
 
```
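
For reference, the page being corrected documents `StableAudioPipeline`. A minimal text-to-audio sketch with that pipeline could look like the following; the checkpoint id (`stabilityai/stable-audio-open-1.0`) and the argument values are illustrative assumptions based on the public Stable Audio Open release, not part of this commit.

```python
import soundfile as sf
import torch
from diffusers import StableAudioPipeline

# Assumed checkpoint id for the open Stable Audio release; not part of this commit.
pipe = StableAudioPipeline.from_pretrained(
    "stabilityai/stable-audio-open-1.0", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator("cuda").manual_seed(0)
audio = pipe(
    "The sound of a hammer hitting a wooden surface.",
    negative_prompt="Low quality.",
    num_inference_steps=200,
    audio_end_in_s=10.0,
    generator=generator,
).audios

# The VAE config carries the output sampling rate (44.1 kHz stereo for this model).
sf.write("hammer.wav", audio[0].T.float().cpu().numpy(), pipe.vae.config.sampling_rate)
```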

docs/source/en/optimization/fp16.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -125,3 +125,5 @@ image
 <figcaption class="mt-2 text-center text-sm text-gray-500">distilled Stable Diffusion + Tiny AutoEncoder</figcaption>
 </div>
 </div>
+
+More tiny autoencoder models for other Stable Diffusion models, like Stable Diffusion 3, are available from [madebyollin](https://huggingface.co/madebyollin).
```
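
The swap the added note alludes to is a one-line change on an existing pipeline. A minimal sketch, assuming a distilled Stable Diffusion checkpoint and the `madebyollin/taesd` decoder as illustrative ids (not taken from this commit); a tiny autoencoder for another model family, such as Stable Diffusion 3, would plug into its matching pipeline class the same way.

```python
import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline

# Sketch: replace the pipeline's full VAE with a Tiny AutoEncoder to cut decoding cost.
# Checkpoint ids are assumptions for illustration.
pipe = StableDiffusionPipeline.from_pretrained(
    "nota-ai/bk-sdm-small", torch_dtype=torch.float16
)
pipe.vae = AutoencoderTiny.from_pretrained(
    "madebyollin/taesd", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

image = pipe("a golden vase with different flowers", num_inference_steps=25).images[0]
image.save("tiny_vae_sample.png")
```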

examples/community/README.md

Lines changed: 48 additions & 3 deletions

```diff
@@ -71,6 +71,7 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif
 | Stable Diffusion BoxDiff Pipeline | Training-free controlled generation with bounding boxes using [BoxDiff](https://github.com/showlab/BoxDiff) | [Stable Diffusion BoxDiff Pipeline](#stable-diffusion-boxdiff) | - | [Jingyang Zhang](https://github.com/zjysteven/) |
 | FRESCO V2V Pipeline | Implementation of [[CVPR 2024] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation](https://arxiv.org/abs/2403.12962) | [FRESCO V2V Pipeline](#fresco) | - | [Yifan Zhou](https://github.com/SingleZombie) |
 | AnimateDiff IPEX Pipeline | Accelerate AnimateDiff inference pipeline with BF16/FP32 precision on Intel Xeon CPUs with [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [AnimateDiff on IPEX](#animatediff-on-ipex) | - | [Dan Li](https://github.com/ustcuna/) |
+| HunyuanDiT Differential Diffusion Pipeline | Applies [Differential Diffusion](https://github.com/exx8/differential-diffusion) to [HunyuanDiT](https://github.com/huggingface/diffusers/pull/8240). | [HunyuanDiT with Differential Diffusion](#hunyuandit-with-differential-diffusion) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1v44a5fpzyr4Ffr4v2XBQ7BajzG874N4P?usp=sharing) | [Monjoy Choudhury](https://github.com/MnCSSJ4x) |
 
 To load a custom pipeline you just need to pass the `custom_pipeline` argument to `DiffusionPipeline`, as one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines, we will merge them quickly.
 
```
```diff
@@ -1646,7 +1647,6 @@ from diffusers import DiffusionPipeline
 scheduler = DDIMScheduler.from_pretrained("stabilityai/stable-diffusion-2-1",
                                           subfolder="scheduler")
 
-
 pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1",
                                          custom_pipeline="stable_diffusion_tensorrt_img2img",
                                          variant='fp16',
```
```diff
@@ -1661,7 +1661,6 @@ pipe = pipe.to("cuda")
 url = "https://pajoca.com/wp-content/uploads/2022/09/tekito-yamakawa-1.png"
 response = requests.get(url)
 input_image = Image.open(BytesIO(response.content)).convert("RGB")
-
 prompt = "photorealistic new zealand hills"
 image = pipe(prompt, image=input_image, strength=0.75,).images[0]
 image.save('tensorrt_img2img_new_zealand_hills.png')
```
````diff
@@ -4209,6 +4208,52 @@ print("Latency of AnimateDiffPipelineIpex--fp32", latency, "s for total", step,
 latency = elapsed_time(pipe4, num_inference_steps=step)
 print("Latency of AnimateDiffPipeline--fp32",latency, "s for total", step, "steps")
 ```
+### HunyuanDiT with Differential Diffusion
+
+#### Usage
+
+```python
+import torch
+from diffusers import FlowMatchEulerDiscreteScheduler
+from diffusers.utils import load_image
+from PIL import Image
+from torchvision import transforms
+
+from pipeline_hunyuandit_differential_img2img import (
+    HunyuanDiTDifferentialImg2ImgPipeline,
+)
+
+
+pipe = HunyuanDiTDifferentialImg2ImgPipeline.from_pretrained(
+    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
+).to("cuda")
+
+
+source_image = load_image(
+    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png"
+)
+map = load_image(
+    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask_2.png"
+)
+prompt = "a green pear"
+negative_prompt = "blurry"
+
+image = pipe(
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    image=source_image,
+    num_inference_steps=28,
+    guidance_scale=4.5,
+    strength=1.0,
+    map=map,
+).images[0]
+```
+
+| ![Gradient](https://github.com/user-attachments/assets/e38ce4d5-1ae6-4df0-ab43-adc1b45716b5) | ![Input](https://github.com/user-attachments/assets/9c95679c-e9d7-4f5a-90d6-560203acd6b3) | ![Output](https://github.com/user-attachments/assets/5313ff64-a0c4-418b-8b55-a38f1a5e7532) |
+| ------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
+| Gradient | Input | Output |
+
+A colab notebook demonstrating all results can be found [here](https://colab.research.google.com/drive/1v44a5fpzyr4Ffr4v2XBQ7BajzG874N4P?usp=sharing). Depth Maps have also been added in the same colab.
 
 # Perturbed-Attention Guidance
 
````
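
The README context in the first hunk notes that community pipelines can also be loaded by passing `custom_pipeline` to `DiffusionPipeline` rather than importing the file locally. A hedged, untested sketch of that route for the new pipeline; the file name is taken from the import in the example above, and everything else is an assumption rather than something this commit documents.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

# Assumption: pipeline_hunyuandit_differential_img2img.py is resolvable by name via
# `custom_pipeline`, and the HunyuanDiT checkpoint's components match its signature.
pipe = DiffusionPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers",
    custom_pipeline="pipeline_hunyuandit_differential_img2img",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a green pear",
    negative_prompt="blurry",
    image=load_image(
        "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png"
    ),
    map=load_image(
        "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask_2.png"
    ),
    num_inference_steps=28,
    guidance_scale=4.5,
    strength=1.0,
).images[0]
```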

```diff
@@ -4285,4 +4330,4 @@ grid_image.save(grid_dir + "sample.png")
 
 `pag_scale` : guidance scale of PAG (ex: 5.0)
 
-`pag_applied_layers_index` : index of the layer to apply perturbation (ex: ['m0'])
+`pag_applied_layers_index` : index of the layer to apply perturbation (ex: ['m0'])
```
