
Commit c6845db

Commit message: images
1 parent 1a03c6b

File tree: 1 file changed, +164 -5 lines changed


docs/source/en/using-diffusers/ip_adapter.md

Lines changed: 164 additions & 5 deletions
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.

# IP-Adapter

-[IP-Adapter](https://huggingface.co/papers/2308.06721) is a lightweight adapter designed to integrate image-based guidance into text-to-image diffusion models. The adapter uses an image encoder to extract image features that are passed to the newly added cross-attention layers in the UNet and fine-tuned. The original UNet model, and the existing cross-attention layers corresponding to text features, is frozen. Decoupling the cross-attention for image and text features enables more fine-grained and controllable generation.
+[IP-Adapter](https://huggingface.co/papers/2308.06721) is a lightweight adapter designed to integrate image-based guidance with text-to-image diffusion models. The adapter uses an image encoder to extract image features, which are passed to newly added cross-attention layers in the UNet that are then fine-tuned. The original UNet model and the existing cross-attention layers corresponding to text features are frozen. Decoupling the cross-attention for image and text features enables more fine-grained and controllable generation.

IP-Adapter files are typically ~100MB because they only contain the image embeddings. This means you need to load a model first, and then load the IP-Adapter with [`~loaders.IPAdapterMixin.load_ip_adapter`].

@@ -46,6 +46,17 @@ pipeline(
).images[0]
```

+<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png" width="400" alt="IP-Adapter image"/>
+    <figcaption style="text-align: center;">IP-Adapter image</figcaption>
+  </figure>
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner_2.png" width="400" alt="generated image"/>
+    <figcaption style="text-align: center;">generated image</figcaption>
+  </figure>
+</div>
+
Take a look at the examples below to learn how to use IP-Adapter for other tasks.

<hfoptions id="usage">
@@ -77,6 +88,21 @@ pipeline(
).images[0]
```

+<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_bear_1.png" width="300" alt="input image"/>
+    <figcaption style="text-align: center;">input image</figcaption>
+  </figure>
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_gummy.png" width="300" alt="IP-Adapter image"/>
+    <figcaption style="text-align: center;">IP-Adapter image</figcaption>
+  </figure>
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_bear_3.png" width="300" alt="generated image"/>
+    <figcaption style="text-align: center;">generated image</figcaption>
+  </figure>
+</div>
+
</hfoption>
<hfoption id="inpainting">

@@ -107,10 +133,25 @@ pipeline(
).images[0]
```

+<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_bear_1.png" width="300" alt="input image"/>
+    <figcaption style="text-align: center;">input image</figcaption>
+  </figure>
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_gummy.png" width="300" alt="IP-Adapter image"/>
+    <figcaption style="text-align: center;">IP-Adapter image</figcaption>
+  </figure>
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_inpaint.png" width="300" alt="generated image"/>
+    <figcaption style="text-align: center;">generated image</figcaption>
+  </figure>
+</div>
+
</hfoption>
<hfoption id="video">

-The [`~DiffusionPipeline.enable_model_cpu_offload`] method is useful for reducing memory, but you should enable it **after** the IP-Adapter is loaded. Otherwise, the IP-Adapter's image encoder is also offloaded to the CPU and returns an error.
+The [`~DiffusionPipeline.enable_model_cpu_offload`] method is useful for reducing memory, but it should be enabled **after** the IP-Adapter is loaded. Otherwise, the IP-Adapter's image encoder is also offloaded to the CPU and raises an error.

```py
import torch
@@ -151,6 +192,17 @@ pipeline(
).frames[0]
```

+<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_inpaint.png" width="400" alt="IP-Adapter image"/>
+    <figcaption style="text-align: center;">IP-Adapter image</figcaption>
+  </figure>
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/gummy_bear.gif" width="400" alt="generated video"/>
+    <figcaption style="text-align: center;">generated video</figcaption>
+  </figure>
+</div>
+
</hfoption>
</hfoptions>

@@ -301,6 +353,17 @@ processor = IPAdapterMaskProcessor()
masks = processor.preprocess([mask1, mask2], height=1024, width=1024)
```

+<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
+  <figure>
+    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_mask_mask1.png" width="200" alt="mask 1"/>
+    <figcaption style="text-align: center;">mask 1</figcaption>
+  </figure>
+  <figure>
+    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_mask_mask2.png" width="200" alt="mask 2"/>
+    <figcaption style="text-align: center;">mask 2</figcaption>
+  </figure>
+</div>
+
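Conceptually, the `preprocess` call resizes each mask to the target resolution and binarizes it into a stacked tensor, one channel per mask. A rough NumPy sketch of that idea (this is *not* the actual `IPAdapterMaskProcessor` implementation; the `(num_masks, 1, height, width)` layout and the nearest-neighbor resize are simplifying assumptions):

```python
import numpy as np

def preprocess_masks(masks, height, width):
    """Sketch: nearest-neighbor resize each 2D mask to (height, width),
    binarize it, and stack into shape (num_masks, 1, height, width)."""
    out = []
    for mask in masks:
        h, w = mask.shape
        # nearest-neighbor resize via index mapping
        rows = (np.arange(height) * h // height).clip(0, h - 1)
        cols = (np.arange(width) * w // width).clip(0, w - 1)
        resized = mask[rows[:, None], cols[None, :]]
        out.append((resized > 0.5).astype(np.float32)[None, ...])
    return np.stack(out)

# two toy masks: left half vs. right half of an 8x8 grid
mask1 = np.zeros((8, 8)); mask1[:, :4] = 1.0
mask2 = np.zeros((8, 8)); mask2[:, 4:] = 1.0
masks = preprocess_masks([mask1, mask2], height=16, width=16)
print(masks.shape)  # (2, 1, 16, 16)
```

Each mask ends up strictly binary, so a pixel is either attended to by its IP-Adapter image or ignored.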
Provide both the IP-Adapter images and their scales as a list. Pass the preprocessed masks to `cross_attention_kwargs` in the pipeline.
```py
@@ -325,6 +388,29 @@ pipeline(
).images[0]
```

+<div style="display: flex; flex-direction: column; gap: 10px;">
+  <div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
+    <figure>
+      <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_mask_girl1.png" width="400" alt="IP-Adapter image 1"/>
+      <figcaption style="text-align: center;">IP-Adapter image 1</figcaption>
+    </figure>
+    <figure>
+      <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_mask_girl2.png" width="400" alt="IP-Adapter image 2"/>
+      <figcaption style="text-align: center;">IP-Adapter image 2</figcaption>
+    </figure>
+  </div>
+  <div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
+    <figure>
+      <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_attention_mask_result_seed_0.png" width="400" alt="Generated image with mask"/>
+      <figcaption style="text-align: center;">generated with mask</figcaption>
+    </figure>
+    <figure>
+      <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_no_attention_mask_result_seed_0.png" width="400" alt="Generated image without mask"/>
+      <figcaption style="text-align: center;">generated without mask</figcaption>
+    </figure>
+  </div>
+</div>
+
## Applications

The section below covers some popular applications of IP-Adapter.
@@ -365,6 +451,17 @@ pipeline(
).images[0]
```

+<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_einstein_base.png" width="400" alt="IP-Adapter image"/>
+    <figcaption style="text-align: center;">IP-Adapter image</figcaption>
+  </figure>
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_einstein.png" width="400" alt="generated image"/>
+    <figcaption style="text-align: center;">generated image</figcaption>
+  </figure>
+</div>
+
</hfoption>
<hfoption id="h94/IP-Adapter-FaceID">

@@ -473,6 +570,17 @@ style_folder = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/ma
style_images = [load_image(f"{style_folder}/img{i}.png") for i in range(10)]
```

+<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
+  <figure>
+    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png" width="400" alt="Face image"/>
+    <figcaption style="text-align: center;">face image</figcaption>
+  </figure>
+  <figure>
+    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_style_grid.png" width="400" alt="Style images"/>
+    <figcaption style="text-align: center;">style images</figcaption>
+  </figure>
+</div>
+
Pass style and face images as a list to `ip_adapter_image`.

```py
@@ -485,11 +593,18 @@ pipeline(
).images[0]
```

+<div style="display: flex; justify-content: center;">
+  <figure>
+    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_multi_out.png" width="400" alt="Generated image"/>
+    <figcaption style="text-align: center;">generated image</figcaption>
+  </figure>
+</div>
+
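With two IP-Adapters loaded, `ip_adapter_image` takes one entry per adapter in load order, and an entry may itself be a list of images. A plain-Python sketch of that nesting convention (the filenames are placeholders standing in for `PIL` images, not real inputs):

```python
# Placeholder stand-ins for PIL images; only the nesting structure matters here.
style_images = [f"style_img_{i}.png" for i in range(10)]  # first adapter: 10 style images
face_image = "women_input.png"                            # second adapter: 1 face image

# One entry per loaded IP-Adapter, in the order the adapters were loaded.
ip_adapter_image = [style_images, face_image]

assert len(ip_adapter_image) == 2      # matches the two loaded adapters
assert len(ip_adapter_image[0]) == 10  # the style adapter receives all 10 style images
```

The same one-entry-per-adapter rule applies to the scales passed to `set_ip_adapter_scale`.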
### Instant generation

[Latent Consistency Models (LCM)](../api/pipelines/latent_consistency_models) can generate images in 4 steps or fewer, unlike other diffusion models which require many more steps, making generation feel "instantaneous". IP-Adapters are compatible with LCM models to instantly generate images.

-Load the IP-Adapter weights and load the LoRA weights with [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights].
+Load the IP-Adapter weights and load the LoRA weights with [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`].

```py
import torch
@@ -512,7 +627,7 @@ pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.enable_model_cpu_offload()
```

-Try using a lower IP-Adapter scale to condition generation more on the style you want to apply, and remember to use the special token in your prompt to trigger its generation.
+Try using a lower IP-Adapter scale to condition generation more on the style you want to apply, and remember to use the special token in your prompt to trigger generation.

```py
pipeline.set_ip_adapter_scale(0.4)
@@ -528,6 +643,13 @@ pipeline(
).images[0]
```

+<div style="display: flex; justify-content: center;">
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_herge.png" width="400" alt="Generated image"/>
+    <figcaption style="text-align: center;">generated image</figcaption>
+  </figure>
+</div>
+
### Structural control

For structural control, combine IP-Adapter with [ControlNet](../api/pipelines/controlnet) conditioned on depth maps, edge maps, pose estimations, and more.
@@ -567,6 +689,21 @@ pipeline(
).images[0]
```

+<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
+  <figure>
+    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/statue.png" width="300" alt="IP-Adapter image"/>
+    <figcaption style="text-align: center;">IP-Adapter image</figcaption>
+  </figure>
+  <figure>
+    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/depth.png" width="300" alt="Depth map"/>
+    <figcaption style="text-align: center;">depth map</figcaption>
+  </figure>
+  <figure>
+    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ipa-controlnet-out.png" width="300" alt="Generated image"/>
+    <figcaption style="text-align: center;">generated image</figcaption>
+  </figure>
+</div>
+
### Style and layout control

For style and layout control, combine IP-Adapter with [InstantStyle](https://huggingface.co/papers/2404.02733). InstantStyle separates *style* (color, texture, overall feel) and *content* from each other. It only applies the style in style-specific blocks of the model to prevent it from distorting other areas of an image. This generates images with stronger and more consistent styles and better control over the layout.
@@ -608,6 +745,17 @@ pipeline(
).images[0]
```

+<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg" width="400" alt="Style image"/>
+    <figcaption style="text-align: center;">style image</figcaption>
+  </figure>
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/cat_style_layout.png" width="400" alt="Generated image"/>
+    <figcaption style="text-align: center;">generated image</figcaption>
+  </figure>
+</div>
+
You can also insert the IP-Adapter in all the model layers. This tends to generate images that focus more on the image prompt and may reduce the diversity of the generated images. Only activate the IP-Adapter in up `block_0`, the style layer.
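The per-layer activation described above is expressed as a nested dictionary of block scales passed to `set_ip_adapter_scale`, where `0.0` disables a block and `1.0` fully enables it. A minimal sketch of the two configurations (block names follow the diffusers InstantStyle convention; treat the exact scale values as illustrative):

```python
# Style only: activate just the style layer (the middle attention
# layer of up block_0); all other blocks default to 0.0.
style_only_scale = {
    "up": {"block_0": [0.0, 1.0, 0.0]},
}

# Style and layout: additionally activate a layout block (down block_2),
# trading some style purity for stronger layout control.
style_and_layout_scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}

# pipeline.set_ip_adapter_scale(style_only_scale)  # applied to a loaded pipeline
```

Passing a plain float instead of a dictionary applies one scale to every layer, which is the "all the model layers" behavior described above.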
> [!TIP]
@@ -625,4 +773,15 @@ pipeline(
negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
guidance_scale=5,
).images[0]
-```
+```
+
+<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/cat_style_only.png" width="400" alt="Generated image (style only)"/>
+    <figcaption style="text-align: center;">style-layer generated image</figcaption>
+  </figure>
+  <figure>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/cat_ip_adapter.png" width="400" alt="Generated image (IP-Adapter only)"/>
+    <figcaption style="text-align: center;">all layers generated image</figcaption>
+  </figure>
+</div>