Commit 5029673

Update InstantStyle usage in IP-Adapter documentation (#7806)

Authored by JY-Joy

* enable control ip-adapter per-transformer block on-the-fly

Co-authored-by: sayakpaul <[email protected]>
Co-authored-by: ResearcherXman <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>

1 parent 56bd7e6 commit 5029673

File tree

1 file changed: +11 -11 lines changed

docs/source/en/using-diffusers/ip_adapter.md

Lines changed: 11 additions & 11 deletions
@@ -661,16 +661,16 @@ image
 
 ### Style & layout control
 
-[InstantStyle](https://arxiv.org/abs/2404.02733) is a plug-and-play method on top of IP-Adapter, which disentangles style and layout from image prompt to control image generation. This is achieved by only inserting IP-Adapters to some specific part of the model.
+[InstantStyle](https://arxiv.org/abs/2404.02733) is a plug-and-play method on top of IP-Adapter, which disentangles style and layout from image prompt to control image generation. This way, you can generate images following only the style or layout from image prompt, with significantly improved diversity. This is achieved by only activating IP-Adapters to specific parts of the model.
 
 By default IP-Adapters are inserted to all layers of the model. Use the [`~loaders.IPAdapterMixin.set_ip_adapter_scale`] method with a dictionary to assign scales to IP-Adapter at different layers.
 
 ```py
-from diffusers import AutoPipelineForImage2Image
+from diffusers import AutoPipelineForText2Image
 from diffusers.utils import load_image
 import torch
 
-pipeline = AutoPipelineForImage2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
+pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
 pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
 
 scale = {
@@ -680,15 +680,15 @@ scale = {
 pipeline.set_ip_adapter_scale(scale)
 ```
 
-This will activate IP-Adapter at the second layer in the model's down-part block 2 and up-part block 0. The former is the layer where IP-Adapter injects layout information and the latter injects style. Inserting IP-Adapter to these two layers you can generate images following the style and layout of image prompt, but with contents more aligned to text prompt.
+This will activate IP-Adapter at the second layer in the model's down-part block 2 and up-part block 0. The former is the layer where IP-Adapter injects layout information and the latter injects style. Inserting IP-Adapter to these two layers you can generate images following both the style and layout from image prompt, but with contents more aligned to text prompt.
 
 ```py
 style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")
 
-generator = torch.Generator(device="cpu").manual_seed(42)
+generator = torch.Generator(device="cpu").manual_seed(26)
 image = pipeline(
     prompt="a cat, masterpiece, best quality, high quality",
-    image=style_image,
+    ip_adapter_image=style_image,
     negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
     guidance_scale=5,
     num_inference_steps=30,
@@ -703,7 +703,7 @@ image
     <figcaption class="mt-2 text-center text-sm text-gray-500">IP-Adapter image</figcaption>
   </div>
   <div class="flex-1">
-    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit_style_layout_cat.png"/>
+    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/cat_style_layout.png"/>
     <figcaption class="mt-2 text-center text-sm text-gray-500">generated image</figcaption>
   </div>
 </div>
@@ -718,10 +718,10 @@ scale = {
 }
 pipeline.set_ip_adapter_scale(scale)
 
-generator = torch.Generator(device="cpu").manual_seed(42)
+generator = torch.Generator(device="cpu").manual_seed(26)
 image = pipeline(
     prompt="a cat, masterpiece, best quality, high quality",
-    image=style_image,
+    ip_adapter_image=style_image,
     negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
     guidance_scale=5,
     num_inference_steps=30,
@@ -732,11 +732,11 @@ image
 
 <div class="flex flex-row gap-4">
   <div class="flex-1">
-    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit_style_cat.png"/>
+    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/cat_style_only.png"/>
     <figcaption class="mt-2 text-center text-sm text-gray-500">IP-Adapter only in style layer</figcaption>
   </div>
   <div class="flex-1">
-    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/30518dfe089e6bf50008875077b44cb98fb2065c/diffusers/default_out.png"/>
+    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/cat_ip_adapter.png"/>
     <figcaption class="mt-2 text-center text-sm text-gray-500">IP-Adapter in all layers</figcaption>
   </div>
 </div>
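The diff's central idea is the dictionary passed to `set_ip_adapter_scale`: per-block lists of scales that switch individual IP-Adapter layers on or off. The diff only shows the opening `scale = {` line, so the following is a minimal sketch of plausible dict shapes, assuming the SDXL UNet block names (`down.block_2`, `up.block_0`) and the convention that `0.0` disables and `1.0` fully enables the IP-Adapter in a given attention layer; the exact lists are illustrative, not copied from the commit.

```python
# Sketch (assumptions stated above): build per-layer scale dicts for
# InstantStyle-like control. Plain dicts, no diffusers import required.

def style_only_scale():
    # Activate IP-Adapter only in the assumed style layer
    # (second attention layer of up-part block 0).
    return {"up": {"block_0": [0.0, 1.0, 0.0]}}

def style_and_layout_scale():
    # Additionally activate the assumed layout layer
    # (second attention layer of down-part block 2).
    return {
        "down": {"block_2": [0.0, 1.0]},
        "up": {"block_0": [0.0, 1.0, 0.0]},
    }

print(style_only_scale())
print(style_and_layout_scale())
```

With a pipeline loaded as in the diff, either dict would then be applied with `pipeline.set_ip_adapter_scale(...)` before calling the pipeline.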

0 commit comments