
Commit 8e4ca1b

[Docs] Update image masking and face id example (#7780)
* [Docs] Update image masking and face id example
* Update docs
* Fix docs
1 parent 0d2d424 commit 8e4ca1b

File tree

1 file changed (+26, -8)

docs/source/en/using-diffusers/ip_adapter.md

Lines changed: 26 additions & 8 deletions
@@ -277,7 +277,7 @@ images = pipeline(
 
 ### IP-Adapter masking
 
-Binary masks specify which portion of the output image should be assigned to an IP-Adapter. This is useful for composing more than one IP-Adapter image. For each input IP-Adapter image, you must provide a binary mask an an IP-Adapter.
+Binary masks specify which portion of the output image should be assigned to an IP-Adapter. This is useful for composing more than one IP-Adapter image. For each input IP-Adapter image, you must provide a binary mask.
 
 To start, preprocess the input IP-Adapter images with the [`~image_processor.IPAdapterMaskProcessor.preprocess()`] to generate their masks. For optimal results, provide the output height and width to [`~image_processor.IPAdapterMaskProcessor.preprocess()`]. This ensures masks with different aspect ratios are appropriately stretched. If the input masks already match the aspect ratio of the generated image, you don't have to set the `height` and `width`.
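
For context, the `preprocess()` step this paragraph refers to comes right before the next hunk. A minimal sketch of that call, assuming the documentation-images mask assets and a 1024x1024 SDXL output (both assumptions, not part of this diff):

```py
from diffusers.image_processor import IPAdapterMaskProcessor
from diffusers.utils import load_image

# Two binary masks, one per input IP-Adapter image (assumed example assets)
mask1 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_mask1.png")
mask2 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_mask2.png")

output_height = 1024  # assumed output size; pass these so masks with other
output_width = 1024   # aspect ratios are stretched to match the output

processor = IPAdapterMaskProcessor()
masks = processor.preprocess([mask1, mask2], height=output_height, width=output_width)
```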

@@ -305,13 +305,18 @@ masks = processor.preprocess([mask1, mask2], height=output_height, width=output_
 </div>
 </div>
 
-When there is more than one input IP-Adapter image, load them as a list to ensure each image is assigned to a different IP-Adapter. Each of the input IP-Adapter images here correspond to the masks generated above.
+When there is more than one input IP-Adapter image, load them as a list and provide the IP-Adapter scale list. Each of the input IP-Adapter images here corresponds to one of the masks generated above.
 
 ```py
+pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name=["ip-adapter-plus-face_sdxl_vit-h.safetensors"])
+pipeline.set_ip_adapter_scale([[0.7, 0.7]])  # one scale for each image-mask pair
+
 face_image1 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl1.png")
 face_image2 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl2.png")
 
-ip_images = [[face_image1], [face_image2]]
+ip_images = [[face_image1, face_image2]]
+
+masks = [masks.reshape(1, masks.shape[0], masks.shape[2], masks.shape[3])]
 ```
 
 <div class="flex flex-row gap-4">

@@ -328,8 +333,6 @@ ip_images = [[face_image1], [face_image2]]
 Now pass the preprocessed masks to `cross_attention_kwargs` in the pipeline call.
 
 ```py
-pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name=["ip-adapter-plus-face_sdxl_vit-h.safetensors"] * 2)
-pipeline.set_ip_adapter_scale([0.7] * 2)
 generator = torch.Generator(device="cpu").manual_seed(0)
 num_images = 1
 
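
The pipeline call itself is unchanged by this commit. For reference, the preprocessed masks flow into it through `cross_attention_kwargs`, roughly like this sketch (reusing `ip_images`, `masks`, `generator`, and `num_images` from the hunks above; the prompts are illustrative):

```py
image = pipeline(
    prompt="2 girls",
    ip_adapter_image=ip_images,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=20,
    num_images_per_prompt=num_images,
    generator=generator,
    cross_attention_kwargs={"ip_adapter_masks": masks},  # one mask per image
).images[0]
```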

@@ -436,7 +439,7 @@ image = torch.from_numpy(faces[0].normed_embedding)
 ref_images_embeds.append(image.unsqueeze(0))
 ref_images_embeds = torch.stack(ref_images_embeds, dim=0).unsqueeze(0)
 neg_ref_images_embeds = torch.zeros_like(ref_images_embeds)
-id_embeds = torch.cat([neg_ref_images_embeds, ref_images_embeds]).to(dtype=torch.float16, device="cuda"))
+id_embeds = torch.cat([neg_ref_images_embeds, ref_images_embeds]).to(dtype=torch.float16, device="cuda")
 
 generator = torch.Generator(device="cpu").manual_seed(42)
 
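
Downstream of this fix, the concatenated `id_embeds` are passed to the pipeline through `ip_adapter_image_embeds` instead of raw images. A sketch under that assumption (the prompt and step count are illustrative):

```py
images = pipeline(
    prompt="A photo of a girl",
    ip_adapter_image_embeds=[id_embeds],  # precomputed face ID embeddings
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=20,
    num_images_per_prompt=num_images,
    generator=generator,
).images
```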

@@ -452,13 +455,28 @@ images = pipeline(
 Both IP-Adapter FaceID Plus and Plus v2 models require CLIP image embeddings. You can prepare face embeddings as shown previously, then you can extract and pass CLIP embeddings to the hidden image projection layers.
 
 ```py
-clip_embeds = pipeline.prepare_ip_adapter_image_embeds([ip_adapter_images], None, torch.device("cuda"), num_images, True)[0]
+from insightface.utils import face_align
+
+ref_images_embeds = []
+ip_adapter_images = []
+app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
+app.prepare(ctx_id=0, det_size=(640, 640))
+image = cv2.cvtColor(np.asarray(image), cv2.COLOR_BGR2RGB)
+faces = app.get(image)
+ip_adapter_images.append(face_align.norm_crop(image, landmark=faces[0].kps, image_size=224))
+image = torch.from_numpy(faces[0].normed_embedding)
+ref_images_embeds.append(image.unsqueeze(0))
+ref_images_embeds = torch.stack(ref_images_embeds, dim=0).unsqueeze(0)
+neg_ref_images_embeds = torch.zeros_like(ref_images_embeds)
+id_embeds = torch.cat([neg_ref_images_embeds, ref_images_embeds]).to(dtype=torch.float16, device="cuda")
+
+clip_embeds = pipeline.prepare_ip_adapter_image_embeds(
+    [ip_adapter_images], None, torch.device("cuda"), num_images, True)[0]
 
 pipeline.unet.encoder_hid_proj.image_projection_layers[0].clip_embeds = clip_embeds.to(dtype=torch.float16)
 pipeline.unet.encoder_hid_proj.image_projection_layers[0].shortcut = False # True if Plus v2
 ```
 
-
 ### Multi IP-Adapter
 
 More than one IP-Adapter can be used at the same time to generate specific images in more diverse styles. For example, you can use IP-Adapter-Face to generate consistent faces and characters, and IP-Adapter Plus to generate those faces in a specific style.
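
The multi-adapter pattern is not touched by this diff, but loading two adapters side by side looks roughly like the following sketch (weight names taken from the `h94/IP-Adapter` repo; the scales are illustrative):

```py
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder=["sdxl_models", "sdxl_models"],
    weight_name=[
        "ip-adapter-plus_sdxl_vit-h.safetensors",       # style reference
        "ip-adapter-plus-face_sdxl_vit-h.safetensors",  # face reference
    ],
)
pipeline.set_ip_adapter_scale([0.7, 0.3])  # one scale per loaded adapter
```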
