
Commit f825221

[Community Pipeline] IPAdapter FaceID (#6276)
* Add support for IPAdapter FaceID
* Add docs
* Move subfolder to kwargs
* Fix quality
* Fix image encoder loading
* Fix loading + add test
* Move to community folder
* Fix style
* Revert constant update

---------

Co-authored-by: Sayak Paul <[email protected]>
1 parent 119d734 commit f825221


2 files changed (+1586, -1 lines)


examples/community/README.md

Lines changed: 61 additions & 1 deletion
@@ -58,6 +58,7 @@ prompt-to-prompt | change parts of a prompt and retain image structure (see [pap
| Null-Text Inversion Pipeline | Implement [Null-text Inversion for Editing Real Images using Guided Diffusion Models](https://arxiv.org/abs/2211.09794) as a pipeline. | [Null-Text Inversion](https://github.com/google/prompt-to-prompt/) | - | [Junsheng Luan](https://github.com/Junsheng121) |
| Rerender A Video Pipeline | Implementation of [[SIGGRAPH Asia 2023] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation](https://arxiv.org/abs/2306.07954) | [Rerender A Video Pipeline](#Rerender_A_Video) | - | [Yifan Zhou](https://github.com/SingleZombie) |
| StyleAligned Pipeline | Implementation of [Style Aligned Image Generation via Shared Attention](https://arxiv.org/abs/2312.02133) | [StyleAligned Pipeline](#stylealigned-pipeline) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://drive.google.com/file/d/15X2E0jFPTajUIjS0FzX50OaHsCbP2lQ0/view?usp=sharing) | [Aryan V S](https://github.com/a-r-r-o-w) |
| IP Adapter FaceID Stable Diffusion | Stable Diffusion Pipeline that supports IP Adapter Face ID | [IP Adapter Face ID](#ip-adapter-face-id) | - | [Fabio Rigano](https://github.com/fabiorigano) |

To load a custom pipeline, pass the name of one of the files in `diffusers/examples/community` as the `custom_pipeline` argument to `DiffusionPipeline`. Feel free to send a PR with your own pipelines; we will merge them quickly.
```py
@@ -3406,4 +3407,63 @@ images = pipe(

# Disable StyleAligned if you do not wish to use it anymore
pipe.disable_style_aligned()
```

### IP Adapter Face ID
IP Adapter FaceID is an experimental IP Adapter model that uses face ID embeddings generated by `insightface` instead of CLIP image embeddings, so no image encoder needs to be loaded.
You need to install `insightface` and all of its requirements to use this model.
You must pass the face embedding tensor to the pipeline call as `image_embeds` instead of `ip_adapter_image`.
You also have to disable the PEFT backend (`diffusers.utils.USE_PEFT_BACKEND = False`) before loading the weights.

```py
import diffusers

# Disable the PEFT backend first, before importing anything else from diffusers,
# so that the FaceID weights can be loaded
diffusers.utils.USE_PEFT_BACKEND = False

import cv2
import numpy as np
import torch
from diffusers import AutoencoderKL, DDIMScheduler, DiffusionPipeline
from diffusers.utils import load_image
from insightface.app import FaceAnalysis

# DDIM scheduler using the Stable Diffusion 1.x training noise schedule
noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
# The base checkpoint ships without a VAE, so load one separately
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to(dtype=torch.float16)
pipeline = DiffusionPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V4.0_noVAE",
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    custom_pipeline="ip_adapter_face_id",
)
# Load the FaceID adapter weights; no image encoder is needed
pipeline.load_ip_adapter_face_id("h94/IP-Adapter-FaceID", "ip-adapter-faceid_sd15.bin")
pipeline.to("cuda")

generator = torch.Generator(device="cpu").manual_seed(42)
num_images = 2

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ai_face2.png")

# Detect the face and extract its identity embedding with insightface
app = FaceAnalysis(name="buffalo_l", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))
# load_image returns an RGB PIL image, but insightface expects BGR arrays (OpenCV convention)
image = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)
faces = app.get(image)
if not faces:
    raise ValueError("No face detected in the input image")
# Normalized embedding of the first detected face, with a batch dimension added
image_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)

images = pipeline(
    prompt="A photo of a girl wearing a black dress, holding red roses in hand, upper body, behind is the Eiffel Tower",
    image_embeds=image_embeds,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=20,
    num_images_per_prompt=num_images,
    width=512,
    height=704,
    generator=generator,
).images

for i, img in enumerate(images):
    img.save(f"c{i}.png")
```
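The example above uses `faces[0]`, the first detection returned by `app.get`, which is not guaranteed to be the largest face when the input contains several people. The sketch below is a hypothetical helper, not part of the community pipeline: it picks the face with the largest bounding box and returns its normalized embedding with a batch dimension, and it swaps RGB to BGR with NumPy instead of OpenCV. It assumes each detected face exposes `.bbox` as `(x1, y1, x2, y2)` and `.normed_embedding`, as insightface face objects do; the stand-in faces in the usage lines are made-up data.

```python
import numpy as np
from types import SimpleNamespace


def to_insightface_input(image):
    """Convert an RGB image (PIL image or HxWx3 array) to a BGR array.

    Reversing the channel axis does the same swap as cv2.COLOR_RGB2BGR.
    """
    rgb = np.asarray(image)
    if rgb.ndim != 3 or rgb.shape[-1] != 3:
        raise ValueError(f"Expected an HxWx3 RGB image, got shape {rgb.shape}")
    # Copy into a C-contiguous buffer; the reversed view has a negative stride
    return np.ascontiguousarray(rgb[..., ::-1])


def largest_face_embedding(faces):
    """Return the normed embedding of the largest face, shape (1, D).

    Assumes each face has `.bbox` (x1, y1, x2, y2) and `.normed_embedding`.
    """
    if not faces:
        raise ValueError("No face detected in the input image")

    def area(face):
        x1, y1, x2, y2 = face.bbox[:4]
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)

    face = max(faces, key=area)
    # Add a leading batch dimension, matching .unsqueeze(0) in the example
    return np.asarray(face.normed_embedding, dtype=np.float32)[None, :]


# Usage with hypothetical stand-in faces (no model needed)
small = SimpleNamespace(bbox=np.array([0, 0, 10, 10]), normed_embedding=np.zeros(512))
large = SimpleNamespace(bbox=np.array([0, 0, 100, 120]), normed_embedding=np.ones(512) / np.sqrt(512))
emb = largest_face_embedding([small, large])
print(emb.shape)  # (1, 512)

pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)  # one red pixel in RGB order
print(to_insightface_input(pixel)[0, 0].tolist())  # [0, 0, 255]
```

With the pipeline, you would call `app.get(to_insightface_input(image))` and pass `torch.from_numpy(largest_face_embedding(faces))` as `image_embeds`.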
