
Commit 5d383c3

sayakpaul, patil-suraj, and hysts authored

Add: post for t2i adapters on sdxl. (#1466)

* add: draft post for t2i adapters on sdxl.
* fix more.
* changes
* modify image paths.
* update image links.
* add: metadata
* slightly change the title
* add: thumbnail.
* update table
* correct paths.
* reduce image dimensions.
* update with paper links
* Apply suggestions from code review (Co-authored-by: Suraj Patil <[email protected]>)
* diffusers github repo.
* Update the embedded app link
* Update gradio version
* Fix the Space URL

Co-authored-by: Suraj Patil <[email protected]>
Co-authored-by: hysts <[email protected]>

1 parent 95d2ee3 commit 5d383c3

File tree

3 files changed: +194 -0 lines changed

_blog.yml

Lines changed: 12 additions & 0 deletions
@@ -2765,3 +2765,15 @@
   - community
   - research
   - LLM
+
+- local: t2i-sdxl-adapters
+  title: "Efficient Controllable Generation for SDXL with T2I-Adapters"
+  author: Adapter
+  guest: true
+  thumbnail: /blog/assets/t2i-sdxl-adapters/thumbnail.png
+  date: September 8, 2023
+  tags:
+  - guide
+  - collaboration
+  - diffusers
+  - diffusion
assets/t2i-sdxl-adapters/thumbnail.png (binary image file, 2.08 MB)

t2i-sdxl-adapters.md

Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
---
title: "Efficient Controllable Generation for SDXL with T2I-Adapters"
thumbnail: /blog/assets/t2i-sdxl-adapters/thumbnail.png
authors:
- user: Adapter
  guest: true
- user: valhalla
- user: sayakpaul
- user: Xintao
  guest: true
- user: hysts
---

# Efficient Controllable Generation for SDXL with T2I-Adapters

<!-- {blog_metadata} -->
<!-- {authors} -->

<p align="center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/t2i-adapters-sdxl/hf_tencent.png" height=180/>
</p>

[T2I-Adapter](https://huggingface.co/papers/2302.08453) is an efficient plug-and-play model that provides extra guidance to pre-trained text-to-image models while keeping the original large text-to-image models frozen. T2I-Adapter aligns internal knowledge in T2I models with external control signals. We can train various adapters for different conditions and achieve rich control and editing effects.

As a contemporaneous work, [ControlNet](https://hf.co/papers/2302.05543) has a similar function and is widely used. However, it can be **computationally expensive** to run. This is because, during each denoising step of the reverse diffusion process, both the ControlNet and the UNet need to be run. In addition, ControlNet copies the UNet encoder as its control model, resulting in a larger parameter count. Thus, generation is bottlenecked by the size of the ControlNet (the larger it is, the slower the process becomes).

T2I-Adapters provide a competitive advantage over ControlNets in this matter. T2I-Adapters are smaller, and unlike ControlNets, a T2I-Adapter is run just once for the entire course of the denoising process.

| **Model Type** | **Model Parameters** | **Storage (fp16)** |
| --- | --- | --- |
| [ControlNet-SDXL](https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0) | 1251 M | 2.5 GB |
| [ControlLoRA](https://huggingface.co/stabilityai/control-lora) (with rank 128) | 197.78 M (84.19% reduction) | 396 MB (84.53% reduction) |
| [T2I-Adapter-SDXL](https://huggingface.co/TencentARC/t2i-adapter-canny-sdxl-1.0) | 79 M (**_93.69% reduction_**) | 158 MB (**_94% reduction_**) |

Over the past few weeks, the Diffusers team and the T2I-Adapter authors have been collaborating to bring support for T2I-Adapters on [Stable Diffusion XL (SDXL)](https://huggingface.co/papers/2307.01952) to [`diffusers`](https://github.com/huggingface/diffusers). In this blog post, we share our findings from training T2I-Adapters on SDXL from scratch, some appealing results, and, of course, the T2I-Adapter checkpoints for various conditionings (sketch, canny, lineart, depth, and openpose)!

![Collage of the results](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/t2i-adapters-sdxl/results_collage.png)

Compared to previous versions of T2I-Adapter (SD-1.4/1.5), [T2I-Adapter-SDXL](https://github.com/TencentARC/T2I-Adapter) still uses the original recipe, driving the 2.6B-parameter SDXL with a 79M-parameter adapter! T2I-Adapter-SDXL maintains powerful control capabilities while inheriting the high-quality generation of SDXL!

## Training T2I-Adapter-SDXL with `diffusers`

We built our training script on [this official example](https://github.com/huggingface/diffusers/blob/main/examples/t2i_adapter/README_sdxl.md) provided by `diffusers`.

Most of the T2I-Adapter models we mention in this blog post were trained on 3M high-resolution image-text pairs from LAION-Aesthetics V2 with the following settings:

- Training steps: 20000-35000
- Batch size: Data parallel with a single-GPU batch size of 16, for a total batch size of 128
- Learning rate: Constant learning rate of 1e-5
- Mixed precision: fp16

We encourage the community to use our scripts to train custom and powerful T2I-Adapters, striking a competitive trade-off between speed, memory, and quality.
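
To make that concrete, here is a minimal sketch of what launching such a run could look like with the example script linked above. The flag names follow `diffusers`' training-script conventions, but the dataset argument and the 8-GPU layout (8 x 16 = 128) are our assumptions; check the script's `--help` before running.

```bash
# Hypothetical launch command mirroring the settings above; verify the exact
# flags against train_t2i_adapter_sdxl.py before use.
accelerate launch --multi_gpu --num_processes=8 train_t2i_adapter_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --train_data_dir="/path/to/your/image-text-pairs" \
  --mixed_precision="fp16" \
  --train_batch_size=16 \
  --learning_rate=1e-5 \
  --max_train_steps=25000 \
  --output_dir="t2i-adapter-sdxl-custom"
```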

## Using T2I-Adapter-SDXL in `diffusers`

Here, we take the lineart condition as an example to demonstrate the usage of [T2I-Adapter-SDXL](https://github.com/TencentARC/T2I-Adapter/tree/XL). To get started, first install the required dependencies:

```bash
pip install -U git+https://github.com/huggingface/diffusers.git
pip install -U controlnet_aux==0.0.7  # for conditioning models and detectors
pip install transformers accelerate
```

The generation process of T2I-Adapter-SDXL mainly consists of the following two steps:

1. Condition images are first prepared in the appropriate *control image* format.
2. The *control image* and *prompt* are passed to the [`StableDiffusionXLAdapterPipeline`](https://github.com/huggingface/diffusers/blob/0ec7a02b6a609a31b442cdf18962d7238c5be25d/src/diffusers/pipelines/t2i_adapter/pipeline_stable_diffusion_xl_adapter.py#L126).

Let's have a look at a simple example using the [Lineart Adapter](https://huggingface.co/TencentARC/t2i-adapter-lineart-sdxl-1.0). We start by initializing the T2I-Adapter pipeline for SDXL and the lineart detector.

```python
import torch
from controlnet_aux.lineart import LineartDetector
from diffusers import (AutoencoderKL, EulerAncestralDiscreteScheduler,
                       StableDiffusionXLAdapterPipeline, T2IAdapter)
from diffusers.utils import load_image, make_image_grid

# load adapter
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# load pipeline
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(
    model_id, subfolder="scheduler"
)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id,
    vae=vae,
    adapter=adapter,
    scheduler=euler_a,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# load lineart detector
line_detector = LineartDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
```
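
If GPU memory is tight, `diffusers` also lets you trade speed for memory with CPU offloading. A minimal sketch, to be used in place of the `.to("cuda")` call on the pipeline above:

```python
# Offload submodules to the CPU and move each to the GPU only during its
# forward pass; this reduces peak VRAM at some speed cost.
pipe.enable_model_cpu_offload()
```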

Then, load an image to detect lineart:

```python
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_lin.jpg"
image = load_image(url)
image = line_detector(image, detect_resolution=384, image_resolution=1024)
```

![Lineart Dragon](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/t2i-adapters-sdxl/lineart_dragon.png)

Then we generate:

```python
prompt = "Ice dragon roar, 4k photo"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.8,
    guidance_scale=7.5,
).images[0]
gen_images.save("out_lin.png")
```

![Lineart Generated Dragon](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/t2i-adapters-sdxl/lineart_generated_dragon.png)
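
The `make_image_grid` utility we imported earlier is handy for inspecting the conditioning and the output together. A small sketch, assuming both images come out at the same resolution:

```python
# Place the lineart control image and the generated image side by side.
grid = make_image_grid([image, gen_images], rows=1, cols=2)
grid.save("lineart_comparison.png")
```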

There are two important arguments that help you control the amount of conditioning:

1. `adapter_conditioning_scale`

    This argument controls how much influence the conditioning should have on the input. Higher values mean a stronger conditioning effect, and vice versa.

2. `adapter_conditioning_factor`

    This argument controls for how many of the initial generation steps the conditioning is applied. The value should be set between 0 and 1 (the default is 1). `adapter_conditioning_factor=1` means the adapter is applied at all timesteps, while `adapter_conditioning_factor=0.5` means it is applied only for the first 50% of the steps.
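
Both arguments can be combined in a single call; the specific values below are illustrative, reusing the prompt and control image from above:

```python
# Condition strongly (scale 0.9), but only for the first half of the
# 30 denoising steps; the remaining steps run without adapter guidance.
gen_image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.9,
    adapter_conditioning_factor=0.5,
).images[0]
```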

For more details, we welcome you to check the [official documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/adapter).
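
The same two-step recipe carries over to the other conditionings; for instance, swapping to the Canny adapter only changes the checkpoint and the detector. A sketch under the setup above (the `CannyDetector` usage mirrors `controlnet_aux`'s API, but treat the exact call signature as an assumption):

```python
from controlnet_aux.canny import CannyDetector

# Load the Canny adapter checkpoint and rebuild the pipeline around it.
canny_adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=canny_adapter, scheduler=euler_a,
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")

# Extract Canny edges as the control image; the pipeline call stays the same.
canny_detector = CannyDetector()
control_image = canny_detector(load_image(url), detect_resolution=384, image_resolution=1024)
```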

## Try out the Demo

You can easily try T2I-Adapter-SDXL in [this Space](https://huggingface.co/spaces/TencentARC/T2I-Adapter-SDXL) or in the playground embedded below:

<script type="module" src="https://gradio.s3-us-west-2.amazonaws.com/3.43.1/gradio.js"></script>
<gradio-app src="https://tencentarc-t2i-adapter-sdxl.hf.space"></gradio-app>

## More Results

Below, we present results obtained from using different kinds of conditions. We also supplement the results with links to their corresponding pre-trained checkpoints. Their model cards contain more details on how they were trained, along with example usage.

### Lineart Guided

![Lineart guided results](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/t2i-adapters-sdxl/lineart_guided.png)
*Model from [`TencentARC/t2i-adapter-lineart-sdxl-1.0`](https://huggingface.co/TencentARC/t2i-adapter-lineart-sdxl-1.0)*

### Sketch Guided

![Sketch guided results](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/t2i-adapters-sdxl/sketch_guided.png)
*Model from [`TencentARC/t2i-adapter-sketch-sdxl-1.0`](https://huggingface.co/TencentARC/t2i-adapter-sketch-sdxl-1.0)*

### Canny Guided

![Canny guided results](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/t2i-adapters-sdxl/canny_guided.png)
*Model from [`TencentARC/t2i-adapter-canny-sdxl-1.0`](https://huggingface.co/TencentARC/t2i-adapter-canny-sdxl-1.0)*

### Depth Guided

![Depth guided results](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/t2i-adapters-sdxl/depth_guided.png)
*Depth guided models from [`TencentARC/t2i-adapter-depth-midas-sdxl-1.0`](https://huggingface.co/TencentARC/t2i-adapter-depth-midas-sdxl-1.0) and [`TencentARC/t2i-adapter-depth-zoe-sdxl-1.0`](https://huggingface.co/TencentARC/t2i-adapter-depth-zoe-sdxl-1.0), respectively*

### OpenPose Guided

![OpenPose guided results](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/t2i-adapters-sdxl/pose_guided.png)
*Model from [`TencentARC/t2i-adapter-openpose-sdxl-1.0`](https://hf.co/TencentARC/t2i-adapter-openpose-sdxl-1.0)*

---

*Acknowledgements: Immense thanks to [William Berman](https://twitter.com/williamLberman) for helping us train the models and sharing his insights.*
