# T2I-Adapter

[T2I-Adapter](https://huggingface.co/papers/2302.08453) is an adapter that enables controllable generation like [ControlNet](./controlnet). A T2I-Adapter works by learning a *mapping* between a control signal (for example, a depth map) and a pretrained model's internal knowledge. The adapter is plugged into the base model to provide extra guidance based on the control signal during generation. Compared to ControlNet, a T2I-Adapter is smaller (~77M parameters) and faster because it only runs once during the diffusion process, but its results may be slightly worse.

> [!TIP]
> T2I-Adapters are available for several conditions, such as color palette, depth, sketch, pose, and segmentation. Check out the [TencentARC](https://hf.co/TencentARC) repository to try them out!

Load a T2I-Adapter conditioned on a specific control, such as canny edge, and pass it to the pipeline with [`~DiffusionPipeline.from_pretrained`].

```py
import torch
from diffusers import T2IAdapter, StableDiffusionXLAdapterPipeline, AutoencoderKL

t2i_adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-canny-sdxl-1.0",
    torch_dtype=torch.float16,
)
```

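Adapters for other controls are loaded the same way by swapping the repository id. As a sketch, the depth adapter used in the MultiAdapter example later in this guide can be loaded like this:

```py
# Sketch: load a depth-conditioned T2I-Adapter instead of the canny one
depth_adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-depth-midas-sdxl-1.0",
    torch_dtype=torch.float16,
)
```
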
Generate a canny image (a white outline of the input image on a black background) with [opencv-python](https://github.com/opencv/opencv-python).

```py
import cv2
import numpy as np
from PIL import Image
from diffusers.utils import load_image

original_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png"
)

image = np.array(original_image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
```

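As an alternative to calling OpenCV directly, the [controlnet_aux](https://github.com/huggingface/controlnet_aux) library bundles a `CannyDetector` that handles resizing and edge detection for you. A minimal sketch, assuming the `controlnet-aux` package is installed:

```py
# Alternative preprocessing with controlnet_aux (assumes `pip install controlnet-aux`)
from controlnet_aux.canny import CannyDetector
from diffusers.utils import load_image

canny_detector = CannyDetector()

original_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png"
)
# detect_resolution sets the resolution edges are detected at,
# image_resolution sets the resolution of the returned control image
canny_image = canny_detector(original_image, detect_resolution=384, image_resolution=1024)
```
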
Pass the canny image to the pipeline to generate an image.

```py
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=t2i_adapter,
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")

prompt = """
A photorealistic overhead image of a cat reclining sideways in a flamingo pool floatie holding a margarita.
The cat is floating leisurely in the pool and completely relaxed and happy.
"""

pipeline(
    prompt,
    image=canny_image,
    num_inference_steps=100,
    guidance_scale=10,
).images[0]
```

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png" width="300" alt="original image"/>
    <figcaption style="text-align: center;">original image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png" width="300" alt="canny edge control image"/>
    <figcaption style="text-align: center;">canny image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/t2i-canny-cat-generated.png" width="300" alt="generated image"/>
    <figcaption style="text-align: center;">generated image</figcaption>
  </figure>
</div>

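Diffusion sampling is stochastic, so repeated runs produce different images. For reproducible results, you can pass a seeded `torch.Generator` to the pipeline; a minimal sketch that reuses the pipeline, prompt, and canny image from above (the seed value is arbitrary):

```py
# Fix the random seed so repeated runs produce the same image
generator = torch.Generator("cuda").manual_seed(0)

image = pipeline(
    prompt,
    image=canny_image,
    num_inference_steps=100,
    guidance_scale=10,
    generator=generator,
).images[0]
```
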
## MultiAdapter

You can compose multiple controls, such as a canny image and a depth map, with the [`MultiAdapter`] class. The example below combines a canny image and a depth map.

Load the control images and the T2I-Adapters as a list.

```py
import torch
from diffusers.utils import load_image
from diffusers import StableDiffusionXLAdapterPipeline, AutoencoderKL, MultiAdapter, T2IAdapter

canny_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png"
)
depth_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_depth_image.png"
)
controls = [canny_image, depth_image]
prompt = ["""
a relaxed rabbit sitting on a striped towel next to a pool with a tropical drink nearby,
bright sunny day, vacation scene, 35mm photograph, film, professional, 4k, highly detailed
"""]

adapters = MultiAdapter(
    [
        T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16),
        T2IAdapter.from_pretrained("TencentARC/t2i-adapter-depth-midas-sdxl-1.0", torch_dtype=torch.float16),
    ]
)
```

Pass the adapters, prompt, and control images to [`StableDiffusionXLAdapterPipeline`]. Use the `adapter_conditioning_scale` parameter to determine how much weight to assign to each control.

```py
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    vae=vae,
    adapter=adapters,
).to("cuda")

pipeline(
    prompt,
    image=controls,
    height=1024,
    width=1024,
    adapter_conditioning_scale=[0.7, 0.7],
).images[0]
```

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png" width="300" alt="canny edge control image"/>
    <figcaption style="text-align: center;">canny image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_depth_image.png" width="300" alt="depth map control image"/>
    <figcaption style="text-align: center;">depth map</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/t2i-multi-rabbbit.png" width="300" alt="generated image"/>
    <figcaption style="text-align: center;">generated image</figcaption>
  </figure>
</div>
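
Because `adapter_conditioning_scale` takes one value per adapter, you can also weight the controls unevenly. The sketch below is illustrative and reuses the pipeline and inputs from above; the scale values are arbitrary and worth tuning for your own images.

```py
# Hypothetical weighting: favor the canny edges over the depth map
pipeline(
    prompt,
    image=controls,
    height=1024,
    width=1024,
    adapter_conditioning_scale=[0.8, 0.4],  # [canny weight, depth weight]
).images[0]
```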