specific language governing permissions and limitations under the License.
-->

# Stable Diffusion XL

Stable Diffusion XL was proposed in [SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis](https://arxiv.org/abs/2307.01952) by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach.

The abstract of the paper is the following:

*We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators.*

## Tips

- Stable Diffusion XL works especially well with image sizes between 768 and 1024 pixels.
- The output image of Stable Diffusion XL can be improved by making use of a refiner, as shown below.

### Available checkpoints:

- *Text-to-Image (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-base-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) with [`StableDiffusionXLPipeline`]
- *Image-to-Image / Refiner (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9) with [`StableDiffusionXLImg2ImgPipeline`]

## Usage Example

Before using SDXL, make sure to have `transformers`, `accelerate`, `safetensors` and `invisible-watermark` installed.
You can install the libraries as follows:

```
pip install transformers
pip install accelerate
pip install safetensors
pip install invisible-watermark>=0.2.0
```
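
The `invisible-watermark` library is needed because the SDXL pipelines apply an invisible watermark to the generated images.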

### *Text-to-Image*

You can use SDXL as follows for *text-to-image*:

```py
from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
```
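
The pipeline returns a `PIL.Image` by default, so the result can, for example, be saved directly (the filename is only illustrative):

```py
image.save("astronaut_jungle.png")
```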

### Refining the image output

The image can be refined by making use of [stabilityai/stable-diffusion-xl-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9).
In this case, you only have to output the `latents` from the base model.

```py
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
)
refiner.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

# output the latents from the base model instead of a decoded image
image = pipe(prompt=prompt, output_type="latent").images[0]
# the refiner expects a batch dimension, hence `image[None, :]`
image = refiner(prompt=prompt, image=image[None, :]).images[0]
```
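
Passing `output_type="latent"` makes the base pipeline skip decoding the latents with the VAE, so the refiner can operate directly on the latent representation of the image.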

### Loading single file checkpoints / original file format

By making use of [`~diffusers.loaders.FromSingleFileMixin.from_single_file`] you can also load the
original file format into `diffusers`:

```py
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
import torch

# the paths below are example paths to locally downloaded original checkpoints
pipe = StableDiffusionXLPipeline.from_single_file(
    "./sd_xl_base_0.9.safetensors", torch_dtype=torch.float16
)
pipe.to("cuda")

refiner = StableDiffusionXLImg2ImgPipeline.from_single_file(
    "./sd_xl_refiner_0.9.safetensors", torch_dtype=torch.float16
)
refiner.to("cuda")
```
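
Note that [`~diffusers.loaders.FromSingleFileMixin.from_single_file`] expects a single `.ckpt` or `.safetensors` checkpoint in the original Stability AI file format; the paths above are placeholders for such locally downloaded checkpoints.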

### Memory optimization via model offloading

If you are seeing out-of-memory errors, we recommend making use of [`StableDiffusionXLPipeline.enable_model_cpu_offload`].

```diff
- pipe.to("cuda")
+ pipe.enable_model_cpu_offload()
```

and

```diff
- refiner.to("cuda")
+ refiner.enable_model_cpu_offload()
```
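
With model offloading, each model component is only moved to the GPU while it is actually needed and is returned to CPU memory afterwards, which significantly lowers peak GPU memory usage at a small cost in inference speed. Put together, a minimal text-to-image sketch with offloading could look as follows:

```py
from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
# instead of pipe.to("cuda"); components are moved to the GPU only when needed
pipe.enable_model_cpu_offload()

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
```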

### Speed-up inference with `torch.compile`

You can speed up inference by making use of `torch.compile`. This should give you around a 20% speed-up.

```diff
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+ refiner.unet = torch.compile(refiner.unet, mode="reduce-overhead", fullgraph=True)
```
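
Note that the first call to the compiled pipeline is considerably slower because the UNet is compiled during the first forward pass; subsequent calls benefit from the speed-up.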

### Running with `torch` < 2.0

**Note** that if you want to run Stable Diffusion XL with `torch` < 2.0, please make sure to enable xformers
attention:

```
pip install xformers
```

```diff
+ pipe.enable_xformers_memory_efficient_attention()
+ refiner.enable_xformers_memory_efficient_attention()
```
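
For `torch` >= 2.0, this is not necessary as `diffusers` then automatically uses PyTorch's native scaled dot-product attention.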

## StableDiffusionXLPipeline
