@@ -25,22 +25,37 @@ quite a bit faster.
 
 Here are some example outputs for prompt `"A cat playing with a ball of yarn"`:
 
-| Configuration | Output |
-| --------------------------------------------| ----------------------------------------------------------------------------------------------------------------------------------------------------|
-| **Baseline** | ![baseline_output](https://github.com/user-attachments/assets/8ba746d2-fbf3-4e30-adc4-11303231c146) |
-| **Fully-optimized (with quantization)** | ![fast_output](https://github.com/user-attachments/assets/1a31dec4-38d5-45b2-8ae6-c7fb2e6413a4) |
-
+<table>
+<thead>
+<tr>
+<th>Configuration</th>
+<th>Output</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><strong>Baseline</strong></td>
+<td><img src="https://github.com/user-attachments/assets/8ba746d2-fbf3-4e30-adc4-11303231c146" alt="baseline_output" width=400/></td>
+</tr>
+<tr>
+<td><strong>Fully-optimized (with quantization)</strong></td>
+<td><img src="https://github.com/user-attachments/assets/1a31dec4-38d5-45b2-8ae6-c7fb2e6413a4" alt="fast_output" width=400/></td>
+</tr>
+</tbody>
+</table>
 
 ## Setup
 We rely primarily on pure PyTorch for the optimizations. Currently, a relatively recent nightly version of PyTorch is required.
+
 The numbers reported here were gathered using:
 * `torch==2.8.0.dev20250605+cu126` - note that we rely on some fixes since 2.7
 * `torchao==0.12.0.dev20250610+cu126` - note that we rely on a fix in the 06/10 nightly
-* `diffusers==0.33.1`
+* `diffusers` - with [this fix](https://github.com/huggingface/diffusers/pull/11696) included
 * `flash_attn_3==3.0.0b1`
 
 To install deps:
 ```
+pip uninstall diffusers -y && pip install git+https://github.com/huggingface/diffusers@b272807bc898a314cde536c1d7d1e43592af1fce
 pip install --pre torch==2.8.0.dev20250605+cu126 --index-url https://download.pytorch.org/whl/nightly/cu126
 pip install --pre torchao==0.12.0.dev20250609+cu126 --index-url https://download.pytorch.org/whl/nightly/cu126
 pip install diffusers==0.33.1
@@ -52,7 +67,7 @@ For hardware, we used a 96GB 700W H100 GPU. Some of the optimizations applied (B
 
 ## Run the optimized pipeline
 
-```
+```sh
 python gen_image.py --prompt "An astronaut standing next to a giant lemon" --output-file output.png --use-cached-model
 ```
 