Commit e8edce2

get ready.
1 parent 19231e0 commit e8edce2

2 files changed: +52 -5 lines changed


README.md

Lines changed: 46 additions & 0 deletions
@@ -3,6 +3,10 @@ Making Flux go brrr on GPUs. With simple recipes from this repo, we enabled ~2.5
 
 Check out the accompanying blog post [here](https://pytorch.org/blog/presenting-flux-fast-making-flux-go-brrr-on-h100s/).
 
+**Updates**
+
+**June 28, 2025**: This repository now supports [Flux.1 Kontext Dev](https://hf.co/black-forest-labs/FLUX.1-Kontext-dev). We enabled ~2.5x speedup on it. Check out [this section](#flux1-kontext-dev) for more details.
+
 ## Results
 
 <table>
@@ -76,6 +80,7 @@ The numbers reported here were gathered using:
 
 To install deps:
 ```
+pip install -U huggingface_hub[hf_xet] accelerate transformers
 pip install -U diffusers
 pip install --pre torch==2.8.0.dev20250605+cu126 --index-url https://download.pytorch.org/whl/nightly/cu126
 pip install --pre torchao==0.12.0.dev20250609+cu126 --index-url https://download.pytorch.org/whl/nightly/cu126
@@ -154,6 +159,47 @@ mean / variance times in seconds for 10 benchmarking runs printed to STDOUT, as
 * A `.png` image file corresponding to the experiment (e.g. `output.png`). The path can be configured via `--output-file`.
 * An optional PyTorch profiler trace (e.g. `profiler_trace.json.gz`). The path can be configured via `--trace-file`
 
+> [!IMPORTANT]
+> For benchmarking purposes, we use reasonable defaults. For example, for all the benchmarking experiments, we use
+> the 1024x1024 resolution. For Schnell, we use 4 denoising steps, and for Dev and Kontext, we use 28.
+
+## Flux.1 Kontext Dev
+We ran the exact same setup as above on [Flux.1 Kontext Dev](https://hf.co/black-forest-labs/FLUX.1-Kontext-dev) and obtained the following result:
+
+<div align="center">
+<img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux_kontext_optims.png" width=500 alt="flux_kontext_plot"/>
+</div>
+
+Here are some example outputs for prompt `"Make Pikachu hold a sign that says 'Black Forest Labs is awesome', yarn art style, detailed, vibrant colors"` and [this image](https://huggingface.co/datasets/huggingface/documentation-images/blob/main/diffusers/yarn-art-pikachu.png):
+
+<table>
+  <thead>
+    <tr>
+      <th>Configuration</th>
+      <th>Output</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td><strong>Baseline</strong></td>
+      <td><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/bf16_kontext.png" alt="baseline_output" width=400/></td>
+    </tr>
+    <tr>
+      <td><strong>Fully-optimized (with quantization)</strong></td>
+      <td><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/fully_optimized_kontext.png" alt="fast_output" width=400/></td>
+    </tr>
+  </tbody>
+</table>
+
+<details>
+<summary><b>Notes</b></summary>
+
+* You need to install `diffusers` with [this fix](https://github.com/huggingface/diffusers/pull/11818) included
+* You need to install `torchao` with [this fix](https://github.com/pytorch/ao/pull/2293) included
+* We specialized the optimizations for the 1024x1024 resolution.
+
+</details>
+
 ## Improvements, progressively
 <details>
 <summary>Baseline</summary>
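
The `[!IMPORTANT]` note added in the README hunk above states the benchmarking defaults in prose: 1024x1024 resolution, 4 denoising steps for Schnell, and 28 for Dev and Kontext. A minimal sketch of how those defaults could be captured as a lookup follows. The names `BENCHMARK_RESOLUTION`, `DEFAULT_STEPS`, and `default_num_inference_steps` are hypothetical and do not come from the repository, and matching on substrings of the checkpoint name is an assumption made only for illustration.

```python
# Hypothetical lookup of the benchmarking defaults quoted in the README note.
# None of these names are taken from the repository.
BENCHMARK_RESOLUTION = (1024, 1024)  # (height, width) used for all benchmarks

DEFAULT_STEPS = {
    "schnell": 4,   # Schnell: 4 denoising steps
    "dev": 28,      # Dev: 28 denoising steps
    "kontext": 28,  # Kontext: 28 denoising steps
}


def default_num_inference_steps(checkpoint: str) -> int:
    """Return the default number of denoising steps for a checkpoint name."""
    name = checkpoint.lower()
    # Check "kontext" before "dev": the Kontext checkpoint id also contains "dev".
    for key in ("kontext", "schnell", "dev"):
        if key in name:
            return DEFAULT_STEPS[key]
    raise ValueError(f"No benchmarking default known for {checkpoint!r}")


steps = default_num_inference_steps("black-forest-labs/FLUX.1-Kontext-dev")
```

Keeping the defaults in one table makes it harder for a benchmark run and a README claim to drift apart, although the repository may well organize this differently.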

gen_image.py

Lines changed: 6 additions & 5 deletions
@@ -1,10 +1,8 @@
 import random
-import time
 import torch
-from torch.profiler import profile, record_function, ProfilerActivity
-from utils.benchmark_utils import annotate, create_parser
+from utils.benchmark_utils import create_parser
 from utils.pipeline_utils import load_pipeline  # noqa: E402
-
+from run_benchmark import _determine_pipe_call_kwargs
 
 def set_rand_seeds(seed):
     random.seed(seed)
@@ -16,7 +14,10 @@ def main(args):
     set_rand_seeds(args.seed)
 
     image = pipeline(
-        args.prompt, num_inference_steps=args.num_inference_steps, guidance_scale=0.0
+        prompt=args.prompt,
+        num_inference_steps=args.num_inference_steps,
+        generator=torch.manual_seed(args.seed),
+        **_determine_pipe_call_kwargs(args)
     ).images[0]
     image.save(args.output_file)
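
The hunk above drops the hard-coded `guidance_scale=0.0` and instead unpacks whatever `_determine_pipe_call_kwargs(args)` returns into the pipeline call. The body of that helper is not part of this diff, so the following is only a sketch of the general pattern, not the repository's logic. The function `sketch_pipe_call_kwargs`, the `ckpt` and `image` attributes, the `fake_pipeline` stand-in, and the per-checkpoint values are all assumptions introduced for illustration.

```python
from argparse import Namespace


def sketch_pipe_call_kwargs(args: Namespace) -> dict:
    """Hypothetical stand-in for a helper that builds extra pipeline call kwargs."""
    kwargs = {}
    ckpt = args.ckpt.lower()
    if "schnell" in ckpt:
        # Assumption: keep the guidance_scale=0.0 that the old call hard-coded.
        kwargs["guidance_scale"] = 0.0
    if "kontext" in ckpt and getattr(args, "image", None) is not None:
        # Assumption: an image-editing checkpoint also needs an input image.
        kwargs["image"] = args.image
    return kwargs


def fake_pipeline(**call_kwargs):
    # Stand-in for a real pipeline; it only echoes the arguments it received.
    return call_kwargs


args = Namespace(ckpt="FLUX.1-schnell", prompt="a cat", num_inference_steps=4)
call = fake_pipeline(
    prompt=args.prompt,
    num_inference_steps=args.num_inference_steps,
    **sketch_pipe_call_kwargs(args),
)
```

Passing the shared arguments by keyword and unpacking the rest keeps the call site identical across checkpoints, which is presumably why the commit moves the model-specific choices out of `gen_image.py`.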