README.md: 32 additions & 0 deletions
@@ -37,6 +37,38 @@ To install flash attention v3, follow the instructions in https://github.com/Dao
For hardware, we used a 96GB 700W H100 GPU. Some of the optimizations applied (BFloat16, torch.compile, combining the q, k, v projections, dynamic float8 quantization) are available on CPU as well.
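
For orientation, below is a minimal sketch of how these optimizations can be stacked with `diffusers` and `torchao`. It is illustrative only: the checkpoint id, the exact API calls, and their ordering are assumptions and are not necessarily what `optimized_flux_inference.py` does.

```python
# Illustrative sketch only (assumes recent diffusers, torchao, and PyTorch).
import torch
from diffusers import FluxPipeline
from torchao.quantization import quantize_, float8_dynamic_activation_float8_weight

# BFloat16 weights and activations.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",  # assumed Schnell checkpoint id
    torch_dtype=torch.bfloat16,
).to("cuda")

# Combine the q, k, v projections into a single linear layer per attention block
# (if your diffusers version supports it).
pipe.transformer.fuse_qkv_projections()

# Dynamic float8 quantization of the transformer's linear layers.
quantize_(pipe.transformer, float8_dynamic_activation_float8_weight())

# Compile the heaviest module.
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)

image = pipe("a dog in a field", num_inference_steps=4, guidance_scale=0.0).images[0]
image.save("out.png")
```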
## Run the optimized pipeline
```sh
python optimized_flux_inference.py
```
This will use Flux Schnell and will also use the AOT serialized binaries. If the binaries don't exist, they will be
automatically downloaded from [here](https://hf.co/jbschlosser/flux-fast).
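
If you want to fetch those binaries ahead of time, the repository above can also be pulled manually with `huggingface_hub` (a sketch; the script's own download logic and the exact target directory may differ):

```python
# Sketch: pre-download the AOT artifacts from the Hugging Face Hub.
# local_dir is an assumption; point it wherever the script expects the artifacts.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="jbschlosser/flux-fast", local_dir="./flux-fast-artifacts")
```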
The script also exposes a few command-line options:

```
                        Directory where we should expect to find the AOT exported artifacts as well as the model params.
  --ckpt CKPT
  --prompt PROMPT
  --num_inference_steps NUM_INFERENCE_STEPS
  --guidance_scale GUIDANCE_SCALE
                        Ignored when using Schnell.
  --seed SEED
  --output_file OUTPUT_FILE
                        Output image file path
```

> [!IMPORTANT]
> The binaries won't work on hardware different from the hardware they were obtained on. For example, binaries obtained on an H100 won't work on an A100.
## Benchmarking
[`run_benchmark.py`](./run_benchmark.py) is the main script for benchmarking the different optimization techniques.