
Commit 858dc09 (parent d1fb620): add a readme

1 file changed: benchmarks/README.md (+69 −0)
# Diffusers Benchmarks

Welcome to Diffusers Benchmarks. These benchmarks are used to obtain latency and memory information for the most popular models across different scenarios, such as:

* Base case, i.e., using `torch.bfloat16` and `torch.nn.functional.scaled_dot_product_attention`.
* Base + `torch.compile()`
* NF4 quantization
* Layerwise upcasting
Instead of full diffusion pipelines, only the forward pass of the respective model classes (such as `FluxTransformer2DModel`) is tested with the real checkpoints (such as `"black-forest-labs/FLUX.1-dev"`).
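Conceptually, each benchmark times repeated forward calls and averages them. The sketch below is a framework-agnostic illustration of that pattern only — `dummy_forward` is a stand-in, not part of the suite, and the real scripts time actual model classes on GPU (where CUDA synchronization is also required for accurate numbers):

```python
import time

def dummy_forward(x):
    # Stand-in for a model's forward pass, e.g. calling a transformer model
    # on its input dict. Purely illustrative.
    return [v * 2 for v in x]

def measure_latency(fn, inputs, warmup=3, runs=10):
    # Warm up first so one-time costs (allocation, compilation) are excluded.
    for _ in range(warmup):
        fn(inputs)
    start = time.perf_counter()
    for _ in range(runs):
        fn(inputs)
    # Average seconds per forward call.
    return (time.perf_counter() - start) / runs

latency = measure_latency(dummy_forward, list(range(1024)))
print(f"{latency * 1e6:.1f} us / forward")
```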

The entrypoint for running all currently available benchmarks is `run_all.py`. However, you can also run individual benchmarks, e.g., `python benchmarking_flux.py`. Each benchmark produces a CSV file containing various information about the runs.
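The exact columns in the CSV depend on the benchmarking scripts, so check a file produced on your machine. As a sketch of post-processing such a file, assuming hypothetical `scenario`, `latency_s`, and `memory_gb` columns:

```python
import csv
import io

# Hypothetical sample output; the real column names come from the benchmark run.
sample = """scenario,latency_s,memory_gb
flux-base,1.23,18.4
flux-compile,0.98,18.1
"""

rows = list(csv.DictReader(io.StringIO(sample)))
# Find the scenario with the lowest average latency.
fastest = min(rows, key=lambda r: float(r["latency_s"]))
print(fastest["scenario"])  # -> flux-compile
```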
The benchmarks are run on a weekly basis and the CI is defined in [benchmark.yml](../.github/workflows/benchmark.yml).
## Running the benchmarks manually

First, set up `torch` and install `diffusers` from the root of the repository:

```sh
pip install -e ".[quality,test]"
```

Then make sure the other dependencies are installed:

```sh
cd benchmarks/
pip install -r requirements.txt
```

We need to be authenticated to access some of the checkpoints used during benchmarking:

```sh
huggingface-cli login
```
We use an L40 GPU with 128GB of RAM to run the benchmark CI, so the benchmarks are configured for NVIDIA GPUs. Make sure you have access to a similar machine, or modify the benchmarking scripts accordingly.
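Before launching a long run, it can help to confirm an NVIDIA GPU is even visible. A quick (not foolproof) stdlib-only check — the presence of `nvidia-smi` on `PATH` is only a heuristic signal, not a guarantee of a machine comparable to the CI runner:

```python
import shutil

def has_nvidia_gpu() -> bool:
    # `nvidia-smi` ships with the NVIDIA driver; if it is on PATH, an NVIDIA
    # GPU is very likely present.
    return shutil.which("nvidia-smi") is not None

if not has_nvidia_gpu():
    print("No NVIDIA GPU detected; adjust the benchmarking scripts before running.")
```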

Then you can either launch the entire benchmarking suite by running:

```sh
python run_all.py
```

Or, you can run the individual benchmarks.
## Customizing the benchmarks

We define "scenarios" to cover the most common ways in which these models are used. You can define a new scenario by modifying an existing benchmark file:
```py
# `CKPT_ID`, `get_input_dict`, `model_init_fn`, and `torch_device` are
# helpers defined in the benchmarking file (e.g., `benchmarking_flux.py`).
BenchmarkScenario(
    name=f"{CKPT_ID}-bnb-8bit",
    model_cls=FluxTransformer2DModel,
    model_init_kwargs={
        "pretrained_model_name_or_path": CKPT_ID,
        "torch_dtype": torch.bfloat16,
        "subfolder": "transformer",
        "quantization_config": BitsAndBytesConfig(load_in_8bit=True),
    },
    get_model_input_dict=partial(get_input_dict, device=torch_device, dtype=torch.bfloat16),
    model_init_fn=model_init_fn,
)
```

You can also configure a new model-level benchmark and add it to the existing suite. To do so, it should be enough to define a valid benchmarking file like `benchmarking_flux.py`.
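The shape such a file must take is defined by the existing ones, so use `benchmarking_flux.py` as the reference. As a rough, purely hypothetical sketch of the underlying pattern — a file declares scenarios and a runner turns each into one result row (the names below are illustrative stand-ins, not the suite's actual API):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Illustrative stand-ins only; the real suite uses `BenchmarkScenario` and
# helpers from the `benchmarks/` directory.
@dataclass
class Scenario:
    name: str
    run: Callable[[], Dict[str, float]]  # returns one row of metrics

def run_suite(scenarios: List[Scenario]) -> List[Dict[str, object]]:
    # Each scenario contributes one row, which `run_all.py`-style code
    # would then serialize to the results CSV.
    return [{"scenario": s.name, **s.run()} for s in scenarios]

rows = run_suite([Scenario("demo-base", lambda: {"latency_s": 0.0})])
```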
Happy benchmarking 🧨
