# Diffusers Benchmarks

Welcome to Diffusers Benchmarks. These benchmarks are used to obtain latency and memory information for the most popular models across different scenarios, such as:

* Base case, i.e., using `torch.bfloat16` and `torch.nn.functional.scaled_dot_product_attention`.
* Base + `torch.compile()`
* NF4 quantization
* Layerwise upcasting

Instead of full diffusion pipelines, only the forward pass of the respective model classes (such as `FluxTransformer2DModel`) is tested with the real checkpoints (such as `"black-forest-labs/FLUX.1-dev"`).

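Conceptually, each benchmark measures the per-call latency of a model's forward pass. The sketch below illustrates the idea with a plain Python stand-in — the `measure_latency` helper and the dummy workload are illustrative only; the actual scripts time the real model classes and also record memory:

```python
import time

def measure_latency(fn, warmup=2, runs=5):
    """Average wall-clock seconds per call to fn, after a few warmup calls."""
    for _ in range(warmup):
        fn()  # warmup iterations are excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# Stand-in for a model forward pass.
latency = measure_latency(lambda: sum(i * i for i in range(10_000)))
print(f"{latency * 1e3:.3f} ms per call")
```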
The entrypoint for running all the currently available benchmarks is `run_all.py`. However, one can also run an individual benchmark, e.g., `python benchmarking_flux.py`. Each run produces a CSV file containing various information about the benchmarks run.

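The resulting CSV can be inspected with standard tooling; for example (the column names and values below are hypothetical stand-ins — the real ones depend on the benchmark scripts):

```python
import csv
import io

# Hypothetical rows standing in for a benchmark results CSV;
# the real file name and columns come from the benchmark scripts.
sample = "scenario,latency_s,memory_gb\nflux-base,1.2,18.0\n"
rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row["scenario"], row["latency_s"])
```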
The benchmarks are run on a weekly basis, and the CI is defined in [benchmark.yml](../.github/workflows/benchmark.yml).

## Running the benchmarks manually

First, set up `torch` and install `diffusers` from the root of the repository:

```sh
pip install -e ".[quality,test]"
```

Then make sure the other dependencies are installed:

```sh
cd benchmarks/
pip install -r requirements.txt
```

We need to be authenticated to access some of the checkpoints used during benchmarking:

```sh
huggingface-cli login
```

The benchmark CI runs on an L40 GPU with 128GB of system RAM, and the benchmarks are configured for NVIDIA GPUs. Make sure you have access to a similar machine (or modify the benchmarking scripts accordingly).

Then you can either launch the entire benchmarking suite by running:

```sh
python run_all.py
```
| 45 | +Or, you can run the individual benchmarks. |
| 46 | + |
## Customizing the benchmarks

We define "scenarios" to cover the most common ways in which these models are used. You can define a new scenario by modifying an existing benchmark file:

```py
BenchmarkScenario(
    name=f"{CKPT_ID}-bnb-8bit",
    model_cls=FluxTransformer2DModel,
    model_init_kwargs={
        "pretrained_model_name_or_path": CKPT_ID,
        "torch_dtype": torch.bfloat16,
        "subfolder": "transformer",
        "quantization_config": BitsAndBytesConfig(load_in_8bit=True),
    },
    get_model_input_dict=partial(get_input_dict, device=torch_device, dtype=torch.bfloat16),
    model_init_fn=model_init_fn,
)
```
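The `partial(...)` call above pre-binds the device and dtype arguments of the input builder, so the benchmark runner can call it without knowing them. As a toy illustration (the function below is a hypothetical stand-in, not the real `get_input_dict`):

```python
from functools import partial

def get_input_dict(batch_size, device, dtype):
    # Hypothetical stand-in for the benchmark's input builder.
    return {"batch_size": batch_size, "device": device, "dtype": dtype}

# Pre-bind device and dtype, leaving batch_size to be supplied later.
bound = partial(get_input_dict, device="cuda", dtype="bfloat16")
print(bound(batch_size=1))
```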

You can also configure a new model-level benchmark and add it to the existing suite. To do so, adding a valid benchmarking file like `benchmarking_flux.py` should be enough.

Happy benchmarking 🧨