<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

-->

# GGUF

The GGUF file format is typically used to store models for inference with [GGML](https://github.com/ggerganov/ggml) and supports a variety of block-wise quantization options. Diffusers supports loading checkpoints that have been prequantized and saved in the GGUF format via `from_single_file` loading with model classes. Loading GGUF checkpoints via pipelines is currently not supported.

The following example will load the [FLUX.1 DEV](https://huggingface.co/black-forest-labs/FLUX.1-dev) transformer model using the GGUF Q2_K quantization variant.

Before starting, please install `gguf` in your environment:

```shell
pip install -U gguf
```

Since GGUF is a single-file format, use [`~FromSingleFileMixin.from_single_file`] to load the model and pass in the [`GGUFQuantizationConfig`].

When using GGUF checkpoints, the quantized weights remain in a low-memory `dtype` (typically `torch.uint8`) and are dynamically dequantized and cast to the configured `compute_dtype` during each module's forward pass through the model. The `GGUFQuantizationConfig` allows you to set the `compute_dtype`.

The functions used for dynamic dequantization are based on the great work done by [city96](https://github.com/city96/ComfyUI-GGUF), who created the PyTorch ports of the original [`numpy`](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/quants.py) implementation by [compilade](https://github.com/compilade).

```python
import torch

from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

ckpt_path = (
    "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"
)
# Load the GGUF-quantized transformer directly from the single checkpoint file.
transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
# Build the pipeline around the quantized transformer.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
prompt = "A cat holding a sign that says hello world"
image = pipe(prompt, generator=torch.manual_seed(0)).images[0]
image.save("flux-gguf.png")
```
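
To make the dynamic dequantization described above more concrete, here is a minimal sketch of what a block-wise dequantization step can look like for a Q8_0-style layout (blocks of 32 quantized values, each block with its own scale). This is a simplified illustration for intuition only, not the actual kernels used by Diffusers or `gguf`, and the block size and storage layout are assumptions:

```python
import torch

def dequantize_blocks(scales, quants, compute_dtype=torch.bfloat16):
    # scales: (num_blocks, 1) per-block scale factors
    # quants: (num_blocks, 32) int8 quantized values
    # Recover approximate full-precision weights as scale * quantized value,
    # cast to the configured compute dtype.
    return (scales.to(compute_dtype) * quants.to(compute_dtype)).reshape(-1)

# Toy example: 2 blocks of 32 quantized weights each.
scales = torch.rand(2, 1, dtype=torch.float16)
quants = torch.randint(-128, 128, (2, 32), dtype=torch.int8)
weights = dequantize_blocks(scales, quants)
print(weights.shape, weights.dtype)  # torch.Size([64]) torch.bfloat16
```

In practice this happens on the fly inside each module's forward pass, so between calls the weights stay in their compact quantized storage and memory usage remains low.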

## Supported Quantization Types

- BF16
- Q4_0
- Q4_1
- Q5_0
- Q5_1
- Q8_0
- Q2_K
- Q3_K
- Q4_K
- Q5_K
- Q6_K
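
The quantization type you end up with is determined by which GGUF file you load. As a sketch, switching from Q2_K to another supported variant only means pointing `from_single_file` at the corresponding checkpoint; the filename below is an assumption and should be checked against what the repository actually provides:

```python
import torch

from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

# Hypothetical Q4_0 variant of the FLUX.1-dev transformer; verify the exact
# filename in the repository before using it.
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_0.gguf"
transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
```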