
Commit 6fd71bd

finalize.
1 parent 5d5e80a commit 6fd71bd


2 files changed: +32 -4 lines changed


examples/dreambooth/README_hidream.md

Lines changed: 27 additions & 0 deletions
@@ -117,3 +117,30 @@ We provide several options for optimizing memory usage:

* `--use_8bit_adam`: When enabled, we will use the 8bit version of AdamW provided by the `bitsandbytes` library.

Refer to the [official documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/) of the `HiDreamImagePipeline` to know more about the model.

## Using quantization

You can quantize the base model with [`bitsandbytes`](https://huggingface.co/docs/bitsandbytes/index) to reduce memory usage. To do so, pass a JSON file path to `--bnb_quantization_config_path`. This file should hold the configuration used to initialize `BitsAndBytesConfig`. Below is an example JSON file:

```json
{
  "load_in_4bit": true,
  "bnb_4bit_quant_type": "nf4"
}
```

Below, we provide some memory numbers with and without NF4 quantization during training:

```
(with quantization)
Memory (before device placement): 9.085089683532715 GB.
Memory (after device placement): 34.59585428237915 GB.
Memory (after backward): 36.90267467498779 GB.

(without quantization)
Memory (before device placement): 0.0 GB.
Memory (after device placement): 57.6400408744812 GB.
Memory (after backward): 59.932212829589844 GB.
```

The reason we see some memory usage before device placement in the quantized case is that, by default, bnb-quantized models are placed on the GPU first.
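For reference, here is a minimal sketch of how such a JSON file can be turned into a `BitsAndBytesConfig` object. It assumes the top-level `BitsAndBytesConfig` export from `diffusers` (also available in `transformers`) and a hypothetical `bnb_config.json` path; the training script itself may wire this up differently.

```python
# Minimal sketch, not the training script's exact code: read the JSON file passed
# via --bnb_quantization_config_path and build a BitsAndBytesConfig from it.
import json

from diffusers import BitsAndBytesConfig  # also exported by transformers

config_path = "bnb_config.json"  # hypothetical value of --bnb_quantization_config_path
with open(config_path) as f:
    bnb_kwargs = json.load(f)  # e.g. {"load_in_4bit": true, "bnb_4bit_quant_type": "nf4"}

quantization_config = BitsAndBytesConfig(**bnb_kwargs)
# The resulting config would then be supplied to the transformer's
# from_pretrained(...) call via quantization_config=quantization_config.
```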

examples/dreambooth/train_dreambooth_lora_hidream.py

Lines changed: 5 additions & 4 deletions
```diff
@@ -1713,10 +1713,11 @@ def get_sigmas(timesteps, n_dim=4, dtype=torch.float32):
     accelerator.wait_for_everyone()
     if accelerator.is_main_process:
         transformer = unwrap_model(transformer)
-        if args.upcast_before_saving:
-            transformer.to(torch.float32)
-        else:
-            transformer = transformer.to(weight_dtype)
+        if args.bnb_quantization_config_path is None:
+            if args.upcast_before_saving:
+                transformer.to(torch.float32)
+            else:
+                transformer = transformer.to(weight_dtype)
         transformer_lora_layers = get_peft_model_state_dict(transformer)

         HiDreamImagePipeline.save_lora_weights(
```
