Use torch.device instead of current device index for BnB quantizer #10069

sayakpaul · 2024-12-01T15:19:39Z

This looks good to me!

@yiyixuxu WDYT?

Cc: @SunMarc as well.

Let's throw an error in load_model_dict_into_meta when the device is passed as an index?

It throws a ValueError now. @yiyixuxu
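For readers following the thread, the check being discussed can be sketched roughly as below. This is a minimal illustration, not the actual code added to load_model_dict_into_meta: the helper names `reject_device_index` and `describe` and the error wording are hypothetical, and the sketch assumes only that a bare integer is the one form of device argument that should be refused.

```python
def reject_device_index(device):
    """Return `device` unchanged unless it is a bare integer index.

    Hypothetical helper for illustration only; it is not part of diffusers.
    """
    # bool is a subclass of int, so True/False are rejected here as well.
    if isinstance(device, int):
        raise ValueError(
            "Expected a device string or a torch.device object, "
            f"but got the bare index {device!r}."
        )
    return device


def describe(device):
    """Report whether a device argument would pass the check above."""
    try:
        return f"accepted: {reject_device_index(device)}"
    except ValueError as err:
        return f"rejected: {err}"


if __name__ == "__main__":
    print(describe("cuda:0"))  # an explicit device string passes through
    print(describe(0))         # a bare index is refused
```

Under this sketch, a caller that only has an index would wrap it first (the diff in this PR does so with torch.device(torch.cuda.current_device())) rather than passing the integer through.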

@sayakpaul, the integration tests pass:

(nightly-venv) (nightly-venv) aryan@hf-dgx-01:~/work/diffusers$ RUN_SLOW=1 CUDA_VISIBLE_DEVICES="3" pytest -s tests/quantization/bnb/test_4bit.py::SlowBnb4BitFluxTests
========================= test session starts =========================
platform linux -- Python 3.10.14, pytest-8.3.2, pluggy-1.5.0
rootdir: /home/aryan/work/diffusers
configfile: pyproject.toml
plugins: timeout-2.3.1, requests-mock-1.10.0, xdist-3.6.1, anyio-4.6.2.post1
collected 1 item

tests/quantization/bnb/test_4bit.py Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Loading pipeline components...: 14%|███▋ | 1/7 [00:00<00:00, 9.00it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 9.26it/s]
100%|██████████| 10/10 [00:14<00:00, 1.50s/it]
.
========================= 1 passed in 52.27s =========================

                             param_device = "cpu"
                         # TODO (sayakpaul,  SunMarc): remove this after model loading refactor
                         elif is_quant_method_bnb:
-                            param_device = torch.cuda.current_device()
+                            param_device = torch.device(torch.cuda.current_device())
                         state_dict = load_state_dict(model_file, variant=variant)
                         model._convert_deprecated_attention_blocks(state_dict)
sayakpaul Dec 1, 2024

yiyixuxu Dec 1, 2024

a-r-r-o-w Dec 4, 2024
