Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/diffusers/models/modeling_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -836,7 +836,7 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P
param_device = "cpu"
# TODO (sayakpaul, SunMarc): remove this after model loading refactor
elif is_quant_method_bnb:
param_device = torch.cuda.current_device()
param_device = torch.device(torch.cuda.current_device())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me!

@yiyixuxu WDYT?

Cc: @SunMarc as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's throw an error in load_model_dict_into_meta when device is passed as index??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throws a value error now. @yiyixuxu

@sayakpaul, the integration tests pass:

(nightly-venv) (nightly-venv) aryan@hf-dgx-01:~/work/diffusers$ RUN_SLOW=1 CUDA_VISIBLE_DEVICES="3" pytest -s tests/quantization/bnb/test_4bit.py::SlowBnb4BitFluxTests
========================================================================================================================================= test session starts ==========================================================================================================================================
platform linux -- Python 3.10.14, pytest-8.3.2, pluggy-1.5.0
rootdir: /home/aryan/work/diffusers
configfile: pyproject.toml
plugins: timeout-2.3.1, requests-mock-1.10.0, xdist-3.6.1, anyio-4.6.2.post1
collected 1 item                                                                                                                                                                                                                                                                                       

tests/quantization/bnb/test_4bit.py Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Loading pipeline components...:  14%|████████████████████████████████▋                                                                                                                                                                                                    | 1/7 [00:00<00:00,  9.00it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00,  9.26it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:14<00:00,  1.50s/it]
.

========================================================================================================================================== 1 passed in 52.27s ===================================================================

state_dict = load_state_dict(model_file, variant=variant)
model._convert_deprecated_attention_blocks(state_dict)

Expand Down
Loading