@SunMarc
Member

@SunMarc SunMarc commented May 20, 2025

What does this PR do?

This PR fixes torchao int4 checkpoint loading. The checkpoint needs to be loaded directly onto the device it was quantized on, so we assume the model is being loaded onto the right device from the start.

Needed for this model https://huggingface.co/diffusers/FLUX.1-dev-torchao-int4

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc SunMarc requested a review from sayakpaul May 22, 2025 12:20
Member

@sayakpaul sayakpaul left a comment

Thanks! Some questions.

    and hf_quantizer.quantization_config.quant_method == QuantizationMethod.TORCHAO
    and hf_quantizer.quantization_config.quant_type in ["int4_weight_only", "autoquant"]
):
    map_location = torch.device([d for d in device_map.values() if d not in ["cpu", "disk"]][0])
Member

Sufficiently safe to say that we would always have a non-None device_map?

Also, what happens if the device_map has multiple CUDA devices specified? Would the indexing make sense there?

Okay for this PR but we could potentially have a resolve_map_location() per quantizer class, maybe.

Member Author

> Sufficiently safe to say that we would always have a non-None device_map?

I check that device_map is not None, so this should be safe enough. I took this approach from transformers. The indexing shouldn't be an issue either: if the device_map contains multiple devices, the tensors get moved again to their target devices anyway.

Yeah I can switch to update_map_location.
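
For readers following the thread, the device selection discussed above can be sketched in plain Python. This is a minimal illustration, not the code from the PR: `pick_map_location` is a hypothetical name, devices are kept as plain strings or integers instead of `torch.device` objects, and returning `None` when no accelerator is listed is an assumption added here (the one-liner in the diff would raise `IndexError` in that case).

```python
from typing import Mapping, Optional, Union

Device = Union[int, str]


def pick_map_location(device_map: Optional[Mapping[str, Device]]) -> Optional[Device]:
    """Hypothetical helper: return the first device that is neither CPU nor disk.

    Returns None when there is no device_map or when it lists no accelerator,
    so the caller can fall back to its default map location (an assumption of
    this sketch, not behavior confirmed by the PR).
    """
    if device_map is None:
        return None

    # Same filter as the list comprehension in the diff.
    accelerators = [d for d in device_map.values() if d not in ("cpu", "disk")]

    # With several accelerators only the first one is used for loading;
    # per the discussion, tensors are moved to their final devices afterwards.
    return accelerators[0] if accelerators else None
```

With a map such as `{"transformer": "cuda:1", "vae": "cuda:0"}` the sketch picks `"cuda:1"`, the first entry in insertion order, which is why the indexing question only matters for where the checkpoint is staged, not for final placement.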

Member

Thanks! Let's move to update_map_location().

Also, let's remove "autoquant" as we don't support it.
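
One way to picture the per-quantizer hook is sketched below. Only the method name `update_map_location` comes from this thread; the class names, the signature, and the string-based devices are assumptions made for illustration and do not reflect the merged code. `"autoquant"` is left out of the check, as requested above.

```python
from typing import Mapping, Optional, Union

Device = Union[int, str]


class BaseQuantizer:
    """Hypothetical base class: by default the map location is left unchanged."""

    def update_map_location(
        self, map_location: Device, device_map: Optional[Mapping[str, Device]]
    ) -> Device:
        return map_location


class TorchAoInt4Quantizer(BaseQuantizer):
    """Hypothetical quantizer that loads int4 checkpoints on the target device."""

    def __init__(self, quant_type: str) -> None:
        self.quant_type = quant_type

    def update_map_location(
        self, map_location: Device, device_map: Optional[Mapping[str, Device]]
    ) -> Device:
        # Only int4 weight-only checkpoints need the override in this sketch.
        if self.quant_type != "int4_weight_only" or device_map is None:
            return map_location
        targets = [d for d in device_map.values() if d not in ("cpu", "disk")]
        # Keep the default when the map lists no accelerator (an assumption).
        return targets[0] if targets else map_location
```

Keeping the decision inside the quantizer means the generic loading code only has to call the hook, and quantizers that do not care inherit the no-op default.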

@sayakpaul sayakpaul requested a review from DN6 May 22, 2025 12:42
@sayakpaul
Member

Once the PR is close to merging, let's also add a test.

@SunMarc
Member Author

SunMarc commented May 22, 2025

Will add a test!

Note that in general, I wouldn't recommend saving int4 models with torchao, as the saved checkpoint is hardware dependent and differs between CPU and CUDA.

@sayakpaul
Member

> Note that in general, I wouldn't recommend saving int4 models with torchao, as the saved checkpoint is hardware dependent and differs between CPU and CUDA.

Indeed. Then let's also add a note in the docs.

@a-r-r-o-w
Contributor

@SunMarc @sayakpaul @DN6 Gentle ping in case this PR is still relevant.

Member

@sayakpaul sayakpaul left a comment

Only one small comment.

5 participants