Conversation

@Wauplin Wauplin commented Dec 10, 2024

This PR adds a workaround to load transformers components without passing DDUF entries to transformers.

As discussed with @LysandreJik on Slack (private), modifying transformers internals to load from a DDUF archive (huggingface/transformers#35093) is going to be hard to scale. It would also make transformers internals depend heavily on diffusers' DDUF format, which we don't even know yet whether the community will adopt ( 🤞 ). On Slack we briefly discussed updating transformers so that loading from a folder and loading from a zip archive work the same way, but even that is a hazardous, non-trivial change. After further discussion, we ended up suggesting a solution that balances loading speed and maintainability:

  • for tokenizer components, we will unzip the archive to a temporary directory and load from there (a sketch follows below) => it's not the most optimized approach I/O-wise, but since tokenizers are generally small-ish, it shouldn't impact users too badly
  • for transformers models, we will 1. load the config, 2. instantiate an empty model from the config, 3. load the weights manually from diffusers

This way we keep the heavy load in diffusers and are therefore able to do it in an optimized way (i.e. directly from the DDUF file).
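
As an illustrative sketch of the tokenizer path described above (not code from this PR): the helper name load_tokenizer_from_dduf and its subfolder argument are hypothetical, and the DDUF entries are assumed to expose the read_text()/as_mmap() accessors used elsewhere in this PR (as provided by huggingface_hub's DDUF helpers).

import os
import tempfile
from typing import Dict

from transformers import AutoTokenizer

# Hypothetical helper, for illustration only: copy the tokenizer-related DDUF entries
# into a temporary folder, then let transformers load them from disk as usual.
def load_tokenizer_from_dduf(dduf_entries: Dict[str, "DDUFEntry"], subfolder: str = "tokenizer"):
    with tempfile.TemporaryDirectory() as tmp_dir:
        for path, entry in dduf_entries.items():
            if not path.startswith(f"{subfolder}/"):
                continue  # keep only the tokenizer files
            target = os.path.join(tmp_dir, os.path.basename(path))
            with entry.as_mmap() as mm, open(target, "wb") as f:
                f.write(mm)  # dump the entry to disk
        # the tokenizer is fully loaded before the temporary directory is cleaned up
        return AutoTokenizer.from_pretrained(tmp_dir)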

This PR introduces the required changes for this to work. I did not handle all the low-level kwargs (torch_dtype, device_map, max_memory, offload_folder, offload_state_dict, use_safetensors, low_cpu_mem_usage) since it's not my area of expertise, but I believe it shouldn't cause problems.

I also took the opportunity to set huggingface_hub 0.27.x as the minimum required dependency so we can use the DDUF helpers directly. The release is not out yet but will be before we merge the "big DDUF PR" in diffusers.

What I suggest now is:

Comment on lines 87 to 92
# TODO: use kwargs (torch_dtype, device_map, max_memory, offload_folder, offload_state_dict, use_safetensors, low_cpu_mem_usage)
# TODO: is there something else we should take care of?
for entry in weight_files:
    with entry.as_mmap() as mm:
        state_dict = safetensors.torch.load(mm)
        model.load_state_dict(state_dict)

Collaborator Author

As mentioned by @LysandreJik, it's better to use model.from_pretrained(state_dict).

Member

Indeed, otherwise this won't work with quantization, as we perform some changes to the model.

Member

100 percent, this should be replaced with from_pretrained().

Collaborator Author

Suggested a solution here: #10171 (comment). Waiting for confirmation before moving forward.

Contributor

@ArthurZucker ArthurZucker left a comment

Noice!

)
config = PretrainedConfig(**json.loads(config_file.read_text()))

model = cls(config)

Contributor

A few things here:

  • this will go through the slow initialization of all modules
  • we should init the model on the meta device, then assign the weights with load_state_dict(..., assign=True) (see the sketch below)
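
For context, here is a minimal self-contained sketch of that suggestion (not the PR's code; the tiny BertConfig is just an example stand-in, and the state dict is faked locally instead of being read from the DDUF archive):

import torch
from transformers import BertConfig, BertModel

# Small example config so the sketch runs quickly; any PreTrainedModel works the same way.
config = BertConfig(hidden_size=64, num_hidden_layers=2, num_attention_heads=2, intermediate_size=128)

# Build the skeleton on the meta device: no real parameter memory is allocated.
with torch.device("meta"):
    model = BertModel(config)

# In the PR this would come from the DDUF weight entries; here we fake it with a
# randomly initialized model of the same config.
state_dict = BertModel(config).state_dict()

# assign=True (torch >= 2.1) replaces the meta tensors with the loaded ones
# instead of copying into storage that was never allocated.
model.load_state_dict(state_dict, assign=True)

# Caveat: non-persistent buffers are not in the state dict and may still need handling,
# which is one reason the discussion below leans towards from_pretrained().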

Member

I don't think we should first initialize the model on the meta device at all, as we will be leveraging from_pretrained() anyway.

See: https://github.com/huggingface/diffusers/pull/10171/files#r1878051478

Wauplin commented Dec 10, 2024

Thanks for the feedback :) To be sure I understand correctly, would something like this be the correct way to initialize the model?

def load_transformers_model_from_dduf(
    cls: "PreTrainedModel", name: str, dduf_entries: Dict[str, DDUFEntry], **kwargs
) -> "PreTrainedModel":
    (...)

    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_config_file = os.path.join(tmp_dir, "config.json")
        with open(tmp_config_file, "w") as f:
            f.write(config_file.read_text())

        state_dict = {}
        for entry in weight_files:
            with entry.as_mmap() as mm:
                state_dict.update(safetensors.torch.load(mm)) # TODO: move to correct device_map

        return cls.from_pretrained(tmp_dir, state_dict=state_dict, **kwargs)

(Also, I don't mind changes to my PR if that makes it quicker to get things merged and ready. I can easily be missing some subtleties.)

Comment on lines +61 to +62
In practice, `transformers` does not provide a way to load a model from a DDUF archive. This function works around that by instantiating a model from the config file and loading the weights directly from the DDUF archive.

Member

What if we could extract the folder whose entries we're using here and pass the folder name to from_pretrained()?

Collaborator Author

We could, but that would mean extracting a much larger amount of data (encoders can easily be 10GB). For tokenizers, duplicating the I/O work is less of a problem since we are talking about only ~300MB.

Member

#10171 (comment) could work, but I am not sure about its consequences; I will let @ArthurZucker comment here. Regardless, I think this can be merged, and it will unblock @SunMarc a bit so he can make further progress on his PR.

Wauplin commented Dec 11, 2024

Merging this PR based on #10171 (comment) (cc @sayakpaul) to unblock the main PR #10037.

How to properly load the encoders is still an ongoing question, though I pushed 1312b6b to use .from_pretrained.

@Wauplin Wauplin merged commit ed6c727 into dduf Dec 11, 2024
1 of 2 checks passed
@yiyixuxu yiyixuxu deleted the dduf-wauplin-branch branch December 13, 2024 01:04
sayakpaul added a commit that referenced this pull request Jan 14, 2025
* load and save dduf archive

* style

* switch to zip uncompressed

* updates

* Update src/diffusers/pipelines/pipeline_utils.py

Co-authored-by: Sayak Paul <[email protected]>

* Update src/diffusers/pipelines/pipeline_utils.py

Co-authored-by: Sayak Paul <[email protected]>

* first draft

* remove print

* switch to dduf_file for consistency

* switch to huggingface hub api

* fix log

* add a basic test

* Update src/diffusers/configuration_utils.py

Co-authored-by: Sayak Paul <[email protected]>

* Update src/diffusers/pipelines/pipeline_utils.py

Co-authored-by: Sayak Paul <[email protected]>

* Update src/diffusers/pipelines/pipeline_utils.py

Co-authored-by: Sayak Paul <[email protected]>

* fix

* fix variant

* change saving logic

* DDUF - Load transformers components manually (#10171)

* update hfh version

* Load transformers components manually

* load encoder from_pretrained with state_dict

* working version with transformers and tokenizer !

* add generation_config case

* fix tests

* remove saving for now

* typing

* need next version from transformers

* Update src/diffusers/configuration_utils.py

Co-authored-by: Lucain <[email protected]>

* check path corectly

* Apply suggestions from code review

Co-authored-by: Lucain <[email protected]>

* udapte

* typing

* remove check for subfolder

* quality

* revert setup changes

* oups

* more readable condition

* add loading from the hub test

* add basic docs.

* Apply suggestions from code review

Co-authored-by: Lucain <[email protected]>

* add example

* add

* make functions private

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* minor.

* fixes

* fix

* change the precdence of parameterized.

* error out when custom pipeline is passed with dduf_file.

* updates

* fix

* updates

* fixes

* updates

* fix xfail condition.

* fix xfail

* fixes

* sharded checkpoint compat

* add test for sharded checkpoint

* add suggestions

* Update src/diffusers/models/model_loading_utils.py

Co-authored-by: YiYi Xu <[email protected]>

* from suggestions

* add class attributes to flag dduf tests

* last one

* fix logic

* remove comment

* revert changes

---------

Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Lucain <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>