DDUF - Load transformers components manually #10171
Conversation
```python
# TODO: use kwargs (torch_dtype, device_map, max_memory, offload_folder, offload_state_dict, use_safetensors, low_cpu_mem_usage)
# TODO: is there something else we should take care of?
for entry in weight_files:
    with entry.as_mmap() as mm:
        state_dict = safetensors.torch.load(mm)
        model.load_state_dict(state_dict)
```
As mentioned by @LysandreJik, it would be better to use `model.from_pretrained(state_dict)`.
Indeed, this won't work with quantization otherwise as we perform some changes to the model.
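To make that concern concrete, here is a minimal, hedged sketch (the model id is a placeholder) of why the `from_pretrained()` path matters: with a quantization config, `from_pretrained()` rewrites the module tree (e.g. swapping `nn.Linear` for bitsandbytes layers) before the weights are loaded, a step that a bare `cls(config)` + `load_state_dict()` would skip.

```python
from transformers import AutoModel, BitsAndBytesConfig

# Sketch only: "org/model-id" is a placeholder checkpoint. from_pretrained()
# applies the quantization_config while building the model, so the resulting
# modules differ from what cls(config) alone would produce.
quantized = AutoModel.from_pretrained(
    "org/model-id",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
```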
100 percent should be replaced with from_pretrained().
Suggested a solution here: #10171 (comment). Waiting for confirmation before moving forward
Noice!
```python
)
config = PretrainedConfig(**json.loads(config_file.read_text()))

model = cls(config)
```
A few things here:
- this will go through the slow initialization of all modules
- we should init the model on meta, then assign the weights with `load_state_dict(..., assign=True)` (see the sketch below)
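A minimal sketch of that meta-device pattern, reusing the `cls`, `config`, and `weight_files` names from the diff above (assumes PyTorch ≥ 2.1 for `assign=True`):

```python
import torch
import safetensors.torch

# Build the skeleton on the meta device: no memory is allocated and the slow
# per-module weight initialization is skipped.
with torch.device("meta"):
    model = cls(config)

state_dict = {}
for entry in weight_files:
    with entry.as_mmap() as mm:
        state_dict.update(safetensors.torch.load(mm))

# assign=True swaps the meta tensors for the loaded ones instead of trying to
# copy values into storage that was never allocated.
model.load_state_dict(state_dict, assign=True)
```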
I don't think we should first initialize the model under meta at all as we will be leveraging from_pretrained() anyway.
See: https://github.com/huggingface/diffusers/pull/10171/files#r1878051478
Thanks for the feedback :) To be sure I understand correctly, would something like this be the correct way to initialize the model?

```python
def load_transformers_model_from_dduf(
    cls: "PreTrainedModel", name: str, dduf_entries: Dict[str, DDUFEntry], **kwargs
) -> "PreTrainedModel":
    (...)
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_config_file = os.path.join(tmp_dir, "config.json")
        with open(tmp_config_file, "w") as f:
            f.write(config_file.read_text())
        state_dict = {}
        for entry in weight_files:
            with entry.as_mmap() as mm:
                state_dict.update(safetensors.torch.load(mm))  # TODO: move to correct device_map
        return cls.from_pretrained(tmp_dir, state_dict=state_dict, **kwargs)
```

(also, I don't mind changes to my PR if that makes it quicker to get things merged and ready. I can easily be missing some subtleties)
In practice, `transformers` does not provide a way to load a model from a DDUF archive. This function is a workaround: it instantiates a model from the config file and loads the weights from the DDUF archive directly.
What if we could extract the folder whose entries we're using here and pass the folder name to from_pretrained()?
We could do it but that would mean extracting a much larger amount of data (encoders can easily be 10GB). For tokenizers it's less of a problem to duplicate I/O processing since we are talking about only ~300MB.
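As a rough sketch of what duplicating that I/O looks like for tokenizers, assuming a hypothetical `tokenizer_entries` mapping (filename → `DDUFEntry`) restricted to the tokenizer subfolder:

```python
import os
import tempfile
from transformers import AutoTokenizer

def load_tokenizer_from_dduf(tokenizer_entries, **kwargs):
    # Dump the (small) tokenizer files into a temporary folder and let
    # transformers load them from disk as it normally would.
    with tempfile.TemporaryDirectory() as tmp_dir:
        for filename, entry in tokenizer_entries.items():
            path = os.path.join(tmp_dir, os.path.basename(filename))
            with entry.as_mmap() as mm, open(path, "wb") as f:
                f.write(mm)  # raw bytes: covers json/txt as well as binary files
        return AutoTokenizer.from_pretrained(tmp_dir, **kwargs)
```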
#10171 (comment) could work but I am not sure about its consequences. I will let @ArthurZucker comment here. But regardless, I think this can be merged and it will unblock @SunMarc a bit to make further progress on his PR.
Merging this PR based on #10171 (comment) (cc @sayakpaul) to unblock the main PR #10037. How to properly load the encoders is still an ongoing question, though I pushed 1312b6b to use
* load and save dduf archive
* style
* switch to zip uncompressed
* updates
* Update src/diffusers/pipelines/pipeline_utils.py Co-authored-by: Sayak Paul <[email protected]>
* Update src/diffusers/pipelines/pipeline_utils.py Co-authored-by: Sayak Paul <[email protected]>
* first draft
* remove print
* switch to dduf_file for consistency
* switch to huggingface hub api
* fix log
* add a basic test
* Update src/diffusers/configuration_utils.py Co-authored-by: Sayak Paul <[email protected]>
* Update src/diffusers/pipelines/pipeline_utils.py Co-authored-by: Sayak Paul <[email protected]>
* Update src/diffusers/pipelines/pipeline_utils.py Co-authored-by: Sayak Paul <[email protected]>
* fix
* fix variant
* change saving logic
* DDUF - Load transformers components manually (#10171)
* update hfh version
* Load transformers components manually
* load encoder from_pretrained with state_dict
* working version with transformers and tokenizer !
* add generation_config case
* fix tests
* remove saving for now
* typing
* need next version from transformers
* Update src/diffusers/configuration_utils.py Co-authored-by: Lucain <[email protected]>
* check path corectly
* Apply suggestions from code review Co-authored-by: Lucain <[email protected]>
* udapte
* typing
* remove check for subfolder
* quality
* revert setup changes
* oups
* more readable condition
* add loading from the hub test
* add basic docs.
* Apply suggestions from code review Co-authored-by: Lucain <[email protected]>
* add example
* add
* make functions private
* Apply suggestions from code review Co-authored-by: Steven Liu <[email protected]>
* minor.
* fixes
* fix
* change the precdence of parameterized.
* error out when custom pipeline is passed with dduf_file.
* updates
* fix
* updates
* fixes
* updates
* fix xfail condition.
* fix xfail
* fixes
* sharded checkpoint compat
* add test for sharded checkpoint
* add suggestions
* Update src/diffusers/models/model_loading_utils.py Co-authored-by: YiYi Xu <[email protected]>
* from suggestions
* add class attributes to flag dduf tests
* last one
* fix logic
* remove comment
* revert changes

---------

Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Lucain <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>
This PR adds a workaround to load `transformers` components without passing DDUF entries to `transformers`.

As discussed with @LysandreJik on Slack (private), modifying `transformers` internals to load from a DDUF archive (huggingface/transformers#35093) is going to be hard to scale. It would also make `transformers` internals depend heavily on `diffusers`'s DDUF format, which we don't even know yet will be adopted by the community ( 🤞 ). On Slack we briefly discussed updating `transformers` so that loading from a folder or from a zip archive works the same way, but even that is a hazardous, non-trivial change. After further discussion, we ended up suggesting a solution that balances loading speed and maintainability:

- For `tokenizers` components, we will unzip the archive to a temporary directory and load from there => it's not the most optimized way I/O-wise, but given that tokenizers are generally small-ish, it shouldn't impact users much.
- For `transformers` models, we will 1. load the config, 2. instantiate an empty model from the config, 3. load the weights manually from `diffusers`.

This way we keep the heavy loading in `diffusers` and are therefore able to do it in an optimized way (i.e. directly from the DDUF file).

This PR introduces the required changes for this to work. I did not handle all the low-level kwargs (torch_dtype, device_map, max_memory, offload_folder, offload_state_dict, use_safetensors, low_cpu_mem_usage) since it's not my area of expertise, but I believe it shouldn't cause problems.

I also took the opportunity to set `huggingface_hub 0.27.x` as the lowest required dependency to benefit from the DDUF helpers directly. The release is not out yet but will be before we merge the "big DDUF PR" in diffusers.

What I suggest now is: