DDUF - Load transformers components manually #10171
Conversation
```python
# TODO: use kwargs (torch_dtype, device_map, max_memory, offload_folder, offload_state_dict, use_safetensors, low_cpu_mem_usage)
# TODO: is there something else we should take care of?
for entry in weight_files:
    with entry.as_mmap() as mm:
        state_dict = safetensors.torch.load(mm)
        model.load_state_dict(state_dict)
```
As mentioned by @LysandreJik, it would be better to use `model.from_pretrained(state_dict)`.
Indeed, this won't work with quantization otherwise as we perform some changes to the model.
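To make that concern concrete, here is a minimal, hedged sketch (the model id is a placeholder) of why the `from_pretrained()` path matters: with a quantization config, `from_pretrained()` rewrites the module tree (e.g. swapping `nn.Linear` for bitsandbytes layers) before the weights are loaded, a step that a bare `cls(config)` + `load_state_dict()` would skip.

```python
from transformers import AutoModel, BitsAndBytesConfig

# Sketch only: "org/model-id" is a placeholder checkpoint. from_pretrained()
# applies the quantization_config while building the model, so the resulting
# modules differ from what cls(config) alone would produce.
quantized = AutoModel.from_pretrained(
    "org/model-id",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
```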
100 percent should be replaced with from_pretrained().
Suggested a solution here: #10171 (comment). Waiting for confirmation before moving forward
Noice!
```python
)
config = PretrainedConfig(**json.loads(config_file.read_text()))

model = cls(config)
```
A few things here:
- this will go through the slow initialization of all modules
- we should init the model on meta, then assign the weights with `load_state_dict(..., assign=True)` (see the sketch below)
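A minimal sketch of that meta-device pattern, reusing the `cls`, `config`, and `weight_files` names from the diff above (assumes PyTorch ≥ 2.1 for `assign=True`):

```python
import torch
import safetensors.torch

# Build the skeleton on the meta device: no memory is allocated and the slow
# per-module weight initialization is skipped.
with torch.device("meta"):
    model = cls(config)

state_dict = {}
for entry in weight_files:
    with entry.as_mmap() as mm:
        state_dict.update(safetensors.torch.load(mm))

# assign=True swaps the meta tensors for the loaded ones instead of trying to
# copy values into storage that was never allocated.
model.load_state_dict(state_dict, assign=True)
```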
I don't think we should first initialize the model under meta at all as we will be leveraging from_pretrained() anyway.
See: https://github.com/huggingface/diffusers/pull/10171/files#r1878051478
Thanks for the feedback :) To be sure I understand correctly, would something like this be the correct way to initialize the model?

```python
def load_transformers_model_from_dduf(
    cls: "PreTrainedModel", name: str, dduf_entries: Dict[str, DDUFEntry], **kwargs
) -> "PreTrainedModel":
    (...)
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_config_file = os.path.join(tmp_dir, "config.json")
        with open(tmp_config_file, "w") as f:
            f.write(config_file.read_text())
        state_dict = {}
        for entry in weight_files:
            with entry.as_mmap() as mm:
                state_dict.update(safetensors.torch.load(mm))  # TODO: move to correct device_map
        return cls.from_pretrained(tmp_dir, state_dict=state_dict, **kwargs)
```

(also, I don't mind changes to my PR if that makes it quicker to get things merged and ready. I can easily be missing some subtleties)
In practice, `transformers` does not provide a way to load a model from a DDUF archive. This function is a workaround: it instantiates a model from the config file and loads the weights from the DDUF archive directly.
What if we could extract the folder whose entries we're using here and pass the folder name to from_pretrained()?
We could do it but that would mean extracting a much larger amount of data (encoders can easily be 10GB). For tokenizers it's less of a problem to duplicate I/O processing since we are talking about only ~300MB.
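As a rough sketch of what duplicating that I/O looks like for tokenizers, assuming a hypothetical `tokenizer_entries` mapping (filename → `DDUFEntry`) restricted to the tokenizer subfolder:

```python
import os
import tempfile
from transformers import AutoTokenizer

def load_tokenizer_from_dduf(tokenizer_entries, **kwargs):
    # Dump the (small) tokenizer files into a temporary folder and let
    # transformers load them from disk as it normally would.
    with tempfile.TemporaryDirectory() as tmp_dir:
        for filename, entry in tokenizer_entries.items():
            path = os.path.join(tmp_dir, os.path.basename(filename))
            with entry.as_mmap() as mm, open(path, "wb") as f:
                f.write(mm)  # raw bytes: covers json/txt as well as binary files
        return AutoTokenizer.from_pretrained(tmp_dir, **kwargs)
```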
#10171 (comment) could work but I am not sure about its consequences. I will let @ArthurZucker comment here. But regardless, I think this can be merged and it will unblock @SunMarc a bit to make further progress on his PR.
Merging this PR based on #10171 (comment) (cc @sayakpaul) to unblock the main PR #10037. How to properly load the encoders is still an ongoing question, though I pushed 1312b6b to use
* load and save dduf archive
* style
* switch to zip uncompressed
* updates
* Update src/diffusers/pipelines/pipeline_utils.py Co-authored-by: Sayak Paul <[email protected]>
* Update src/diffusers/pipelines/pipeline_utils.py Co-authored-by: Sayak Paul <[email protected]>
* first draft
* remove print
* switch to dduf_file for consistency
* switch to huggingface hub api
* fix log
* add a basic test
* Update src/diffusers/configuration_utils.py Co-authored-by: Sayak Paul <[email protected]>
* Update src/diffusers/pipelines/pipeline_utils.py Co-authored-by: Sayak Paul <[email protected]>
* Update src/diffusers/pipelines/pipeline_utils.py Co-authored-by: Sayak Paul <[email protected]>
* fix
* fix variant
* change saving logic
* DDUF - Load transformers components manually (#10171)
* update hfh version
* Load transformers components manually
* load encoder from_pretrained with state_dict
* working version with transformers and tokenizer !
* add generation_config case
* fix tests
* remove saving for now
* typing
* need next version from transformers
* Update src/diffusers/configuration_utils.py Co-authored-by: Lucain <[email protected]>
* check path corectly
* Apply suggestions from code review Co-authored-by: Lucain <[email protected]>
* udapte
* typing
* remove check for subfolder
* quality
* revert setup changes
* oups
* more readable condition
* add loading from the hub test
* add basic docs.
* Apply suggestions from code review Co-authored-by: Lucain <[email protected]>
* add example
* add
* make functions private
* Apply suggestions from code review Co-authored-by: Steven Liu <[email protected]>
* minor.
* fixes
* fix
* change the precdence of parameterized.
* error out when custom pipeline is passed with dduf_file.
* updates
* fix
* updates
* fixes
* updates
* fix xfail condition.
* fix xfail
* fixes
* sharded checkpoint compat
* add test for sharded checkpoint
* add suggestions
* Update src/diffusers/models/model_loading_utils.py Co-authored-by: YiYi Xu <[email protected]>
* from suggestions
* add class attributes to flag dduf tests
* last one
* fix logic
* remove comment
* revert changes

---------

Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Lucain <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>
This PR adds a workaround to load `transformers` components without passing DDUF entries to `transformers`.

As discussed with @LysandreJik on Slack (private), modifying `transformers` internals to load from a DDUF archive (huggingface/transformers#35093) is going to be hard to scale. It would also make `transformers` internals depend heavily on `diffusers`'s DDUF format, which we don't even know yet will be adopted by the community ( 🤞 ). On Slack we briefly discussed updating `transformers` so that loading from a folder or from a zip archive works the same way, but even that is a hazardous, non-trivial change. After further discussion, we ended up suggesting a solution that balances loading speed and maintainability:

- For `tokenizers` components, we will unzip the archive to a temporary directory and load from there => it's not the most optimized way I/O-wise, but given that tokenizers are generally small-ish, it shouldn't impact users much.
- For `transformers` models, we will 1. load the config, 2. instantiate an empty model from the config, 3. load the weights manually from `diffusers`.

This way we keep the heavy loading in `diffusers` and are therefore able to do it in an optimized way (i.e. directly from the DDUF file).

This PR introduces the required changes for this to work. I did not handle all the low-level kwargs (torch_dtype, device_map, max_memory, offload_folder, offload_state_dict, use_safetensors, low_cpu_mem_usage) since it's not my area of expertise, but I believe it shouldn't cause problems.

I also took the opportunity to set `huggingface_hub 0.27.x` as the lowest required dependency to benefit from the DDUF helpers directly. The release is not out yet but will be before we merge the "big DDUF PR" in diffusers.

What I suggest now is: