feat: handle safetensors conversion for unshared incomplete tensors (#49)
#### Motivation
The automatic safetensors conversion for Mistral should work, but it
currently fails. Running
```
text-generation-server download-weights mistralai/Mistral-7B-v0.1 --revision=clarify-transformers-requirement
```
fails with
```
Traceback (most recent call last):
File "/opt/tgis/bin/text-generation-server", line 8, in <module>
sys.exit(app())
^^^^^
File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/cli.py", line 104, in download_weights
convert_to_safetensors(model_name, revision)
File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/cli.py", line 192, in convert_to_safetensors
utils.convert_files(local_pt_files, local_st_files, discard_names)
File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/utils/convert.py", line 123, in convert_files
convert_file(pt_file, sf_file, discard_names)
File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/utils/convert.py", line 69, in convert_file
to_removes = _remove_duplicate_names(loaded, discard_names=discard_names)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/tgis/lib/python3.11/site-packages/text_generation_server/utils/convert.py", line 33, in _remove_duplicate_names
raise RuntimeError(
RuntimeError: Error while trying to find names to remove to save state dict, but found no suitable name to keep for saving amongst: {'model.norm.weight'}. None is covering the entire storage.Refusing to save/load the model since you could be storing much more memory than needed. Please refer to https://huggingface.co/docs/safetensors/torch_shared_tensors for more information. Or open an issue.
```
`torch.save` serializes each tensor's full underlying storage, but the safetensors
format doesn't support shared tensors. However, the problematic
`model.norm.weight` tensor is not actually shared (it has no overlap with any other
tensor in the state_dict); it is merely incomplete (it does not fully cover
the span of its underlying storage).
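For illustration, here is a minimal sketch with a made-up toy tensor (not the actual Mistral checkpoint) of how a state_dict entry can be unshared yet incomplete: it is the only entry referencing its storage, but it covers fewer bytes than that storage holds.
```python
import torch

# Toy example: a slice is the only state_dict entry referencing its storage
# (so it is "unshared" from the converter's point of view), yet it spans only
# part of that storage (so it is "incomplete").
full = torch.zeros(10)
state_dict = {"model.norm.weight": full[:5]}

t = state_dict["model.norm.weight"]
covered_bytes = t.numel() * t.element_size()   # 5 * 4 = 20 bytes actually used
storage_bytes = t.untyped_storage().nbytes()   # 10 * 4 = 40 bytes in storage
print(covered_bytes, storage_bytes)            # 20 40 -> tensor does not cover its storage
```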
#### Modifications
`_find_shared_tensors` groups tensor names whose data overlaps and
returns them as a list of sets. A set has a single element when the
tensor's data is not shared, so this PR simply skips sets of size 1 when
removing duplicates (see the sketch below).
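A minimal sketch of the adjusted removal logic, assuming the private helpers `_find_shared_tensors` and `_is_complete` from `safetensors.torch` behave as described above; the structure and names here are illustrative, not the exact diff:
```python
from collections import defaultdict
from typing import Dict, List, Sequence

import torch
from safetensors.torch import _find_shared_tensors, _is_complete


def _remove_duplicate_names(
    state_dict: Dict[str, torch.Tensor],
    discard_names: Sequence[str] = (),
) -> Dict[str, List[str]]:
    """Map each kept tensor name to the overlapping names that should be dropped."""
    to_remove: Dict[str, List[str]] = defaultdict(list)
    for shared in _find_shared_tensors(state_dict):
        if len(shared) == 1:
            # Unshared tensor: it may not cover its whole storage, but there is
            # no duplicate to drop, so skip it instead of raising.
            continue
        complete = {name for name in shared if _is_complete(state_dict[name])}
        if not complete:
            raise RuntimeError(f"No complete tensor to keep amongst {shared}.")
        # Prefer keeping a name the caller has not asked to discard.
        keep = sorted(complete - set(discard_names) or complete)[0]
        for name in sorted(shared):
            if name != keep:
                to_remove[keep].append(name)
    return to_remove
```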
#### Result
We now support converting models that have unshared but incomplete
tensors.
#### Related Issues
---------
Signed-off-by: Travis Johnson <[email protected]>