This repository was archived by the owner on Sep 4, 2025. It is now read-only.
forked from vllm-project/vllm
Fastsafetensors #298
Open
fialhocoelho wants to merge 3 commits into main from fastsafetensors
+74 −5
New documentation file:

@@ -0,0 +1,5 @@
+Loading model weights with fastsafetensors
+==========================================
+
+Using the fastsafetensors library enables loading model weights into GPU memory by leveraging GPU Direct Storage. See https://github.com/foundation-model-stack/fastsafetensors for details.
+To enable this feature, set the environment variable ``USE_FASTSAFETENSOR`` to ``true``.
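With a build containing this patch installed, enabling the new path is just the environment variable described above; a quick sanity check of the gate (illustrative session, no vLLM invocation shown):

```shell
# Enable the loader switch added by this PR (env var name taken from the diff)
export USE_FASTSAFETENSOR=true
# The gate lowercases the value before comparing, so "true"/"True"/"TRUE" all work:
python3 -c 'import os; print(os.getenv("USE_FASTSAFETENSOR", "False").lower() == "true")'
```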
Review comment on this file: Is this still relevant?
@@ -42,9 +42,10 @@
     set_default_torch_dtype)
 from vllm.model_executor.model_loader.weight_utils import (
     download_safetensors_index_file_from_hf, download_weights_from_hf,
-    filter_duplicate_safetensors_files, filter_files_not_needed_for_inference,
-    get_gguf_extra_tensor_names, gguf_quant_weights_iterator,
-    initialize_dummy_weights, np_cache_weights_iterator, pt_weights_iterator,
+    fastsafetensors_weights_iterator, filter_duplicate_safetensors_files,
+    filter_files_not_needed_for_inference, get_gguf_extra_tensor_names,
+    gguf_quant_weights_iterator, initialize_dummy_weights,
+    np_cache_weights_iterator, pt_weights_iterator,
     runai_safetensors_weights_iterator, safetensors_weights_iterator)
 from vllm.model_executor.utils import set_weight_attrs
 from vllm.platforms import current_platform
@@ -307,7 +308,15 @@ def _get_weights_iterator(
             hf_weights_files,
         )
     elif use_safetensors:
-        weights_iterator = safetensors_weights_iterator(hf_weights_files)
+        use_fastsafe_tensor = os.getenv('USE_FASTSAFETENSOR',
+                                        'False').lower() == 'true'
+        if use_fastsafe_tensor:
+            logger.info("Using fastsafetensor for loading weights")
+            weights_iterator = fastsafetensors_weights_iterator(
+                hf_weights_files)
+        else:
+            weights_iterator = safetensors_weights_iterator(
+                hf_weights_files)
     else:
         weights_iterator = pt_weights_iterator(hf_weights_files)

Review comment on the `use_fastsafe_tensor` line: My suggestion here would be to enable loader extra config for the DefaultLoader: https://github.com/vllm-project/vllm/blob/e784c6b9984e8f8116f74000b863d941495acb0b/vllm/model_executor/model_loader/loader.py#L190 so that you can enable fastsafetensors and GDS with …
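The gate added in this hunk has a subtle behavior worth noting: only the literal string "true" (case-insensitive) enables the feature. A minimal standalone sketch of the same pattern, with a hypothetical helper name:

```python
import os

def fastsafetensors_enabled() -> bool:
    # Same gate as in the diff: default to "False", case-insensitive match on "true".
    return os.getenv("USE_FASTSAFETENSOR", "False").lower() == "true"

os.environ["USE_FASTSAFETENSOR"] = "TRUE"
print(fastsafetensors_enabled())  # -> True

os.environ["USE_FASTSAFETENSOR"] = "1"   # common gotcha: "1" does not enable it
print(fastsafetensors_enabled())  # -> False
```

Values like "1" or "yes" silently leave the feature off, which is one argument for the reviewer's suggestion to use the loader's structured extra config instead of a free-form env var.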
Looking at the fastsafetensors repo's pyproject.toml, nvidia-cufile-cu12 is not listed as a dependency. Should we open a PR there to add it as a dependency instead of manually adding it here?

The cufile lib isn't listed as a direct installation requirement, but it's necessary because importing the main object relies on a piece of code (fastcpp) that depends on cufile. The key point is that cufile is required for using the library with GDS support, which isn't our use case. However, due to how the lib is structured, we still need this dependency even if we're not using GDS. According to the docs, setting nogds=True disables GDS support, but the cufile requirement is triggered immediately upon import. I believe the best course of action is to open an issue in the repo to address this condition (I can take care of that).
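One way to sidestep the import-time cufile requirement described in this reply is to defer the heavy import until the feature flag is actually on. A minimal sketch of that deferred-import pattern (the stdlib `json` module stands in for the real fastsafetensors dependency, and `get_weights_backend` is a hypothetical name):

```python
import importlib

def get_weights_backend(use_fast: bool):
    """Import the GDS-capable backend only when the flag is on, so a missing
    cufile/libcufile never breaks the plain safetensors path at module import.
    'json' is a stand-in for the real fastsafetensors module."""
    if not use_fast:
        return None
    return importlib.import_module("json")  # stand-in for fastsafetensors

print(get_weights_backend(False))          # flag off: dependency never imported
print(get_weights_backend(True).__name__)  # -> json
```

This keeps cufile an optional dependency of the fast path rather than a hard requirement of the whole loader module.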
Sorry @fialhocoelho, I missed this reply. Thanks for the explanation. Are we sure we're not interested in GDS, though? I don't know much about it, but it seems to be one of the performance improvements that fastsafetensors brings.

@dtrifiro I don't have much experience with how GDS would be installed in an OpenShift env, since it operates at a low level. I've only used it in a DGX environment. I can look into it a bit.
Looking at https://docs.nvidia.com/gpudirect-storage/overview-guide/index.html#software-architecture, it doesn't seem like much is needed apart from cufile and the NVIDIA driver.