Skip to content

Support loading from model.safetensors.index.json#6

Merged
chengchingwen merged 7 commits intoFluxML:mainfrom
findmyway:main
Mar 21, 2025
Merged

Support loading from model.safetensors.index.json#6
chengchingwen merged 7 commits intoFluxML:mainfrom
findmyway:main

Conversation

@findmyway
Copy link
Contributor

No description provided.

Copy link
Member

@pxl-th pxl-th left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Would be good to add a tests if possible.

@ToucheSir
Copy link
Member

I noticed that this index file is not covered anywhere in https://huggingface.co/docs/safetensors/index or the safetensors repo. Is it a huggingface-specific thing? Is there a documentation link we can point users to for it?

@findmyway
Copy link
Contributor Author

I noticed that this index file is not covered anywhere in https://huggingface.co/docs/safetensors/index or the safetensors repo. Is it a huggingface-specific thing? Is there a documentation link we can point users to for it?

See https://huggingface.co/docs/huggingface_hub/package_reference/hf_api#huggingface_hub.utils.SafetensorsRepoMetadata

@findmyway findmyway marked this pull request as draft March 20, 2025 03:44
@findmyway findmyway marked this pull request as ready for review March 20, 2025 05:29
@findmyway
Copy link
Contributor Author

LGTM! Would be good to add a tests if possible.

Added

Copy link
Member

@chengchingwen chengchingwen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The index.json is defined by huggingface in their python binding for sharding the model weights and is not part of the safetensors format/spec. This should not be in this package (or at least, it should be named as something like load_shard_safetensors instead of modifying load_safetensors.

@findmyway
Copy link
Contributor Author

Hi @chengchingwen ,

I agree this feature is not part of the spec. The reason is that in real world cases, different packages may have different approaches to handle the shards (like loading them distributedly or GC during loading). However, I think the modifications I made here provide a nice-to-have fallback (by loading them all in memory).

Regarding the naming issue, reusing the load_safetensors won't create any breaking change. But I'd be happy to change if you insist.

@chengchingwen
Copy link
Member

The reason is that in real world cases, different packages may have different approaches to handle the shards

It's also part of the reason that I think it should not be in this package, but I agree it would be convenient to have and is a reasonable default for loading sharded weights. Personally, the index.json is kind of unsafe regarding the original intention of safetensors. Meanwhile, it doesn't have a stable format/spec which make things break easily and inconspicuously if any changes have been made to their python code. Adding new function also won't be a breaking change and could provide a clear separation between spec and custom behavior, so I would insist the name change if we are going to merge it.

@chengchingwen
Copy link
Member

Also, some unneeded files should be removed. e.g. test/README.md

@findmyway findmyway requested a review from chengchingwen March 21, 2025 09:22
@findmyway findmyway requested a review from chengchingwen March 21, 2025 13:39
@chengchingwen chengchingwen merged commit a720c55 into FluxML:main Mar 21, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants