@willmj (Collaborator) commented Dec 10, 2024

Try the following in a cluster:

python -m fms_acceleration_moe.utils.checkpoint_utils \
    <checkpoint file> \
    <output file> \
    <original model>

with a safetensors model.

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
@willmj willmj requested a review from fabianlim as a code owner December 10, 2024 00:50
@fabianlim (Contributor) commented Dec 10, 2024

@willmj maybe we can replace this with the sharded version, since it will have more utility. It will supersede the current function, so just replace it directly:

import json
import os
from typing import Dict, Union

from huggingface_hub import split_torch_state_dict_into_shards
from safetensors.torch import save_file
from transformers.utils import SAFE_WEIGHTS_INDEX_NAME, SAFE_WEIGHTS_NAME


def save_sharded_safetensors(
    state_dict: Dict,
    save_directory: str,
    metadata: Dict,
    max_shard_size: Union[int, str] = "5GB",
):
    # Derive the per-shard filename pattern from the standard weights name
    filename_pattern = SAFE_WEIGHTS_NAME.replace(
        ".bin", "{suffix}.bin"
    ).replace(".safetensors", "{suffix}.safetensors")
    state_dict_split = split_torch_state_dict_into_shards(
        state_dict, filename_pattern=filename_pattern, max_shard_size=max_shard_size
    )
    index = {
        "metadata": state_dict_split.metadata,
        "weight_map": state_dict_split.tensor_to_filename,
    }
    # Save the index mapping each tensor name to its shard file
    with open(
        os.path.join(save_directory, SAFE_WEIGHTS_INDEX_NAME),
        "w", encoding="utf-8"
    ) as f:
        content = json.dumps(index, indent=2, sort_keys=True) + "\n"
        f.write(content)

    # Write each shard; safetensors requires contiguous tensors
    filename_to_tensors = state_dict_split.filename_to_tensors.items()
    for shard_file, tensors in filename_to_tensors:
        shard = {tensor: state_dict[tensor].contiguous() for tensor in tensors}
        save_file(shard, os.path.join(save_directory, shard_file), metadata=metadata)
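For context, the index file the function writes is plain JSON: a "metadata" block plus a "weight_map" from tensor name to shard filename, which loaders use to find the shard holding each tensor. A minimal stdlib-only sketch of that structure (the tensor names, shard filenames, and sizes below are made up for illustration):

```python
import json
import os
import tempfile

# Standard Hugging Face index filename for sharded safetensors checkpoints
SAFE_WEIGHTS_INDEX_NAME = "model.safetensors.index.json"

# Hypothetical two-shard split: "weight_map" maps tensor name -> shard file
index = {
    "metadata": {"total_size": 4096},
    "weight_map": {
        "model.layers.0.weight": "model-00001-of-00002.safetensors",
        "model.layers.1.weight": "model-00002-of-00002.safetensors",
    },
}

save_directory = tempfile.mkdtemp()
index_path = os.path.join(save_directory, SAFE_WEIGHTS_INDEX_NAME)
with open(index_path, "w", encoding="utf-8") as f:
    f.write(json.dumps(index, indent=2, sort_keys=True) + "\n")

# A loader reads the index back to decide which shard to open per tensor
with open(index_path, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded["weight_map"]["model.layers.0.weight"])
# → model-00001-of-00002.safetensors
```

This round-trips through JSON exactly, which is why the real function can rebuild the full checkpoint later from the index plus the shard files.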

@willmj (Collaborator, author) commented Dec 10, 2024

I don't have write access to this branch, so I made PR #115 with the changes. Thanks Fabian!

@willmj (Collaborator, author) commented Dec 10, 2024

#116
