feat: introduce Multi-Storage Client (MSC) as a storage provider #337
Open
shunjiad wants to merge 1 commit into argonne-lcf:main from
Summary
Adds NVIDIA Multi-Storage Client (MSC) as an optional storage provider so DLIO can run data generation, training I/O, and PyTorch checkpointing against MSC-resolved backends (for example, object storage such as S3, GCS, or Azure), consistent with the existing `local_fs`, `parallel_fs`, `s3`, and `aistore` modes. The MSC integration covers the indexed binary dataset format and PyTorch checkpointing.
Configuration examples
1. MSC client configuration
MSC is configured in its own YAML file; see the Configuration Reference. A minimal S3 profile looks like this:
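A minimal sketch of such a profile, assuming MSC's `profiles` / `storage_provider` layout with an S3 provider. The profile name `experiments` matches the workload example below; the bucket `my-dlio-bucket` and the region are hypothetical placeholders, and credentials are assumed to come from the standard AWS environment or credentials chain. Check the option names against the MSC Configuration Reference before use.

```yaml
# Hypothetical MSC config file (the path exported as MSC_CONFIG).
# Key names follow the MSC profile schema as assumed here; verify against the Configuration Reference.
profiles:
  experiments:                   # profile name, referenced as msc://experiments/...
    storage_provider:
      type: s3
      options:
        base_path: my-dlio-bucket  # placeholder bucket name (assumption)
        region_name: us-east-1     # placeholder region (assumption)
```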
Point MSC at your config file with the `MSC_CONFIG` environment variable, for example in the shell before running DLIO: `export MSC_CONFIG=/path/to/msc_config.yaml`

2. DLIO workload profile (Hydra YAML)
In the workload, set `storage_type` to `msc` and set `storage_root` to an MSC URI. The segment immediately after `msc://` is the profile name (a key under `profiles:` in your MSC config, such as `experiments` in the S3 example above); the remaining path is the prefix for objects under that profile. DLIO passes `storage_root` to `multistorageclient.resolve_storage_client()`, so dataset and checkpoint paths are joined to this prefix and served through MSC.

For this snippet, `experiments` must be a profile defined in your MSC configuration file, and `megatron-deepspeed/` is the artifact prefix under that profile.
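The settings described above might look like the following workload fragment. This is a sketch that assumes `storage_type` and `storage_root` sit under DLIO's `storage:` section, as for the other storage modes; the dataset, workflow, and checkpoint sections of a full workload are omitted.

```yaml
# Sketch: storage section of a DLIO workload (Hydra YAML); other sections omitted.
storage:
  storage_type: msc
  # msc://<profile>/<prefix>: "experiments" must be a profile in the MSC config file,
  # and megatron-deepspeed/ is the object prefix under that profile.
  storage_root: msc://experiments/megatron-deepspeed/
```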