Add Native AIStore Storage Support #320
Description
Add native support for NVIDIA AIStore as a storage backend in DLIO benchmark. AIStore is a lightweight, high-performance distributed object storage system designed specifically for AI/ML workloads.
Background
NVIDIA AIStore (AIS) is a scalable storage stack tailored for AI applications with features including:
- Linear scalability and petascale performance
- S3-compatible API with native optimizations
- ETL offload capabilities
- Multi-cloud support
- Kubernetes-native deployment
AIStore website: https://aistore.nvidia.com
GitHub: https://github.com/NVIDIA/aistore
Motivation
We want to participate in the MLCommons Storage benchmark and need native AIStore support in DLIO for optimal performance testing.
Why Not Use S3 Compatibility?
While AIStore provides S3 compatibility, there are key challenges:
- Redirect Handling: AIStore uses HTTP 307 redirects for load balancing (proxy → target nodes). The current `s3torchconnector` does not natively support this redirect mechanism, causing failures.
- Additional Complexity: Using S3 compatibility requires:
  - SSL certificate management
  - Authentication token handling
  - Custom timeout configurations
  - `botocore` patching for redirect support
- Performance: the native AIStore SDK provides:
  - Direct API access (no S3 translation overhead)
  - Better error handling and diagnostics
  - Access to AIStore-specific features (batch operations, ETL, etc.)
Proposed Solution
Add a new `StorageType.AISTORE` that:
- Inherits from `DataStorage`: an independent implementation built on the AIStore Python SDK
- Reuses S3 generators/readers: leverages the existing `NPYGeneratorS3`, `NPYReaderS3`, etc.
- Uses the native AIStore SDK: direct API calls via the `aistore` Python package
Architecture
```
Config: storage_type: aistore, data_folder: s3://bucket
        ↓
StorageFactory → AIStoreStorage (extends DataStorage)
        ↓
GeneratorFactory → NPYGeneratorS3 (checks for AISTORE)
        ↓
ReaderFactory → NPYReaderS3 (checks for AISTORE)
        ↓
AIStoreStorage methods → aistore.sdk.Client API
```
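The factory dispatch in the diagram above can be sketched as follows. This is an illustrative skeleton, not the actual DLIO code: the class and enum names follow the proposal, but the constructor signatures and module layout are assumptions.

```python
# Illustrative sketch of the proposed StorageFactory dispatch.
# Class/enum names follow the proposal; signatures are assumed.
from enum import Enum


class StorageType(str, Enum):
    LOCAL_FS = 'local_fs'
    S3 = 's3'
    AISTORE = 'aistore'  # proposed new member


class DataStorage:
    """Base class that concrete storage backends extend."""
    def __init__(self, namespace):
        self.namespace = namespace


class AIStoreStorage(DataStorage):
    """Backend that would wrap aistore.sdk.Client for all I/O."""
    def __init__(self, namespace, endpoint_url):
        super().__init__(namespace)
        self.endpoint_url = endpoint_url


class StorageFactory:
    @staticmethod
    def get_storage(storage_type, namespace, endpoint_url=None):
        if storage_type == StorageType.AISTORE:
            # Guarded import: DLIO stays usable without the optional SDK.
            try:
                import aistore  # noqa: F401
            except ImportError:
                raise RuntimeError(
                    "storage_type 'aistore' requires: pip install aistore")
            return AIStoreStorage(namespace, endpoint_url)
        raise ValueError(f"Unsupported storage type: {storage_type}")
```

The guard lives at the dispatch point so that a missing `aistore` package only fails workloads that actually select this backend.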
Implementation
Files Modified
- `dlio_benchmark/common/enumerations.py`: add `AISTORE = 'aistore'` to the `StorageType` enum
- `dlio_benchmark/storage/aistore_storage.py` (new file): implement the `AIStoreStorage` class with methods `put_data()`, `get_data()`, `walk_node()`, `create_namespace()`, etc., using `aistore.sdk.Client` for all I/O operations
- `dlio_benchmark/storage/storage_factory.py`: add a case for `StorageType.AISTORE` with a guarded import for the optional AIStore dependency
- `dlio_benchmark/data_generator/generator_factory.py`: update NPY/NPZ generator selection to include `AISTORE`, reusing the existing S3 generators
- `dlio_benchmark/reader/reader_factory.py`: update NPY/NPZ reader selection to include `AISTORE`, reusing the existing S3 readers
- `dlio_benchmark/utils/config.py`: add validation for `StorageType.AISTORE` and a check that the `aistore` package is available
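Because the reused S3 generators and readers address data with `s3://bucket/key` URIs while `aistore.sdk.Client` works with bucket names and object keys, the storage class presumably needs a small URI-translation step. A hypothetical helper sketch (the name `parse_s3_uri` is illustrative, not from the proposal):

```python
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, object_key).

    Hypothetical helper: the reused S3 generators/readers emit s3:// URIs,
    while the AIStore SDK addresses objects by bucket name + object key.
    """
    parsed = urlparse(uri)
    if parsed.scheme != 's3':
        raise ValueError(f"Expected an s3:// URI, got: {uri}")
    return parsed.netloc, parsed.path.lstrip('/')
```

For example, `parse_s3_uri("s3://dlio-benchmark/train/img_0.npy")` yields the bucket `dlio-benchmark` and the key `train/img_0.npy`.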
Dependencies
- Optional dependency: the `aistore` Python SDK
- Install via: `pip install aistore`
- Guarded imports ensure DLIO works without AIStore installed
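The guarded-import pattern mentioned above might look like the sketch below; `AISTORE_AVAILABLE` and `require_aistore` are illustrative names, not part of the proposal.

```python
# Module-level guarded import: importing this module never fails
# when the optional 'aistore' package is absent.
try:
    from aistore.sdk import Client
    AISTORE_AVAILABLE = True
except ImportError:
    Client = None
    AISTORE_AVAILABLE = False


def require_aistore():
    """Raise a clear error at use time rather than at import time."""
    if not AISTORE_AVAILABLE:
        raise ImportError(
            "AIStore support requires the optional dependency: "
            "pip install aistore")
    return Client
```

Deferring the failure to `require_aistore()` means users who never select `storage_type: aistore` are unaffected by the missing package.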
Example Usage
```yaml
# workload config
storage:
  storage_type: aistore
  storage_root: dlio-benchmark
  storage_options:
    endpoint_url: http://localhost:8080
dataset:
  data_folder: s3://dlio-benchmark  # S3 URI format for generators
  format: npy
```

```bash
# Run benchmark
dlio_benchmark workload=aistore_native_local \
  ++workload.storage.storage_options.endpoint_url=http://aistore-endpoint:8080
```
References
- AIStore GitHub: https://github.com/NVIDIA/aistore
- AIStore Documentation: https://aistore.nvidia.com/docs
- AIStore Python SDK: https://github.com/NVIDIA/aistore/tree/main/python
- MLCommons Storage Benchmark: https://mlcommons.org/benchmarks/storage/
Pull Request
Implementation is ready and will be submitted as a PR.