Add Native AIStore Storage Support #320

@gaikwadabhishek

Description

Add native support for NVIDIA AIStore as a storage backend in DLIO benchmark. AIStore is a lightweight, high-performance distributed object storage system designed specifically for AI/ML workloads.

Background

NVIDIA AIStore (AIS) is a scalable storage stack tailored for AI applications with features including:

  • Linear scalability and petascale performance
  • S3-compatible API with native optimizations
  • ETL offload capabilities
  • Multi-cloud support
  • Kubernetes-native deployment

AIStore website: https://aistore.nvidia.com
GitHub: https://github.com/NVIDIA/aistore

Motivation

We want to participate in the MLCommons Storage benchmark and need native AIStore support in DLIO for optimal performance testing.

Why Not Use S3 Compatibility?

While AIStore provides S3 compatibility, there are key challenges:

  1. Redirect Handling: AIStore uses HTTP 307 redirects for load balancing (proxy → target nodes). The current `s3torchconnector` does not natively follow this redirect mechanism, causing request failures.

  2. Additional Complexity: Using S3 compatibility requires:

    • SSL certificate management
    • Authentication token handling
    • Custom timeout configurations
    • `botocore` patching for redirect support

  3. Performance: The native AIStore SDK provides:

    • Direct API access (no S3 translation overhead)
    • Better error handling and diagnostics
    • Access to AIStore-specific features (batch operations, ETL, etc.)
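To make the redirect problem concrete, here is a self-contained sketch of the proxy → target flow. The local HTTP server below is only a stand-in for an AIStore cluster (it is not the real AIStore API); it shows that a client must transparently follow the 307, which Python's `urllib` does but the current `s3torchconnector` reportedly does not.

```python
# Minimal illustration of an AIStore-style 307 redirect (proxy -> target).
# The local server is a stand-in for a cluster, NOT the real AIStore API.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class RedirectingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/proxy/obj":
            # The "proxy" answers with a 307 pointing at the "target" node.
            self.send_response(307)
            self.send_header("Location", "/target/obj")
            self.end_headers()
        else:
            # The "target" serves the object bytes directly.
            body = b"object-bytes"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# urllib follows the 307 transparently; a client that does not would
# fail here exactly as described above.
url = f"http://127.0.0.1:{server.server_port}/proxy/obj"
data = urlopen(url).read()
server.shutdown()
```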

Proposed Solution

Add a new `StorageType.AISTORE` that:

  1. Inherits from `DataStorage` - Independent implementation using the AIStore Python SDK
  2. Reuses S3 Generators/Readers - Leverages existing `NPYGeneratorS3`, `NPYReaderS3`, etc.
  3. Uses the Native AIStore SDK - Direct API calls via the `aistore` Python package
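The enum and factory wiring from the list above can be sketched as follows. The class names mirror the issue (`StorageType`, `DataStorage`, `AIStoreStorage`, `StorageFactory`), but the bodies are illustrative placeholders, not DLIO's actual source.

```python
# Illustrative sketch of the proposed wiring -- not DLIO's actual code.
from enum import Enum

class StorageType(Enum):
    LOCAL_FS = 'local_fs'
    S3 = 's3'
    AISTORE = 'aistore'   # proposed new member

class DataStorage:
    """Stand-in for DLIO's storage base class."""
    def __init__(self, namespace):
        self.namespace = namespace

class AIStoreStorage(DataStorage):
    """Independent implementation backed by the AIStore Python SDK."""
    pass

class StorageFactory:
    @staticmethod
    def get_storage(storage_type, namespace):
        # New case dispatching on StorageType.AISTORE.
        if storage_type == StorageType.AISTORE:
            return AIStoreStorage(namespace)
        raise ValueError(f"unsupported storage type: {storage_type}")

storage = StorageFactory.get_storage(StorageType.AISTORE, "dlio-benchmark")
```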

Architecture

```text
Config: storage_type: aistore, data_folder: s3://bucket
              ↓
StorageFactory → AIStoreStorage (extends DataStorage)
              ↓
GeneratorFactory → NPYGeneratorS3 (checks for AISTORE)
              ↓
ReaderFactory → NPYReaderS3 (checks for AISTORE)
              ↓
AIStoreStorage methods → aistore.sdk.Client API
```
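The "checks for AISTORE" steps in the flow above reduce to a membership test in the reader/generator factories. The reader class name comes from the issue; its body here is a placeholder.

```python
# Sketch of the "checks for AISTORE" reuse step; NPYReaderS3 is a
# placeholder named after the existing DLIO reader the issue reuses.
from enum import Enum

class StorageType(Enum):
    LOCAL_FS = 'local_fs'
    S3 = 's3'
    AISTORE = 'aistore'

class NPYReaderS3:
    """Placeholder for DLIO's existing S3-backed NPY reader."""

# Storage types served by the S3 reader/generator implementations.
S3_COMPATIBLE = {StorageType.S3, StorageType.AISTORE}

def get_npy_reader(storage_type):
    # AISTORE reuses the S3 reader rather than adding a parallel class.
    if storage_type in S3_COMPATIBLE:
        return NPYReaderS3()
    raise ValueError(f"no NPY reader for {storage_type}")
```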

Implementation

Files Modified

  1. `dlio_benchmark/common/enumerations.py`

    • Add `AISTORE = 'aistore'` to the `StorageType` enum
  2. `dlio_benchmark/storage/aistore_storage.py` (new file)

    • Implement the `AIStoreStorage` class
    • Methods: `put_data()`, `get_data()`, `walk_node()`, `create_namespace()`, etc.
    • Uses `aistore.sdk.Client` for all I/O operations
  3. `dlio_benchmark/storage/storage_factory.py`

    • Add a case for `StorageType.AISTORE`
    • Guarded import for the optional AIStore dependency
  4. `dlio_benchmark/data_generator/generator_factory.py`

    • Update NPY/NPZ generator selection to include `AISTORE`
    • Reuses existing S3 generators
  5. `dlio_benchmark/reader/reader_factory.py`

    • Update NPY/NPZ reader selection to include `AISTORE`
    • Reuses existing S3 readers
  6. `dlio_benchmark/utils/config.py`

    • Add validation for `StorageType.AISTORE`
    • Check for `aistore` package availability
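A hedged sketch of the new `aistore_storage.py` (items 2 and 3 above). The SDK calls shown (`Client`, `bucket().object().put_content()` / `.get().read_all()`) reflect my reading of the public `aistore` package and may differ between SDK versions; the guarded import keeps DLIO loadable when `aistore` is absent.

```python
# Hedged sketch of the proposed aistore_storage.py -- not DLIO code.
# The aistore.sdk method names used below are assumptions and may
# differ between SDK versions.
try:
    from aistore.sdk import Client
    HAS_AISTORE = True
except ImportError:   # guarded import: DLIO still loads without aistore
    Client = None
    HAS_AISTORE = False

class AIStoreStorage:
    def __init__(self, endpoint_url, bucket_name):
        if not HAS_AISTORE:
            raise RuntimeError(
                "storage_type 'aistore' requires: pip install aistore")
        # Client talks to the AIStore proxy; buckets are namespaces.
        self.client = Client(endpoint_url)
        self.bucket = self.client.bucket(bucket_name)

    def put_data(self, obj_name, data: bytes):
        # Write one object's bytes.
        self.bucket.object(obj_name).put_content(data)

    def get_data(self, obj_name) -> bytes:
        # Read one object's bytes.
        return self.bucket.object(obj_name).get().read_all()
```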

Dependencies

  • Optional dependency: aistore Python SDK
  • Install via: pip install aistore
  • Guarded imports ensure DLIO works without AIStore installed
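The guarded-dependency check can also run at config-validation time (item 6 in the file list), failing fast with a clear message. This is an illustrative pattern, not DLIO's actual `config.py` code.

```python
# Sketch of a config-time availability check for the optional aistore
# dependency: fail fast with an actionable message. Illustrative only.
import importlib.util

def validate_storage_config(storage_type: str):
    if storage_type == 'aistore' and importlib.util.find_spec('aistore') is None:
        raise ValueError(
            "storage_type is 'aistore' but the aistore package is missing; "
            "install it with: pip install aistore")

validate_storage_config('local_fs')   # non-aistore configs are unaffected
```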

Example Usage

```yaml
# workload config
storage:
  storage_type: aistore
  storage_root: dlio-benchmark
  storage_options:
    endpoint_url: http://localhost:8080

dataset:
  data_folder: s3://dlio-benchmark  # S3 URI format for the reused S3 generators
  format: npy
```

```bash
# Run benchmark
dlio_benchmark workload=aistore_native_local \
  ++workload.storage.storage_options.endpoint_url=http://aistore-endpoint:8080
```

Pull Request

Implementation is ready and will be submitted as a PR.
