ADLS Gen2 Storage Support #326

Open
zsy056 wants to merge 1 commit into argonne-lcf:main from zsy056:adlsgen2_local

zsy056 commented Feb 20, 2026

Summary

This PR introduces ADLS Gen2 backend support for DLIO, including storage access, PyTorch checkpointing, factory wiring, and workload config variants.

What feature is added

  • Adds first-class ADLS Gen2 support so DLIO can run generate/train/checkpoint workflows against ADLS namespaces.
  • Adds ADLS-specific PyTorch checkpoint mechanism integration.

Major Code Changes

  • Added ADLS Gen2 storage implementation:
    • dlio_benchmark/storage/adls_gen2_storage.py
  • Added ADLS PyTorch checkpointing implementation:
    • dlio_benchmark/checkpointing/pytorch_adls_checkpointing.py
  • Updated checkpointing factory and enums to route ADLS checkpoint mechanism:
    • dlio_benchmark/checkpointing/checkpointing_factory.py
    • dlio_benchmark/common/enumerations.py
  • Updated generator/reader factories for ADLS path support:
    • dlio_benchmark/data_generator/generator_factory.py
    • dlio_benchmark/reader/reader_factory.py
  • Added ADLS workload config variants:
    • dlio_benchmark/configs/workload/unet3d_a100_adlsgen2.yaml
    • dlio_benchmark/configs/workload/unet3d_h100_adlsgen2.yaml
    • dlio_benchmark/configs/workload/unet3d_v100_adlsgen2.yaml
  • Updated CI workflow coverage:
    • .github/workflows/ci.yml
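
The factory and enum changes listed above follow a common registry pattern. A minimal sketch of that pattern, assuming hypothetical names throughout (`CheckpointMechanism`, `PT_ADLS_SAVE`, and the placeholder classes are illustrative and not taken from the PR's code):

```python
from enum import Enum


class CheckpointMechanism(Enum):
    # Hypothetical member names; the PR's actual enum values are not shown here.
    PT_SAVE = "pt_save"
    PT_ADLS_SAVE = "pt_adls_save"


class LocalCheckpointing:
    """Placeholder for a filesystem-backed implementation."""


class AdlsCheckpointing:
    """Placeholder for the ADLS-backed implementation."""


# One table maps each mechanism to its implementation class.
_MECHANISMS = {
    CheckpointMechanism.PT_SAVE: LocalCheckpointing,
    CheckpointMechanism.PT_ADLS_SAVE: AdlsCheckpointing,
}


def get_checkpointing(mechanism):
    """Return an instance for the requested mechanism, or fail loudly."""
    try:
        return _MECHANISMS[CheckpointMechanism(mechanism)]()
    except (KeyError, ValueError):
        raise ValueError(f"unsupported checkpoint mechanism: {mechanism!r}") from None
```

With a table like this, adding a backend means adding one enum member and one registry entry, and an unknown value fails at construction time rather than during a run.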

Testing / Validation

  • CI workflow includes ADLS-related coverage updates.

@zhenghh04 (Member):

@zsy056 Please resolve the merge conflicts. We recently refactored the generator code.

zsy056 force-pushed the adlsgen2_local branch 2 times, most recently from 650b209 to 6d39aa2, on March 18, 2026 21:19

@hariharan-devarajan (Collaborator) left a comment:

Some minor comments

Comment on lines +65 to +71
self._account_name = None
self._account_key = None
self._shared_access_signature = None
self._blobio_credential = None
self._container_sas_tokens = {}
self._container_sas_ttl = timedelta(hours=1)
self._container_sas_refresh_margin = timedelta(minutes=5)

Collaborator:

Should these be configurable by users? If so, please move them to the YAML config. I think at least the TTL and refresh margin should be moved.

Author:

Thanks. I moved the TTL and refresh margin to the config and removed blobio_credential, since it is not really needed.

Comment on lines +29 to +30
DEFAULT_CONTAINER_SAS_TTL = timedelta(hours=1)
DEFAULT_CONTAINER_SAS_REFRESH_MARGIN = timedelta(minutes=5)

Collaborator:

Again, these should be defined in constants and then used within the configuration, not here.

    if token and expires_at and (expires_at - now) > refresh_margin:
        return token

    ttl = getattr(self, "_container_sas_ttl", DEFAULT_CONTAINER_SAS_TTL)

Collaborator:

Again, this should be set in config.py in the ConfigurationManager class. I don't see it changed.

def _get_container_sas(self, container_name):
    cache_entry = self._container_sas_tokens.get(container_name)
    now = datetime.now(timezone.utc)
    refresh_margin = getattr(

Collaborator:

Again, this should be set in config.py in the ConfigurationManager class. I don't see it changed.
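
For context on the snippets under review: the caching logic reuses a container SAS token until less than the refresh margin remains before it expires. A minimal sketch of that behavior, assuming the TTL and margin arrive already resolved from configuration. `ContainerSasCache` and the `mint_token` callable are hypothetical stand-ins for the PR's `_get_container_sas` and the real SAS generation call:

```python
from datetime import datetime, timedelta, timezone


class ContainerSasCache:
    """Hypothetical cache; `mint_token` stands in for the real SAS generation call."""

    def __init__(self, mint_token, ttl=timedelta(hours=1),
                 refresh_margin=timedelta(minutes=5)):
        self._mint_token = mint_token
        self._ttl = ttl
        self._refresh_margin = refresh_margin
        self._tokens = {}  # container name -> (token, expires_at)

    def get(self, container_name, now=None):
        now = now or datetime.now(timezone.utc)
        token, expires_at = self._tokens.get(container_name, (None, None))
        # Reuse the cached token only while more than the margin remains.
        if token and expires_at and (expires_at - now) > self._refresh_margin:
            return token
        # Otherwise mint a fresh token valid for one TTL from now.
        expires_at = now + self._ttl
        token = self._mint_token(container_name, expires_at)
        self._tokens[container_name] = (token, expires_at)
        return token
```

Passing the durations into the constructor, rather than looking them up with `getattr` and module-level defaults, keeps the cache free of configuration concerns, which is the direction the review asks for.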

Comment on lines +88 to +90
connection_string = storage_options.get("connection_string")
account_url = storage_options.get("account_url")
account_name = storage_options.get("account_name")

Collaborator:

Again, this should be set in config.py in the ConfigurationManager class, then used in the storage class as well as here.
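
One way to read this request: resolve the account options once, in the configuration layer, and hand the result to both the storage class and the checkpointing class. A minimal sketch, assuming a hypothetical helper `resolve_account_url`. The option keys come from the diff, but the precedence order (URL, then name, then connection string) is an assumption:

```python
def resolve_account_url(storage_options):
    """Hypothetical helper: pick an ADLS Gen2 endpoint from the option keys in the diff."""
    opts = storage_options or {}
    if opts.get("account_url"):
        return opts["account_url"].rstrip("/")
    name = opts.get("account_name")
    if not name and opts.get("connection_string"):
        # Connection strings are semicolon-separated Key=Value pairs.
        pairs = (p.split("=", 1) for p in opts["connection_string"].split(";") if "=" in p)
        name = {k.strip(): v.strip() for k, v in pairs}.get("AccountName")
    if not name:
        raise ValueError("need account_url, account_name, or a connection_string with AccountName")
    # Public-cloud DFS endpoint pattern; sovereign clouds use other suffixes,
    # and this sketch ignores any EndpointSuffix in the connection string.
    return f"https://{name}.dfs.core.windows.net"
```

Calling this once during configuration loading and storing the result avoids the storage and checkpointing code each re-reading `storage_options` with slightly different rules.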

Comment on lines +76 to +85
self._container_sas_ttl = self._get_duration_option(
    storage_options,
    "container_sas_ttl",
    DEFAULT_CONTAINER_SAS_TTL,
)
self._container_sas_refresh_margin = self._get_duration_option(
    storage_options,
    "sas_refresh_margin",
    DEFAULT_CONTAINER_SAS_REFRESH_MARGIN,
)

Collaborator:

Again, this should be set in config.py in the ConfigurationManager class. I don't see it changed.

Author:

Thanks, I have explicitly surfaced them in config.py.
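
For readers following the thread, the shape the review asks for is roughly: defaults in a constants module, parsing and validation in the configuration layer, and plain values handed to the storage class. A minimal sketch under stated assumptions: `AdlsSasSettings` and `parse_duration` are hypothetical and are not DLIO's `ConfigurationManager` or the PR's `_get_duration_option`, and treating numeric values as seconds is an assumption, since the thread does not state the unit:

```python
from dataclasses import dataclass
from datetime import timedelta

# Defaults as they might live in a shared constants module (values from the diff).
DEFAULT_CONTAINER_SAS_TTL = timedelta(hours=1)
DEFAULT_CONTAINER_SAS_REFRESH_MARGIN = timedelta(minutes=5)


def parse_duration(value, default):
    """Accept a timedelta or a number of seconds (assumed unit); else use the default."""
    if value is None:
        return default
    if isinstance(value, timedelta):
        return value
    seconds = float(value)
    if seconds <= 0:
        raise ValueError(f"duration must be positive, got {value!r}")
    return timedelta(seconds=seconds)


@dataclass(frozen=True)
class AdlsSasSettings:  # hypothetical; not DLIO's ConfigurationManager
    container_sas_ttl: timedelta = DEFAULT_CONTAINER_SAS_TTL
    sas_refresh_margin: timedelta = DEFAULT_CONTAINER_SAS_REFRESH_MARGIN

    @classmethod
    def from_storage_options(cls, storage_options):
        opts = storage_options or {}
        settings = cls(
            parse_duration(opts.get("container_sas_ttl"), DEFAULT_CONTAINER_SAS_TTL),
            parse_duration(opts.get("sas_refresh_margin"), DEFAULT_CONTAINER_SAS_REFRESH_MARGIN),
        )
        # A margin at or above the TTL would force a refresh on every call.
        if settings.sas_refresh_margin >= settings.container_sas_ttl:
            raise ValueError("sas_refresh_margin must be shorter than container_sas_ttl")
        return settings
```

Validating once at load time means a bad YAML value surfaces before any I/O starts, and the storage class no longer needs its own fallback defaults.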

Add ADLS Gen2 storage backend support

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Change ADLS Gen2 namespace type to HIERARCHICAL

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Implement ADLS Gen2 API calls using Azure SDK

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Fix code review issues: remove unused import and fix bare except clauses

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Add ADLS Gen2 PyTorch checkpointing and fixes

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Remove unused ctypes import from pytorch_adls_checkpointing

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Add ADLS Gen2 support to CI workflow

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Add workflow_dispatch trigger to ci.yml

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Fix Microsoft repository 403 errors in CI

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Implement ADLS Gen2 test suite

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Fix ADLS Gen2 generator support for NPY/NPZ formats

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Move DataLakeServiceClient import to module level for test patching

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Fix ADLS test configuration and add DefaultAzureCredential patching

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Remove redundant comment from test for consistency

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Fix ADLS Gen2 URI parsing in all storage methods

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Refactor URI parsing to use helper method and reduce duplication

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Add debug logging to investigate test failure

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Add debug logging to create_node as well

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Add comprehensive debug logging to ADLS Gen2 storage

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Fix get_uri to avoid double URI prefixing

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Fix mock get_paths to return full paths matching Azure SDK behavior

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Move AzStorageCheckpoint import to module level for test patching

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Fix test_adls_subset to match S3 pattern - remove incorrect checkpoint mode

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Implement proper AzStorageCheckpoint mock with writer/reader context managers

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Remove redundant AzStorageCheckpoint patches from checkpoint tests

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

Fix ADLS checkpoint test: always apply MockAzStorageCheckpoint patch

Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>

fix adlsgen2 tests