@zsy056 Please resolve the merge conflicts. The generator code was recently refactored.
650b209 to 6d39aa2
hariharan-devarajan left a comment
Some minor comments
    self._account_name = None
    self._account_key = None
    self._shared_access_signature = None
    self._blobio_credential = None
    self._container_sas_tokens = {}
    self._container_sas_ttl = timedelta(hours=1)
    self._container_sas_refresh_margin = timedelta(minutes=5)
Should these be configurable by users? If so, please move them to the YAML config. I think at least the TTL and refresh margin should be moved.
Thanks. I moved the TTL and refresh margin to config and removed blobio_credential, as it is not really needed.
    DEFAULT_CONTAINER_SAS_TTL = timedelta(hours=1)
    DEFAULT_CONTAINER_SAS_REFRESH_MARGIN = timedelta(minutes=5)
Again, these should be defined in constants and then used within the configuration, not here.
    if token and expires_at and (expires_at - now) > refresh_margin:
        return token

    ttl = getattr(self, "_container_sas_ttl", DEFAULT_CONTAINER_SAS_TTL)
Again, this should be set in config.py in the ConfigurationManager class. I don't see it changed.
    def _get_container_sas(self, container_name):
        cache_entry = self._container_sas_tokens.get(container_name)
        now = datetime.now(timezone.utc)
        refresh_margin = getattr(
Again, this should be set in config.py in the ConfigurationManager class. I don't see it changed.
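For readers following the thread, the cache check under review (reuse a container SAS token until it is within the refresh margin of expiry, then issue a new one valid for the TTL) can be sketched as below. This is a minimal illustration, not the PR's implementation: the `SasTokenCache` class, the `issue_token` callback, and the injectable `clock` are hypothetical names introduced here, and the two defaults simply mirror the values quoted in the diff.

```python
from datetime import datetime, timedelta, timezone

# Defaults mirroring the values quoted in the review; where they should live
# (constants vs. configuration) is exactly what the thread is discussing.
DEFAULT_CONTAINER_SAS_TTL = timedelta(hours=1)
DEFAULT_CONTAINER_SAS_REFRESH_MARGIN = timedelta(minutes=5)


class SasTokenCache:
    """Hypothetical helper: caches one token per container name."""

    def __init__(self, issue_token,
                 ttl=DEFAULT_CONTAINER_SAS_TTL,
                 refresh_margin=DEFAULT_CONTAINER_SAS_REFRESH_MARGIN,
                 clock=None):
        # issue_token(container_name, expires_at) -> str is an assumed
        # callback that asks the storage service for a new token.
        self._issue_token = issue_token
        self._ttl = ttl
        self._refresh_margin = refresh_margin
        # Injectable clock so the expiry logic can be tested deterministically.
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self._tokens = {}  # container_name -> (token, expires_at)

    def get(self, container_name):
        now = self._clock()
        token, expires_at = self._tokens.get(container_name, (None, None))
        # Reuse the cached token only while more than refresh_margin remains.
        if token and expires_at and (expires_at - now) > self._refresh_margin:
            return token
        # Otherwise issue a fresh token valid for ttl and cache it.
        expires_at = now + self._ttl
        token = self._issue_token(container_name, expires_at)
        self._tokens[container_name] = (token, expires_at)
        return token
```

Passing the TTL and refresh margin in through the constructor, rather than looking them up with getattr and a module-level fallback, is one way to let a configuration class own the values, which appears to be what the reviewer is asking for.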
    connection_string = storage_options.get("connection_string")
    account_url = storage_options.get("account_url")
    account_name = storage_options.get("account_name")
Again, this should be set in config.py in the ConfigurationManager class, then used in the storage class as well as here.
    self._container_sas_ttl = self._get_duration_option(
        storage_options,
        "container_sas_ttl",
        DEFAULT_CONTAINER_SAS_TTL,
    )
    self._container_sas_refresh_margin = self._get_duration_option(
        storage_options,
        "sas_refresh_margin",
        DEFAULT_CONTAINER_SAS_REFRESH_MARGIN,
    )
Again, this should be set in config.py in the ConfigurationManager class. I don't see it changed.
Thanks, I have explicitly surfaced them in config.py.
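As an illustration of what resolving these two options in one place might look like, here is a minimal sketch. The option keys `container_sas_ttl` and `sas_refresh_margin` and the two default constants come from the diff above; everything else is an assumption made for this example, including the function names, the choice to interpret plain numbers as seconds, and the check that the margin is shorter than the TTL. The PR's own `_get_duration_option` may behave differently.

```python
from datetime import timedelta

# Defaults quoted in the diff; the reviewer suggests keeping them in constants.
DEFAULT_CONTAINER_SAS_TTL = timedelta(hours=1)
DEFAULT_CONTAINER_SAS_REFRESH_MARGIN = timedelta(minutes=5)


def get_duration_option(options, key, default):
    """Hypothetical parser: returns a positive timedelta for options[key]."""
    value = options.get(key)
    if value is None:
        return default
    if isinstance(value, timedelta):
        duration = value
    else:
        # Assumption: numbers (or numeric strings from YAML) are seconds.
        duration = timedelta(seconds=float(value))
    if duration <= timedelta(0):
        raise ValueError(f"{key} must be positive, got {value!r}")
    return duration


def resolve_sas_settings(storage_options):
    """Resolve both durations once so callers never need a getattr fallback."""
    ttl = get_duration_option(
        storage_options, "container_sas_ttl", DEFAULT_CONTAINER_SAS_TTL)
    margin = get_duration_option(
        storage_options, "sas_refresh_margin",
        DEFAULT_CONTAINER_SAS_REFRESH_MARGIN)
    # A margin at or above the TTL would force a refresh on every request.
    if margin >= ttl:
        raise ValueError("sas_refresh_margin must be shorter than container_sas_ttl")
    return ttl, margin
```

With this shape, a call such as resolve_sas_settings({"container_sas_ttl": 600}) would yield a ten-minute TTL and the default five-minute margin, and the storage and checkpointing classes could both read the resolved values from the configuration object.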
Add ADLS Gen2 storage backend support
Change ADLS Gen2 namespace type to HIERARCHICAL
Implement ADLS Gen2 API calls using Azure SDK
Fix code review issues: remove unused import and fix bare except clauses
Add ADLS Gen2 PyTorch checkpointing and fixes
Remove unused ctypes import from pytorch_adls_checkpointing
Add ADLS Gen2 support to CI workflow
Add workflow_dispatch trigger to ci.yml
Fix Microsoft repository 403 errors in CI
Implement ADLS Gen2 test suite
Fix ADLS Gen2 generator support for NPY/NPZ formats
Move DataLakeServiceClient import to module level for test patching
Fix ADLS test configuration and add DefaultAzureCredential patching
Remove redundant comment from test for consistency
Fix ADLS Gen2 URI parsing in all storage methods
Refactor URI parsing to use helper method and reduce duplication
Add debug logging to investigate test failure
Add debug logging to create_node as well
Add comprehensive debug logging to ADLS Gen2 storage
Fix get_uri to avoid double URI prefixing
Fix mock get_paths to return full paths matching Azure SDK behavior
Move AzStorageCheckpoint import to module level for test patching
Fix test_adls_subset to match S3 pattern - remove incorrect checkpoint mode
Implement proper AzStorageCheckpoint mock with writer/reader context managers
Remove redundant AzStorageCheckpoint patches from checkpoint tests
Fix ADLS checkpoint test: always apply MockAzStorageCheckpoint patch
fix adlsgen2 tests
Every entry above except the last carries the trailer: Co-authored-by: zsy056 <1074382+zsy056@users.noreply.github.com>
Summary
This PR introduces ADLS Gen2 backend support for DLIO, including storage access, PyTorch checkpointing, factory wiring, and workload config variants.
What feature is added
Major Code Changes
dlio_benchmark/storage/adls_gen2_storage.py
dlio_benchmark/checkpointing/pytorch_adls_checkpointing.py
dlio_benchmark/checkpointing/checkpointing_factory.py
dlio_benchmark/common/enumerations.py
dlio_benchmark/data_generator/generator_factory.py
dlio_benchmark/reader/reader_factory.py
dlio_benchmark/configs/workload/unet3d_a100_adlsgen2.yaml
dlio_benchmark/configs/workload/unet3d_h100_adlsgen2.yaml
dlio_benchmark/configs/workload/unet3d_v100_adlsgen2.yaml
.github/workflows/ci.yml
Testing / Validation