An Azure ML datastore is a *reference* to an existing storage account on Azure. The key benefits of creating and using an Azure ML datastore are:
- A common and easy-to-use API to interact with different storage types in Azure (Blob/Files/ADLS).
- Easier to discover useful datastores when working as a team.
- Authentication is automatically handled - both *credential-based* access (service principal/SAS/key) and *identity-based* access (Azure Active Directory/managed identity) are supported. When using credential-based authentication, you do not need to expose secrets in your code.
This requires the installation of the library ``azureml-fsspec``.
Here is an example of how the ``.list_files_by_fsspec(...)`` DataPipe
can be used to list files in a directory in a container:

.. code:: python

    from torchdata.datapipes.iter import IterableWrapper

    # set the subscription_id, resource_group, and AzureML workspace_name
    subscription_id = "<subscription_id>"
    resource_group = "<resource_group>"
    workspace_name = "<workspace_name>"

    # set the datastore name and path on the datastore
    datastore_name = "<datastore_name>"
    path_on_datastore = "<path_on_datastore>"

    uri = f"azureml://subscriptions/{subscription_id}/resourcegroups/{resource_group}/workspaces/{workspace_name}/datastores/{datastore_name}/paths/{path_on_datastore}"

    # list the files at the given path on the datastore
    datapipe = IterableWrapper([uri]).list_files_by_fsspec()
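The ``azureml://`` URI above is assembled from five components with a plain f-string. As a quick sanity check of the template, here is a small helper (``azureml_uri`` is a hypothetical name, not part of ``azureml-fsspec``) that makes the format explicit:

```python
def azureml_uri(subscription_id, resource_group, workspace_name, datastore_name, path):
    # Assemble an azureml:// URI in the format shown in the examples above.
    return (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace_name}"
        f"/datastores/{datastore_name}"
        f"/paths/{path}"
    )

# placeholders stand in for real workspace values
uri = azureml_uri(
    "<subscription_id>", "<resource_group>", "<workspace_name>",
    "workspaceblobstore", "titanic.csv",
)
```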
You can also open files using `FSSpecFileOpener <generated/torchdata.datapipes.iter.FSSpecFileOpener.html>`_
(``.open_files_by_fsspec(...)``) and stream them
(if supported by the file format).

Here is an example of loading a tar file from the default Azure ML datastore ``workspaceblobstore``, where the path is ``/cifar-10-python.tar.gz`` (top-level folder).

.. code:: python

    from torchdata.datapipes.iter import IterableWrapper

    # set the subscription_id, resource_group, and AzureML workspace_name
    subscription_id = "<subscription_id>"
    resource_group = "<resource_group>"
    workspace_name = "<workspace_name>"

    # set the datastore name and path on the datastore
    datastore_name = "workspaceblobstore"
    path_on_datastore = "cifar-10-python.tar.gz"

    uri = f"azureml://subscriptions/{subscription_id}/resourcegroups/{resource_group}/workspaces/{workspace_name}/datastores/{datastore_name}/paths/{path_on_datastore}"

    # open the tar file in binary mode and stream its members
    datapipe = IterableWrapper([uri]).open_files_by_fsspec(mode="rb").load_from_tar()
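Streaming the real archive requires an Azure workspace and credentials. As a self-contained illustration of the (name, bytes) streaming that a tar-reading step performs, here is a standard-library sketch that builds a small ``tar.gz`` in memory (the member name is made up) and iterates its file members:

```python
import io
import tarfile

# Build a small tar.gz in memory to stand in for a remote archive.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    data = b"hello"
    info = tarfile.TarInfo(name="cifar-10-batches-py/readme.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
buf.seek(0)

# Stream (member name, bytes) pairs from the archive.
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    members = [(m.name, tar.extractfile(m).read()) for m in tar if m.isfile()]
```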
Here is an example of loading a CSV file - the famous Titanic dataset (`download <https://raw.githubusercontent.com/Azure/azureml-examples/main/cli/assets/data/sample-data/titanic.csv>`_) - from the Azure ML datastore ``workspaceblobstore``, where the path is ``/titanic.csv`` (top-level folder).

.. code:: python

    from torchdata.datapipes.iter import IterableWrapper

    # set the subscription_id, resource_group, and AzureML workspace_name
    subscription_id = "<subscription_id>"
    resource_group = "<resource_group>"
    workspace_name = "<workspace_name>"

    # set the datastore name and path on the datastore
    datastore_name = "workspaceblobstore"
    path_on_datastore = "titanic.csv"

    uri = f"azureml://subscriptions/{subscription_id}/resourcegroups/{resource_group}/workspaces/{workspace_name}/datastores/{datastore_name}/paths/{path_on_datastore}"

    def row_processer(row):
        # return the label and data (the class and age of the passenger)
        return {"label": int(row[1]), "data": [float(row[2]), float(row[5] or 0)]}

    # open the CSV, parse it (skipping the header row), and process each row
    datapipe = (
        IterableWrapper([uri])
        .open_files_by_fsspec()
        .parse_csv(delimiter=",", skip_lines=1)
        .map(row_processer)
    )
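Running the full pipeline needs workspace credentials, but the row processing can be exercised locally with the standard-library ``csv`` module. The sketch below uses a one-row excerpt in the Titanic column layout (the sample values and the exact fields returned are illustrative assumptions):

```python
import csv
import io

# One-row excerpt in the Titanic CSV layout (header + a single passenger).
sample = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
)

def row_processer(row):
    # label = Survived (column 1); data = class (column 2) and age (column 5)
    return {"label": int(row[1]), "data": [float(row[2]), float(row[5])]}

reader = csv.reader(sample)
next(reader)  # skip the header row, like skip_lines=1 above
rows = [row_processer(r) for r in reader]
```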