Commit f56e565

Merge pull request #230330 from SturgeonMi/patch-22
Update how-to-access-data-interactive.md
2 parents 68a918a + 7a14a5b commit f56e565

1 file changed: +54 -18 lines changed

articles/machine-learning/how-to-access-data-interactive.md

Lines changed: 54 additions & 18 deletions
@@ -88,26 +88,62 @@ df.head()
 > 1. Find the file or folder you want to read into pandas and select the ellipsis (**...**) next to it. From the menu, select **Copy URI**. You can select the **Datastore URI** to copy into your notebook/script.
 > :::image type="content" source="media/how-to-access-data-ci/datastore_uri_copy.png" alt-text="Screenshot highlighting the copy of the datastore URI.":::

-You can also instantiate an Azure Machine Learning filesystem and do filesystem-like commands like `ls`, `glob`, `exists`, `open`, etc. The `open()` method will return a file-like object, which can be passed to any other library that expects to work with python files, or used by your own code as you would a normal python file object. These file-like objects respect the use of `with` contexts, for example:
+You can also instantiate an Azure Machine Learning filesystem and run filesystem-like commands such as `ls`, `glob`, `exists`, and `open`.
+- The `ls()` method lists the files in the corresponding directory. You can call `ls()`, `ls('.')`, or `ls('<folder_level_1>/<folder_level_2>')`. Both '.' and '..' are supported in relative paths.
+- The `glob()` method supports '*' and '**' globbing.
+- The `exists()` method returns a Boolean value that indicates whether a specified file exists in the current root directory.
+- The `open()` method returns a file-like object, which can be passed to any other library that expects to work with Python files, or used by your own code as you would a normal Python file object. These file-like objects respect the use of `with` contexts, for example:

 ```python
 from azureml.fsspec import AzureMachineLearningFileSystem

-# instantiate file system using datastore URI
-fs = AzureMachineLearningFileSystem('azureml://subscriptions/<subid>/resourcegroups/<rgname>/workspaces/<workspace_name>/datastores/<datastore_name>/paths/<folder>')
+# instantiate file system using the following URI
+fs = AzureMachineLearningFileSystem('azureml://subscriptions/<subid>/resourcegroups/<rgname>/workspaces/<workspace_name>/datastores/datastorename')
+
+fs.ls() # list folders/files in datastore 'datastorename'

-# list files in the path
-fs.ls()
 # output example:
-# /datastore_name/folder/file1.csv
-# /datastore_name/folder/file2.csv
+# folder1
+# folder2
+# file3.csv

 # use an open context
-with fs.open('/datastore_name/folder/file1.csv') as f:
+with fs.open('./folder1/file1.csv') as f:
     # do some process
     process_file(f)
 ```

+### Upload files via AzureMachineLearningFileSystem
+
+```python
+from azureml.fsspec import AzureMachineLearningFileSystem
+# instantiate file system using the following URI
+fs = AzureMachineLearningFileSystem('azureml://subscriptions/<subid>/resourcegroups/<rgname>/workspaces/<workspace_name>/datastores/datastorename')
+
+# you can specify recursive as False to upload a file
+fs.upload(lpath='data/upload_files/crime-spring.csv', rpath='data/fsspec', recursive=False, **{'overwrite': 'MERGE_WITH_OVERWRITE'})
+
+# you need to specify recursive as True to upload a folder
+fs.upload(lpath='data/upload_folder/', rpath='data/fsspec_folder', recursive=True, **{'overwrite': 'MERGE_WITH_OVERWRITE'})
+```
+`lpath` is the local path, and `rpath` is the remote path.
+If the folders you specify in `rpath` do not exist yet, they are created for you.
+
+We support three modes for 'overwrite':
+- APPEND: if a file with the same name already exists in the destination path, the original file is kept
+- FAIL_ON_FILE_CONFLICT: if a file with the same name already exists in the destination path, an error is thrown
+- MERGE_WITH_OVERWRITE: if a file with the same name already exists in the destination path, it is overwritten with the new file
+
+### Download files via AzureMachineLearningFileSystem
+```python
+# you can specify recursive as False to download a file
+# the overwrite behavior for downloads is determined by the local system, and it is MERGE_WITH_OVERWRITE
+fs.download(rpath='data/fsspec/crime-spring.csv', lpath='data/download_files/', recursive=False)
+
+# you need to specify recursive as True to download a folder
+fs.download(rpath='data/fsspec_folder', lpath='data/download_folder/', recursive=True)
+```
+
 ### Examples

 In this section we provide some examples of how to use Filesystem spec, for some common scenarios.
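
Note on the hunk above: the three 'overwrite' modes it documents can be sketched with a plain dictionary standing in for the destination folder. This is only an illustration of the behavior the added text describes, not the library's implementation. The `OverwriteMode` enum and the `simulate_upload` function are hypothetical names introduced here, and the use of `FileExistsError` is an assumption, since the diff only says that an error is thrown.

```python
from enum import Enum


class OverwriteMode(Enum):
    """Hypothetical names mirroring the three modes listed in the diff."""
    APPEND = "APPEND"
    FAIL_ON_FILE_CONFLICT = "FAIL_ON_FILE_CONFLICT"
    MERGE_WITH_OVERWRITE = "MERGE_WITH_OVERWRITE"


def simulate_upload(remote, name, data, mode):
    """Apply one upload to a dict that stands in for the destination folder."""
    if name in remote:
        if mode is OverwriteMode.APPEND:
            return remote  # keep the original file untouched
        if mode is OverwriteMode.FAIL_ON_FILE_CONFLICT:
            # assumption: the real error type is not stated in the diff
            raise FileExistsError(f"{name} already exists at the destination")
    # MERGE_WITH_OVERWRITE, or no conflict at all: write the new content
    remote[name] = data
    return remote


remote = {"crime-spring.csv": "old"}
simulate_upload(remote, "crime-spring.csv", "new", OverwriteMode.APPEND)
kept = remote["crime-spring.csv"]        # original content survives APPEND
simulate_upload(remote, "crime-spring.csv", "new", OverwriteMode.MERGE_WITH_OVERWRITE)
replaced = remote["crime-spring.csv"]    # new content replaces the original
```

In this sketch a name conflict is the only case where the modes differ; when the destination has no file with that name, all three simply write the file.
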
@@ -131,14 +167,14 @@ import pandas as pd
 from azureml.fsspec import AzureMachineLearningFileSystem

 # define the URI - update <> placeholders
-uri = 'azureml://subscriptions/<subid>/resourcegroups/<rgname>/workspaces/<workspace_name>/datastores/<datastore_name>/paths/<folder>/*.csv'
+uri = 'azureml://subscriptions/<subid>/resourcegroups/<rgname>/workspaces/<workspace_name>/datastores/<datastore_name>'

 # create the filesystem
 fs = AzureMachineLearningFileSystem(uri)

 # append csv files in folder to a list
 dflist = []
-for path in fs.ls():
+for path in fs.glob('/<folder>/*.csv'):
     with fs.open(path) as f:
         dflist.append(pd.read_csv(f))

@@ -170,14 +206,14 @@ import pandas as pd
 from azureml.fsspec import AzureMachineLearningFileSystem

 # define the URI - update <> placeholders
-uri = 'azureml://subscriptions/<subid>/resourcegroups/<rgname>/workspaces/<workspace_name>/datastores/<datastore_name>/paths/<folder>/*.parquet'
+uri = 'azureml://subscriptions/<subid>/resourcegroups/<rgname>/workspaces/<workspace_name>/datastores/<datastore_name>'

 # create the filesystem
 fs = AzureMachineLearningFileSystem(uri)

 # append parquet files in folder to a list
 dflist = []
-for path in fs.ls():
+for path in fs.glob('/<folder>/*.parquet'):
     with fs.open(path) as f:
         dflist.append(pd.read_parquet(f))

@@ -225,14 +261,14 @@ from PIL import Image
 from azureml.fsspec import AzureMachineLearningFileSystem

 # define the URI - update <> placeholders
-uri = 'azureml://subscriptions/<subid>/resourcegroups/<rgname>/workspaces/<workspace_name>/datastores/<datastore_name>/paths/<folder>/<image.jpeg>'
+uri = 'azureml://subscriptions/<subid>/resourcegroups/<rgname>/workspaces/<workspace_name>/datastores/<datastore_name>'

 # create the filesystem
 fs = AzureMachineLearningFileSystem(uri)

-with fs.open() as f:
+with fs.open('/<folder>/<image.jpeg>') as f:
     img = Image.open(f)
-    img.show()
+    img.show()
 ```

 #### PyTorch custom dataset example
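
Note on the pandas hunks above: they switch from `fs.ls()` to `fs.glob(...)`, and the first hunk states that `glob()` supports '*' and '**'. The difference between the two wildcards can be sketched with Python's standard `glob` module on a temporary local directory. This is a local stand-in under the assumption that the patterns behave the same way; `AzureMachineLearningFileSystem.glob` may differ in details such as the form of the paths it returns. The folder and file names below are made up for the illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # build a tiny tree: folder/a.csv, folder/sub/b.csv, folder/notes.txt
    os.makedirs(os.path.join(root, "folder", "sub"))
    for rel in ("folder/a.csv", "folder/sub/b.csv", "folder/notes.txt"):
        with open(os.path.join(root, *rel.split("/")), "w") as f:
            f.write("x\n")

    def relative_matches(pattern):
        """Return sorted matches relative to root, using '/' separators."""
        hits = glob.glob(os.path.join(root, *pattern.split("/")), recursive=True)
        return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)

    one_level = relative_matches("folder/*.csv")     # '*' stays within one directory
    any_depth = relative_matches("folder/**/*.csv")  # '**' also descends into subfolders
```

With the standard library, `one_level` contains only `folder/a.csv`, while `any_depth` also picks up `folder/sub/b.csv`.
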
@@ -306,16 +342,16 @@ from azureml.fsspec import AzureMachineLearningFileSystem
 from torch.utils.data import DataLoader

 # define the URI - update <> placeholders
-uri = 'azureml://subscriptions/<subid>/resourcegroups/<rgname>/workspaces/<workspace_name>/datastores/<datastore_name>/paths/<folder>/'
+uri = 'azureml://subscriptions/<subid>/resourcegroups/<rgname>/workspaces/<workspace_name>/datastores/<datastore_name>'

 # create the filesystem
 fs = AzureMachineLearningFileSystem(uri)

 # create the dataset
 training_data = CustomImageDataset(
     filesystem=fs,
-    annotations_file='<datastore_name>/<path>/annotations.csv',
-    img_dir='<datastore_name>/<path_to_images>/'
+    annotations_file='/annotations.csv',
+    img_dir='/<path_to_images>/'
 )

 # Preparing your data for training with DataLoaders
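
Note on the last four hunks: each one makes the same change. The filesystem is now created from the datastore root URI, and the path that used to follow `/paths/` is passed to `glob()`, `open()`, or the dataset arguments instead. A minimal sketch of that split, assuming the `/paths/` separator shown in the removed lines, is below. The `split_datastore_uri` helper is hypothetical and is not part of `azureml.fsspec`.

```python
def split_datastore_uri(uri):
    """Split a long-form URI into (datastore root URI, path inside the datastore).

    Hypothetical helper for illustration; it is not part of azureml.fsspec.
    """
    marker = "/paths/"
    root, found, rest = uri.partition(marker)
    if not found:
        # already a datastore root URI: the path is the datastore root
        return uri.rstrip("/"), "/"
    return root, "/" + rest.lstrip("/")


# the old-style URI from the removed line of the CSV hunk
old_uri = (
    "azureml://subscriptions/<subid>/resourcegroups/<rgname>"
    "/workspaces/<workspace_name>/datastores/<datastore_name>"
    "/paths/<folder>/*.csv"
)
root_uri, pattern = split_datastore_uri(old_uri)
```

Here `root_uri` matches the URI on the added line of the CSV hunk and `pattern` is `/<folder>/*.csv`, the argument the added line passes to `fs.glob()`.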
