articles/machine-learning/how-to-access-data-interactive.md
+10 -10 (10 additions & 10 deletions)
@@ -7,8 +7,8 @@ ms.service: machine-learning
ms.subservice: core
ms.topic: how-to
author: samuel100
- ms.author: samkemp
- ms.date: 10/28/2022
+ ms.author: franksolomon
+ ms.date: 11/17/2022
ms.custom: sdkv2
#Customer intent: As a professional data scientist, I want to know how to build and deploy a model with Azure Machine Learning by using Python in a Jupyter Notebook.
---
@@ -40,7 +40,7 @@ Typically the beginning of a machine learning project involves exploratory data
> pip install -U azureml-fsspec mltable
>```

- ## Access data from a Datastore URI like a filesystem (preview)
+ ## Access data from a datastore URI, like a filesystem (preview)
> 1. Select **Data** from the left-hand menu followed by the **Datastores** tab.
> 1. Select your datastore name and then **Browse**.
> 1. Find the file/folder you want to read into pandas, select the ellipsis (**...**) next to it, and then select **Copy URI** from the menu. You can select the **Datastore URI** to copy into your notebook/script.
> :::image type="content" source="media/how-to-access-data-ci/datastore_uri_copy.png" alt-text="Screenshot highlighting the copy of the datastore URI.":::

You can also instantiate an Azure ML filesystem and run filesystem-like commands such as `ls`, `glob`, `exists`, and `open`. The `open()` method returns a file-like object, which can be passed to any other library that expects to work with Python files, or used by your own code as you would a normal Python file object. These file-like objects respect the use of `with` contexts, for example:
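Only a fragment of the article's own sample is visible in the next hunk, so here is a minimal sketch of the filesystem-style access that paragraph describes, assuming the `azureml-fsspec` package; the workspace URI and paths are placeholders, not the article's verbatim code:

```python
# A sketch only. All IDs and names below are placeholders; copy the real URI
# with the Studio "Copy URI" action described in the steps above.
import pandas as pd
from azureml.fsspec import AzureMachineLearningFileSystem

fs = AzureMachineLearningFileSystem(
    "azureml://subscriptions/<sub-id>/resourcegroups/<rg-name>/workspaces/<ws-name>"
)

fs.ls("/datastore_name/folder/")            # list a folder's contents
fs.glob("/datastore_name/folder/*.csv")     # match files by pattern
fs.exists("/datastore_name/folder/file1.csv")

# open() returns a file-like object that honors `with` contexts.
with fs.open("/datastore_name/folder/file1.csv") as f:
    df = pd.read_csv(f)
```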
@@ -109,7 +109,7 @@ with fs.open('/datastore_name/folder/file1.csv') as f:
### Examples

- In this section we provide some examples of how to using Filesystem spec, for some common scenarios.
+ In this section we provide some examples of how to use Filesystem spec, for some common scenarios.

#### Read a single CSV file into pandas
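The sample under this heading isn't visible in the hunk; as a hedged illustration, a datastore URI copied from Studio can typically be handed straight to pandas once `azureml-fsspec` is installed (the URI below is a placeholder):

```python
import pandas as pd

# Placeholder datastore URI in the azureml:// form that Studio's "Copy URI" action produces.
uri = (
    "azureml://subscriptions/<sub-id>/resourcegroups/<rg-name>/workspaces/<ws-name>"
    "/datastores/<datastore-name>/paths/folder/file1.csv"
)

# pandas resolves the azureml:// scheme through fsspec when azureml-fsspec is installed.
df = pd.read_csv(uri)
df.head()
```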
@@ -160,7 +160,7 @@ df.head()
#### Read a folder of parquet files into pandas

Parquet files are typically written to a folder as part of an ETL process, which can emit files pertaining to the ETL such as progress, commits, etc. Below is an example of files created from an ETL process (files beginning with `_`) to produce a parquet file of data.
:::image type="content" source="media/how-to-access-data-ci/parquet-auxillary.png" alt-text="Screenshot showing the parquet etl process.":::

In these scenarios, you'll only want to read the parquet files in the folder and ignore the ETL process files. The code below shows how you can use glob patterns to read only parquet files in a folder:
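Only the tail of the code that sentence refers to appears in the next hunk (`df = pd.concat(dflist)` and `df.head()`); a sketch of the glob-based approach, with a placeholder URI and paths, could look like this:

```python
import pandas as pd
from azureml.fsspec import AzureMachineLearningFileSystem

# Placeholder workspace-scoped URI; the real one comes from Studio's "Copy URI" action.
fs = AzureMachineLearningFileSystem(
    "azureml://subscriptions/<sub-id>/resourcegroups/<rg-name>/workspaces/<ws-name>"
)

# Glob only *.parquet files, skipping the ETL bookkeeping files that begin with "_".
dflist = []
for path in fs.glob("/datastore_name/folder/*.parquet"):
    with fs.open(path) as f:
        dflist.append(pd.read_parquet(f))

df = pd.concat(dflist)
df.head()
```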
@@ -185,7 +185,7 @@ df = pd.concat(dflist)
df.head()
```

- #### Accessing data from your Azure Databricks Filesystem (`dbfs`)
+ #### Accessing data from your Azure Databricks filesystem (`dbfs`)

Filesystem spec (`fsspec`) has a range of [known implementations](https://filesystem-spec.readthedocs.io/en/stable/_modules/index.html), one of which is the Databricks Filesystem (`dbfs`).
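The `dbfs` sample itself lies outside this hunk; as a rough sketch, fsspec's Databricks filesystem is typically driven through pandas `storage_options`, using a workspace instance name and a personal access token (both placeholders here):

```python
import pandas as pd

# Placeholder Azure Databricks workspace instance and personal access token (PAT).
storage_options = {
    "instance": "adb-<workspace-id>.<region-id>.azuredatabricks.net",
    "token": "<personal-access-token>",
}

# fsspec's dbfs implementation resolves the dbfs:// scheme using these options.
df = pd.read_csv("dbfs://folder/file.csv", storage_options=storage_options)
df.head()
```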
@@ -456,7 +456,7 @@ df.head()
> 1. Select **Data** from the left-hand menu followed by the **Datastores** tab.
> 1. Select your datastore name and then **Browse**.
> 1. Find the file/folder you want to read into pandas, select the ellipsis (**...**) next to it, and then select **Copy URI** from the menu. You can select the **Datastore URI** to copy into your notebook/script.
> :::image type="content" source="media/how-to-access-data-ci/datastore_uri_copy.png" alt-text="Screenshot highlighting the copy of the datastore URI.":::

##### [HTTP Server](#tab/http)
```python
@@ -529,7 +529,7 @@ df.head()
> 1. Select **Data** from the left-hand menu followed by the **Datastores** tab.
> 1. Select your datastore name and then **Browse**.
> 1. Find the file/folder you want to read into pandas, select the ellipsis (**...**) next to it, and then select **Copy URI** from the menu. You can select the **Datastore URI** to copy into your notebook/script.
> :::image type="content" source="media/how-to-access-data-ci/datastore_uri_copy.png" alt-text="Screenshot highlighting the copy of the datastore URI.":::

##### [HTTP Server](#tab/http)
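The HTTP Server tab's code isn't shown in the hunk; reading from a public HTTP(S) endpoint is usually just a direct pandas call (the URL below is a placeholder, and credentials for protected endpoints would go in `storage_options`):

```python
import pandas as pd

# Placeholder public URL; replace with the HTTP(S) address of your CSV file.
df = pd.read_csv("https://<host>/<path>/file1.csv")
df.head()
```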
@@ -552,7 +552,7 @@ df.head()
---

- ### Reading Data assets
+ ### Reading data assets
In this section, you'll learn how to read your Azure ML data assets into pandas.
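The rest of that section falls outside the diff; one hedged sketch of the pattern it describes, retrieving a data asset with the v2 SDK and reading its path with pandas, with all names and versions as placeholders:

```python
import pandas as pd
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Placeholder workspace details.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<sub-id>",
    resource_group_name="<rg-name>",
    workspace_name="<ws-name>",
)

# Placeholder asset name/version; the returned Data object's .path is an azureml:// URI.
data_asset = ml_client.data.get(name="<data-asset-name>", version="<version>")

# For a uri_file CSV asset, that path can be read directly once azureml-fsspec is installed.
df = pd.read_csv(data_asset.path)
df.head()
```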