@@ -43,7 +43,7 @@ A machine learning project typically starts with exploratory data analysis (EDA)
## Download the data used in this tutorial
-For data ingestion, Azure Data Explorer handles raw data in [these formats](/azure/data-explorer/ingestion-supported-formats). This tutorial uses a [CSV-format credit card client data sample](https://azuremlexamples.blob.core.windows.net/datasets/credit_card/default_of_credit_card_clients.csv). The steps take place in an Azure Machine Learning resource. In that resource, you'll create a local folder with the suggested name of **data**, directly under the folder where this notebook is located.
+For data ingestion, Azure Data Explorer handles raw data in [these formats](/azure/data-explorer/ingestion-supported-formats). This tutorial uses a [CSV-format credit card client data sample](https://azuremlexamples.blob.core.windows.net/datasets/credit_card/default_of_credit_card_clients.csv). The steps take place in an Azure Machine Learning resource. In that resource, you create a local folder with the suggested name of **data**, directly under the folder where this notebook is located.
> [!NOTE]
> This tutorial depends on data placed in an Azure Machine Learning resource folder location. For this tutorial, 'local' means a folder location in that Azure Machine Learning resource.
@@ -72,7 +72,7 @@ For more information about the data in the UC Irvine Machine Learning Repository
## Create a handle to the workspace
-Before you explore the code, you need a way to reference your workspace. You'll create `ml_client` as a handle to the workspace. You then use `ml_client` to manage resources and jobs.
+Before you explore the code, you need a way to reference your workspace. You create `ml_client` as a handle to the workspace. You then use `ml_client` to manage resources and jobs.
In the next cell, enter your Subscription ID, Resource Group name, and Workspace name. To find these values:
@@ -99,7 +99,7 @@ ml_client = MLClient(
```
> [!NOTE]
-> Creating MLClient will not connect to the workspace. The client initialization is lazy and waits for the first time it needs to make a call. This happens in the next code cell.
+> Creating MLClient won't connect to the workspace. The client initialization is lazy and waits for the first time it needs to make a call. This happens in the next code cell.
## Upload data to cloud storage
@@ -117,11 +117,11 @@ Data asset creation also creates a *reference* to the data source location, alon
The next notebook cell creates the data asset. The code sample uploads the raw data file to the designated cloud storage resource.
-Each time you create a data asset, you need a unique version for it. If the version already exists, you'll get an error. In this code, you use "initial" for the first read of the data. If that version already exists, the code doesn't recreate it.
+Each time you create a data asset, you need a unique version for it. If the version already exists, you get an error. In this code, you use "initial" for the first read of the data. If that version already exists, the code doesn't recreate it.
You can also omit the **version** parameter. In this case, a version number is generated for you, starting with 1 and incrementing from there.
-This tutorial uses the name "initial" as the first version. The [Create production machine learning pipelines](tutorial-pipeline-python-sdk.md) tutorial also uses this version of the data, so you use a value that you'll see again in that tutorial.
+This tutorial uses the name "initial" as the first version. The [Create production machine learning pipelines](tutorial-pipeline-python-sdk.md) tutorial also uses this version of the data, so you use a value that you see again in that tutorial.
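
The "create only if the version doesn't already exist" behavior described above can be sketched as a small helper. This is a minimal illustration, not the notebook's actual cell: the function name `get_or_create_data_asset` is hypothetical, and it assumes `ml_client.data.get` raises an exception when the name and version pair isn't found.

```python
def get_or_create_data_asset(ml_client, asset, name, version):
    """Return (data_asset, created) for the given name and version.

    Hypothetical helper: `ml_client` is the workspace handle and `asset`
    is the data asset definition you would otherwise pass directly to
    `ml_client.data.create_or_update`.
    """
    try:
        # Look up the exact name/version pair first.
        existing = ml_client.data.get(name=name, version=version)
        return existing, False
    except Exception:
        # Assumption: a failed lookup means the version doesn't exist yet.
        # A broad `except` also swallows unrelated failures (for example,
        # authentication errors), so narrow it in real code.
        created = ml_client.data.create_or_update(asset)
        return created, True
```

Calling the helper twice with the version "initial" would create the asset on the first call and return the existing asset on the second, which avoids the duplicate-version error.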
However, as mentioned previously, it can become difficult to remember these URIs. Additionally, you must manually substitute all **<_substring_>** values in the **pd.read_csv** command with the real values for your resources.
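
To illustrate why manual substitution is error-prone, here's a hedged sketch of a helper that assembles such a URI from its parts. The function name `build_datastore_uri` is hypothetical, and the URI layout (`azureml://subscriptions/.../resourcegroups/.../workspaces/.../datastores/.../paths/...`) is an assumption about the datastore URI pattern, so check it against the URI shown for your own resources.

```python
def build_datastore_uri(subscription_id, resource_group, workspace, datastore, path):
    """Assemble a datastore URI from its parts (assumed layout, see above)."""
    return (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace}"
        f"/datastores/{datastore}"
        # Strip a leading slash so the result never contains "paths//".
        f"/paths/{path.lstrip('/')}"
    )
```

Even with a helper like this, you still have to supply five separate values correctly.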
-You'll want to create data assets for frequently accessed data. Here's an easier way to access the CSV file in Pandas:
+You want to create data assets for frequently accessed data. Here's an easier way to access the CSV file in Pandas:
> [!IMPORTANT]
> In a notebook cell, execute this code to install the `azureml-fsspec` Python library in your Jupyter kernel: