Commit d6af49c

Merge pull request #269239 from fbsolo-ms1/freshness-update-branch
Freshness update for how-to-designer-transform-data.md . . .
2 parents a22034b + fa67ebc commit d6af49c

4 files changed

+48
-49
lines changed

articles/machine-learning/v1/how-to-designer-transform-data.md

Lines changed: 48 additions & 49 deletions
Original file line number | Diff line number | Diff line change
@@ -15,7 +15,7 @@ ms.custom: UpdateFrequency5, designer
1515

1616
# Transform data in Azure Machine Learning designer
1717

18-
In this article, you'll learn how to transform and save datasets in the Azure Machine Learning designer, to prepare your own data for machine learning.
18+
In this article, you learn how to transform and save datasets in the Azure Machine Learning designer, to prepare your own data for machine learning.
1919

2020
You'll use the sample [Adult Census Income Binary Classification](samples-designer.md) dataset to prepare two datasets: one dataset that includes adult census information from only the United States, and another dataset that includes census information from non-US adults.
2121

@@ -28,135 +28,134 @@ In this article, you'll learn how to:
2828
This how-to is a prerequisite for the [how to retrain designer models](how-to-retrain-designer.md) article. In that article, you'll learn how to use the transformed datasets to train multiple models, with pipeline parameters.
2929

3030
> [!IMPORTANT]
31-
> If you do not see graphical elements mentioned in this document, such as buttons in studio or designer, you may not have the right level of permissions to the workspace. Please contact your Azure subscription administrator to verify that you have been granted the correct level of access. For more information, see [Manage users and roles](../how-to-assign-roles.md).
31+
> If you do not observe graphical elements mentioned in this document, such as buttons in studio or designer, you may not have the correct level of permissions to the workspace. Please contact your Azure subscription administrator to verify that you have been granted the correct level of access. For more information, visit [Manage users and roles](../how-to-assign-roles.md).
3232
3333
## Transform a dataset
3434

35-
In this section, you'll learn how to import the sample dataset, and split the data into US and non-US datasets. See [how to import data](how-to-designer-import-data.md) for more information about how to import your own data into the designer.
35+
In this section, you'll learn how to import the sample dataset, and split the data into US and non-US datasets. Visit [how to import data](how-to-designer-import-data.md) for more information about how to import your own data into the designer.
3636

3737
### Import data
3838

3939
Use these steps to import the sample dataset:
4040

41-
1. Sign in to <a href="https://ml.azure.com?tabs=jre" target="_blank">ml.azure.com</a>, and select the workspace you want to use.
41+
1. Sign in to [Azure Machine Learning studio](https://ml.azure.com), and select the workspace you want to use
4242

43-
1. Go to the designer. Select **Easy-to-use-prebuild components** to create a new pipeline.
43+
1. Go to the designer. Select **Create a new pipeline using classic prebuilt components** to create a new pipeline
4444

45-
1. Select a default compute target to run the pipeline.
45+
1. To the left of the pipeline canvas, in the **Component** tab, expand the **Sample data** node
4646

47-
1. To the left of the pipeline canvas, you'll see a palette of datasets and components. Select **Datasets**. Then view the **Samples** section.
47+
1. Drag and drop the **Adult Census Income Binary classification** dataset onto the canvas
4848

49-
1. Drag and drop the **Adult Census Income Binary classification** dataset onto the canvas.
49+
1. Right-select the **Adult Census Income** dataset component, and select **Preview data**
5050

51-
1. Right-click the **Adult Census Income** dataset component, and select **Visualize** > **Dataset output**
52-
53-
1. Use the data preview window to explore the dataset. Take special note of the "native-country" column values.
51+
1. Use the data preview window to explore the dataset. Take special note of the "native-country" column values
5452

5553
### Split the data
5654

57-
In this section, you'll use the [Split Data component](../algorithm-module-reference/split-data.md) to identify and split rows that contain "United-States" in the "native-country" column.
55+
In this section, you'll use the [Split Data component](../algorithm-module-reference/split-data.md) to identify and split rows that contain "United-States" in the "native-country" column
5856

59-
1. To the left of the canvas, in the component palette, expand the **Data Transformation** section, and find the **Split Data** component.
57+
1. To the left of the canvas, in the component tab, expand the **Data Transformation** section, and find the **Split Data** component
6058

61-
1. Drag the **Split Data** component onto the canvas, and drop that component below the dataset component.
59+
1. Drag the **Split Data** component onto the canvas, and drop that component below the dataset component
6260

63-
1. Connect the dataset component to the **Split Data** component.
61+
1. Connect the dataset component to the **Split Data** component
6462

65-
1. Select the **Split Data** component.
63+
1. Select the **Split Data** component, to open the **Split Data** pane
6664

67-
1. To the right of the canvas in the component details pane, set **Splitting mode** to **Regular Expression**.
65+
1. To the right of the canvas in the **Parameters** icon, set **Splitting mode** to **Regular Expression**
6866

69-
1. Enter the **Regular Expression**: `\"native-country" United-States`.
67+
1. Enter the **Regular Expression**: `\"native-country" United-States`
7068

71-
The **Regular expression** mode tests a single column for a value. See the related [algorithm component reference page](../algorithm-module-reference/split-data.md) for more information on the Split Data component.
69+
The **Regular expression** mode tests a single column for a value. Visit the related [algorithm component reference page](../algorithm-module-reference/split-data.md) for more information on the Split Data component
7270

73-
Your pipeline should look like this:
71+
Your pipeline should resemble this screenshot:
7472

7573
:::image type="content" source="./media/how-to-designer-transform-data/split-data.png" alt-text="Screenshot that shows how to configure the pipeline and the Split Data component":::
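
To make the regular-expression split concrete, here is a minimal Python sketch of the same idea outside the designer. It is not how the Split Data component is implemented: the function name `split_by_regex`, the sample rows, and the choice to trim whitespace before matching are all assumptions made for illustration. The first returned list plays the role of the first output port (rows where the expression matches), and the second list plays the role of the second output port.

```python
import re


def split_by_regex(rows, column, pattern):
    """Partition rows into (matched, unmatched) by testing one column against a regex."""
    regex = re.compile(pattern)
    matched, unmatched = [], []
    for row in rows:
        # Assumption: cell values may carry stray padding, so trim before matching.
        value = str(row.get(column, "")).strip()
        (matched if regex.search(value) else unmatched).append(row)
    return matched, unmatched


# Hypothetical rows that mimic the "native-country" column of the sample dataset.
sample_rows = [
    {"age": 39, "native-country": "United-States"},
    {"age": 28, "native-country": "Cuba"},
    {"age": 37, "native-country": " United-States"},
    {"age": 49, "native-country": "Jamaica"},
]

us_rows, non_us_rows = split_by_regex(sample_rows, "native-country", "United-States")
print(len(us_rows), len(non_us_rows))
```

As in the designer, the order of the two results matters: matching rows come first, and everything else comes second.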
7674

77-
7875
## Save the datasets
7976

80-
Now that you set up your pipeline to split the data, you must specify where to persist the datasets. For this example, use the **Export Data** component to save your dataset to a datastore. See [Connect to Azure storage services](how-to-access-data.md) for more information about datastores.
77+
Now that you set up your pipeline to split the data, you must specify where to persist the datasets. For this example, use the **Export Data** component to save your dataset to a datastore. Visit [Connect to Azure storage services](how-to-access-data.md) for more information about datastores.
8178

82-
1. To the left of the canvas in the component palette, expand the **Data Input and Output** section, and find the **Export Data** component.
79+
1. To the left of the canvas in the component palette, expand the **Data Input and Output** section, and find the **Export Data** component
8380

84-
1. Drag and drop two **Export Data** components below the **Split Data** component.
81+
1. Drag and drop two **Export Data** components below the **Split Data** component
8582

86-
1. Connect each output port of the **Split Data** component to a different **Export Data** component.
83+
1. Connect each output port of the **Split Data** component to a different **Export Data** component
8784

88-
Your pipeline should look something like this:
85+
Your pipeline should resemble this:
8986

90-
![Screenshot showing how to connect the Export Data components](media/how-to-designer-transform-data/export-data-pipeline.png).
87+
![Screenshot showing how to connect the Export Data components](media/how-to-designer-transform-data/export-data-pipeline.png)
9188

92-
1. Select the **Export Data** component connected to the *left*-most port of the **Split Data** component.
89+
1. Select the **Export Data** component connected to the *left*-most port of the **Split Data** component, to open the Export Data configuration pane
9390

94-
For the **Split Data** component, the output port order matters. The first output port contains the rows where the regular expression is true. In this case, the first port contains rows for US-based income, and the second port contains rows for non-US based income.
91+
For the **Split Data** component, the output port order is important. The first output port contains the rows where the regular expression is true. In this case, the first port contains rows for US-based income, and the second port contains rows for non-US based income
9592

9693
1. In the component details pane to the right of the canvas, set the following options:
9794

9895
**Datastore type**: Azure Blob Storage
9996

100-
**Datastore**: Select an existing datastore, or select "New datastore" to create one now.
97+
**Datastore**: Select an existing datastore, or select "New datastore" to create a new one
10198

10299
**Path**: `/data/us-income`
103100

104101
**File format**: csv
105102

106103
> [!NOTE]
107-
> This article assumes that you have access to a datastore registered to the current Azure Machine Learning workspace. See [Connect to Azure storage services](how-to-connect-data-ui.md#create-datastores) for datastore setup instructions.
104+
> This article assumes that you have access to a datastore registered to the current Azure Machine Learning workspace. Visit [Connect to Azure storage services](how-to-connect-data-ui.md#create-datastores) for datastore setup instructions
108105
109-
You can create a datastore if you don't have one now. For example purposes, this article will save the datasets to the default blob storage account associated with the workspace. It will save the datasets into the `azureml` container, in a new folder named `data`.
106+
You can create a datastore if you don't have one now. For example purposes, this article saves the datasets to the default blob storage account associated with the workspace. It saves the datasets into the `azureml` container, in a new folder named `data`
110107

111-
1. Select the **Export Data** component connected to the *right*-most port of the **Split Data** component.
108+
1. Select the **Export Data** component connected to the *right*-most port of the **Split Data** component, to open the Export Data configuration pane
112109

113110
1. To the right of the canvas in the component details pane, set the following options:
114111

115112
**Datastore type**: Azure Blob Storage
116113

117-
**Datastore**: Select the same datastore as above
114+
**Datastore**: Select the earlier datastore
118115

119116
**Path**: `/data/non-us-income`
120117

121118
**File format**: csv
122119

123-
1. Verify that the **Export Data** component connected to the left port of the **Split Data** has the **Path** `/data/us-income`.
120+
1. Verify that the **Export Data** component connected to the left port of the **Split Data** has the **Path** `/data/us-income`
124121

125-
1. Verify that the **Export Data** component connected to the right port has the **Path** `/data/non-us-income`.
122+
1. Verify that the **Export Data** component connected to the right port has the **Path** `/data/non-us-income`
126123

127124
Your pipeline and settings should look like this:
128125

129-
![Screenshot showing how to configure the Export Data components](media/how-to-designer-transform-data/us-income-export-data.png).
126+
![Screenshot showing how to configure the Export Data components](media/how-to-designer-transform-data/us-income-export-data.png)
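
The export settings above amount to writing each split to its own CSV file under a shared `data` folder. The following sketch shows that layout with the Python standard library only. It writes to a temporary local directory as a stand-in for the blob container, and the helper `export_csv` and the sample rows are hypothetical; it does not call any Azure Machine Learning or Azure Storage API.

```python
import csv
import tempfile
from pathlib import Path


def export_csv(rows, root, relative_path):
    """Write rows (a list of dicts) as CSV under root/relative_path and return the file path."""
    target = Path(root) / relative_path.lstrip("/")
    target.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = list(rows[0].keys()) if rows else []
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return target


# Hypothetical split results; in the pipeline these come from the Split Data outputs.
us_rows = [{"age": "39", "native-country": "United-States"}]
non_us_rows = [{"age": "28", "native-country": "Cuba"}]

# A temporary local folder stands in for the datastore container.
root = tempfile.mkdtemp()
us_path = export_csv(us_rows, root, "/data/us-income")
non_us_path = export_csv(non_us_rows, root, "/data/non-us-income")
print(us_path.parent == non_us_path.parent)
```

Keeping the two paths distinct is the point of the verification steps above: swapping them would silently mislabel the US and non-US data.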
130127

131128
### Submit the job
132129

133130
Now that you set up your pipeline to split and export the data, submit a pipeline job.
134131

135-
1. Select **Submit** at the top of the canvas.
132+
1. Select **Configure & Submit** at the top of the canvas
136133

137-
1. Select **Create new** in the **Set up pipeline job**, to create an experiment.
134+
1. Select the **Create new** option in the Basics pane of **Set up pipeline job**, to create an experiment
138135

139-
Experiments logically group related pipeline jobs together. If you run this pipeline in the future, you should use the same experiment for logging and tracking purposes.
136+
Experiments logically group related pipeline jobs together. If you run this pipeline in the future, you should use the same experiment for logging and tracking purposes
140137

141-
1. Provide a descriptive experiment name - for example "split-census-data".
138+
1. Provide a descriptive experiment name - for example "split-census-data"
142139

143-
1. Select **Submit**.
140+
1. Select **Review + Submit**, and then select **Submit**
144141

145142
## View results
146143

147-
After the pipeline finishes running, you can navigate to your Azure portal blob storage to view your results. You can also view the intermediary results of the **Split Data** component to confirm that your data has been split correctly.
144+
After the pipeline finishes running, you can navigate to your Azure portal blob storage to view your results. You can also view the intermediary results of the **Split Data** component to confirm that your data split correctly.
145+
146+
1. Select the **Split Data** component
148147

149-
1. Select the **Split Data** component.
148+
1. In the component details pane to the right of the canvas, select the **Outputs + logs** tab
150149

151-
1. In the component details pane to the right of the canvas, select **Outputs + logs**.
150+
1. Select the **Show data outputs** dropdown
152151

153-
1. Select the visualize icon ![visualize icon](media/how-to-designer-transform-data/visualize-icon.png) next to **Results dataset1**.
152+
1. Select the visualize icon ![visualize icon](media/how-to-designer-transform-data/visualize-icon.png) next to **Results dataset1**
154153

155-
1. Verify that the "native-country" column contains only the value "United-States".
154+
1. Verify that the "native-country" column contains only the value "United-States"
156155

157-
1. Select the visualize icon ![visualize icon](media/how-to-designer-transform-data/visualize-icon.png) next to **Results dataset2**.
156+
1. Select the visualize icon ![visualize icon](media/how-to-designer-transform-data/visualize-icon.png) next to **Results dataset2**
158157

159-
1. Verify that the "native-country" column does not contain the value "United-States".
158+
1. Verify that the "native-country" column doesn't contain the value "United-States"
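
The two verification steps can also be expressed as a small programmatic check, for example against exported CSV rows that you have loaded yourself. This is a sketch under assumptions: `check_split` and the sample rows are hypothetical, and the comparison trims whitespace, which may or may not match how your data is stored.

```python
def check_split(first_output, second_output, column="native-country", value="United-States"):
    """Return True when the first output holds only `value` and the second output never does."""
    first_ok = all(str(row[column]).strip() == value for row in first_output)
    second_ok = all(str(row[column]).strip() != value for row in second_output)
    return first_ok and second_ok


# Hypothetical rows standing in for Results dataset1 and Results dataset2.
dataset1 = [{"native-country": "United-States"}, {"native-country": "United-States"}]
dataset2 = [{"native-country": "Cuba"}, {"native-country": "Jamaica"}]

print(check_split(dataset1, dataset2))
```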
160159

161160
## Clean up resources
162161
