
Commit 4e3c637

add some examples for split data, update retrain article
1 parent bb89524 commit 4e3c637

2 files changed: +51 -2 lines changed

articles/machine-learning/algorithm-module-reference/split-data.md

Lines changed: 47 additions & 0 deletions
@@ -79,6 +79,26 @@ This module is particularly useful when you need to separate data into training
 
 Based on the regular expression you provide, the dataset is divided into two sets of rows: rows with values that match the expression and all remaining rows.
 
+The following examples demonstrate how to divide a dataset using the **Regular Expression** option.
+
+### Single whole word
+
+This example puts into the first dataset all rows that contain the text `Gryphon` in the column `Text`, and puts all other rows into the second output of **Split Data**:
+
+```text
+\"Text" Gryphon
+```
+
+### Substring
+
+This example matches rows whose value in the second column of the dataset, denoted by the index value 1, begins with a character in the specified range. The match is case-sensitive.
+
+```text
+(\1) ^[a-f]
+```
+
+The first result dataset contains all rows where the value in column 1 begins with one of these characters: `a`, `b`, `c`, `d`, `e`, `f`. All other rows are directed to the second output.
+
 ## Relative expression split
 
 1. Add the [Split Data](./split-data.md) module to your pipeline, and connect it as input to the dataset you want to split.
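As a cross-check of the regular-expression split added above, here is a minimal pandas sketch of the same behavior (hypothetical sample data; this approximates the split, it is not the designer module's implementation):

```python
import pandas as pd

# Hypothetical sample data; the column name "Text" mirrors the example above.
df = pd.DataFrame({
    "Text": ["The Gryphon sat up", "a quiet day", "Gryphon!", "nothing here"],
    "Id": [1, 2, 3, 4],
})

# \"Text" Gryphon -> rows whose "Text" value matches the pattern go to the
# first output; all remaining rows go to the second output.
mask = df["Text"].str.contains(r"Gryphon", regex=True)
first_output, second_output = df[mask], df[~mask]
```

The boolean mask and its negation reproduce the two-output behavior of **Split Data**: every input row lands in exactly one of the two results.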
@@ -108,6 +128,33 @@ This module is particularly useful when you need to separate data into training
 
 The expression divides the dataset into two sets of rows: rows with values that meet the condition, and all remaining rows.
 
+The following examples demonstrate how to divide a dataset using the **Relative Expression** option in the **Split Data** module:
+
+### Using calendar year
+
+A common scenario is to divide a dataset by years. The following expression selects all rows where the values in the column `Year` are greater than `2010`:
+
+```text
+\"Year" > 2010
+```
+
+The date expression must account for all date parts that are included in the data column, and the format of dates in the data column must be consistent.
+
+For example, in a date column that uses the format `mm/dd/yyyy`, the expression should be something like this:
+
+```text
+\"Date" > 1/1/2010
+```
+
+### Using column indices
+
+The following expression demonstrates how you can use the column index to select all rows where the value in the first column of the dataset (index 0) is less than or equal to 30 but not equal to 20:
+
+```text
+(\0)<=30 & !=20
+```
+
 ## Next steps
 
 See the [set of modules available](module-reference.md) to Azure Machine Learning.
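The column-index expression in the added lines can also be approximated outside the designer; a minimal pandas sketch (hypothetical sample data, not the module's implementation):

```python
import pandas as pd

# Hypothetical sample data; the first column (index 0) holds the values to test.
df = pd.DataFrame({"Age": [18, 20, 25, 30, 42]})

# (\0)<=30 & !=20 -> the value in column 0 is <= 30 and not equal to 20.
col0 = df.iloc[:, 0]
mask = (col0 <= 30) & (col0 != 20)
first_output, second_output = df[mask], df[~mask]
```

Note the parenthesized subexpressions: pandas, like the relative-expression syntax, combines the two conditions with an element-wise AND, and the negated mask captures every remaining row.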

articles/machine-learning/how-to-retrain-designer.md

Lines changed: 4 additions & 2 deletions
@@ -114,15 +114,17 @@ Use the following steps to submit a pipeline endpoint run from the designer:
 1. Select the pipeline you want to run.
 
-1. Select **Run**.
+1. Select **Submit**.
 
 1. In the setup dialog, you can specify a new input data path value, which points to your new dataset.
 
     ![Screenshot showing how to set up a parameterized pipeline run in the designer](./media/how-to-retrain-designer/published-pipeline-run.png)
 
 ### Submit runs with code
 
-There are multiple ways to access your REST endpoint programatically depending on your development environment. You can find code samples that show you how to submit pipeline runs with parameters in the **Consume** tab of your pipeline.
+You can find the REST endpoint of a published pipeline in the overview panel. By calling the endpoint, you can retrain the published pipeline.
+
+To make a REST call, you need an OAuth 2.0 bearer-type authentication header. See the following [tutorial section](tutorial-pipeline-batch-scoring-classification.md#publish-and-run-from-a-rest-endpoint) for more detail on setting up authentication to your workspace and making a parameterized REST call.
 
 ## Next steps