Commit fb2efba

Merge pull request #100614 from nibaccam/guardrails
Add guardrails
2 parents 004064e + cda669b commit fb2efba

2 files changed: +18 -3 lines changed

articles/machine-learning/concept-automated-ml.md

Lines changed: 2 additions & 2 deletions
@@ -93,7 +93,7 @@ In every automated machine learning experiment, your data is automatically scale
 
 ### Advanced preprocessing: optional featurization
 
-Additional advanced preprocessing and featurization are also available, such as missing values imputation, encoding, and transforms. [Learn more about what featurization is included](how-to-create-portal-experiments.md#preprocess). Enable this setting with:
+Additional advanced preprocessing and featurization are also available, such as data guardrails, encoding, and transforms. [Learn more about what featurization is included](how-to-create-portal-experiments.md#preprocess). Enable this setting with:
 
 + Azure Machine Learning studio: Selecting the **View featurization settings** in the **Configuration Run** section [with these steps](how-to-create-portal-experiments.md).

@@ -160,7 +160,7 @@ Learn more and see an example of [automated machine learning for time series for
 
 * holiday detection and featurization
 * time-series and DNN learners (Auto-ARIMA, Prophet, ForecastTCN)
-* many model support through grouping
+* many models support through grouping
 * rolling-origin cross validation
 * configurable lags
 * rolling window aggregate features

articles/machine-learning/how-to-create-portal-experiments.md

Lines changed: 16 additions & 1 deletion
@@ -146,11 +146,12 @@ Variance| Measure of how far spread out this column's data is from its average v
 Skewness| Measure of how different this column's data is from a normal distribution.
 Kurtosis| Measure of how heavily tailed this column's data is compared to a normal distribution.
 
+
 <a name="preprocess"></a>
 
 ## Advanced preprocessing options
 
-When configuring your experiments, you can enable the advanced setting `Preprocess`. Doing so means that the following data preprocessing and featurization steps are performed automatically.
+When configuring your experiments, you can enable the advanced setting `Preprocess`. Doing so means that as part of preprocessing the following data guardrails and featurization steps are performed automatically.
 
 |Preprocessing&nbsp;steps| Description |
 | ------------- | ------------- |
@@ -164,6 +165,20 @@ When configuring your experiments, you can enable the advanced setting `Preproce
 |Weight of Evidence (WoE)|Calculates WoE as a measure of correlation of categorical columns to the target column. It is calculated as the log of the ratio of in-class vs out-of-class probabilities. This step outputs one numerical feature column per class and removes the need to explicitly impute missing values and outlier treatment.|
 |Cluster Distance|Trains a k-means clustering model on all numerical columns. Outputs k new features, one new numerical feature per cluster, containing the distance of each sample to the centroid of each cluster.|
 
+### Data guardrails
+
+Automated machine learning offers data guardrails to help you identify potential issues with your data (e.g., missing values, class imbalance) and help take corrective actions for improved results. There are many best practices that are available and can be applied to achieve reliable results.
+
+The following table describes the currently supported data guardrails, and the associated statuses that users may come across when submitting their experiment.
+
+Guardrail|Status|Condition&nbsp;for&nbsp;trigger
+---|---|---
+Missing&nbsp;values&nbsp;imputation |**Passed** <br> <br> **Fixed**| No missing value in any of the input&nbsp;columns <br> <br> Some columns have missing values
+Cross validation|**Done**|If no explicit validation set is provided
+High&nbsp;cardinality&nbsp;feature&nbsp;detection| **Passed** <br> <br>**Done**| No high cardinality features were detected <br><br> High cardinality input columns were detected
+Class balance detection |**Passed** <br><br><br>**Alerted** |Classes are balanced in the training data; A dataset is considered balanced if each class has good representation in the dataset, as measured by number and ratio of samples <br> <br> Classes in the training data are imbalanced
+Time-series data consistency|**Passed** <br><br><br><br> **Fixed** |<br> The selected {horizon, lag, rolling window} value(s) were analyzed, and no potential out-of-memory issues were detected. <br> <br>The selected {horizon, lag, rolling window} values were analyzed and will potentially cause your experiment to run out of memory. The lag or rolling window has been turned off.
+
 ## Run experiment and view results
 
 Select **Start** to run your experiment. The experiment preparing process can take up to 10 minutes. Training jobs can take an additional 2-3 minutes more for each pipeline to finish running.
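
Reviewer note: to make the statuses in the new data guardrails table concrete, here is a minimal Python sketch of how two of them (missing values imputation and class balance detection) might be reported. This is not Azure Machine Learning's implementation. The function names, the use of mean imputation, and the `min_ratio` / `min_count` thresholds are all assumptions made for illustration; the documentation in this commit only says balance is "measured by number and ratio of samples" and gives no numbers.

```python
from collections import Counter
from statistics import mean


def missing_values_guardrail(rows, columns):
    """Fill missing (None) numeric values and report a guardrail status.

    Returns ("Passed", rows_copy) when no value is missing, and
    ("Fixed", rows_copy) when at least one value was imputed.
    Mean imputation is an assumption of this sketch.
    """
    result = [dict(row) for row in rows]  # copy so the input is not mutated
    fixed = False
    for col in columns:
        present = [row[col] for row in result if row[col] is not None]
        if len(present) == len(result):
            continue  # nothing missing in this column
        fill = mean(present) if present else 0.0
        for row in result:
            if row[col] is None:
                row[col] = fill
        fixed = True
    return ("Fixed" if fixed else "Passed"), result


def class_balance_guardrail(labels, min_ratio=0.2, min_count=2):
    """Report "Passed" for balanced classes and "Alerted" otherwise.

    min_ratio and min_count are hypothetical thresholds: the smallest class
    must have at least min_count samples and at least min_ratio times the
    sample count of the largest class.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    smallest = min(counts.values())
    largest = max(counts.values())
    balanced = smallest >= min_count and smallest / largest >= min_ratio
    return "Passed" if balanced else "Alerted"
```

Under these assumed thresholds, a 50/50 label split would report **Passed** and a 95/5 split would report **Alerted**, mirroring the two rows for class balance detection in the table.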

0 commit comments
