Merge pull request #100614 from nibaccam/guardrails

PRMerger18 · web-flow · commit fb2efbafebb2 · 2020-01-10T13:49:50.000-08:00
Add guardrails
diff --git a/articles/machine-learning/concept-automated-ml.md b/articles/machine-learning/concept-automated-ml.md
@@ -93,7 +93,7 @@ In every automated machine learning experiment, your data is automatically scale
 
 ### Advanced preprocessing: optional featurization
 
-Additional advanced preprocessing and featurization are also available, such as missing values imputation, encoding, and transforms. [Learn more about what featurization is included](how-to-create-portal-experiments.md#preprocess). Enable this setting with:
+Additional advanced preprocessing and featurization are also available, such as data guardrails, encoding, and transforms. [Learn more about what featurization is included](how-to-create-portal-experiments.md#preprocess). Enable this setting with:
 
 + Azure Machine Learning studio: Selecting the **View featurization settings** in the **Configuration Run** section [with these steps](how-to-create-portal-experiments.md).
 
@@ -160,7 +160,7 @@ Learn more and see an example of [automated machine learning for time series for
 
 * holiday detection and featurization
 * time-series and DNN learners (Auto-ARIMA, Prophet, ForecastTCN)
-* many model support through grouping
+* support for many models through grouping
 * rolling-origin cross validation
 * configurable lags
 * rolling window aggregate features
diff --git a/articles/machine-learning/how-to-create-portal-experiments.md b/articles/machine-learning/how-to-create-portal-experiments.md
@@ -146,11 +146,12 @@ Variance| Measure of how far spread out this column's data is from its average v
 Skewness| Measure of how different this column's data is from a normal distribution.
 Kurtosis| Measure of how heavily tailed this column's data is compared to a normal distribution.
 
+
 <a name="preprocess"></a>
 
 ## Advanced preprocessing options
 
-When configuring your experiments, you can enable the advanced setting `Preprocess`. Doing so means that the following data preprocessing and featurization steps are performed automatically.
+When configuring your experiments, you can enable the advanced setting `Preprocess`. Doing so means that the following data guardrails and featurization steps are performed automatically as part of preprocessing.
 
 |Preprocessing&nbsp;steps| Description |
 | ------------- | ------------- |
@@ -164,6 +165,20 @@ When configuring your experiments, you can enable the advanced setting `Preproce
 |Weight of Evidence (WoE)|Calculates WoE as a measure of correlation of categorical columns to the target column. It is calculated as the log of the ratio of in-class vs out-of-class probabilities. This step outputs one numerical feature column per class and removes the need to explicitly impute missing values and outlier treatment.|
 |Cluster Distance|Trains a k-means clustering model on all numerical columns.  Outputs k new features, one new numerical feature per cluster, containing the distance of each sample to the centroid of each cluster.|
 
+### Data guardrails
+
+Automated machine learning offers data guardrails that help you identify potential issues with your data (for example, missing values or class imbalance) and take corrective actions for improved results. Many best practices are available and can be applied to achieve reliable results.
+
+The following table describes the currently supported data guardrails and the associated statuses that you may see when you submit an experiment.
+
+Guardrail|Status|Condition&nbsp;for&nbsp;trigger
+---|---|---
+Missing&nbsp;values&nbsp;imputation |**Passed** <br> <br> **Fixed**|No missing value in any of the input&nbsp;columns <br> <br> Some columns have missing values
+Cross validation|**Done**|If no explicit validation set is provided
+High&nbsp;cardinality&nbsp;feature&nbsp;detection|**Passed** <br> <br>**Done**|No high cardinality features were detected <br><br> High cardinality input columns were detected
+Class balance detection|**Passed** <br><br><br>**Alerted** |Classes are balanced in the training data. A dataset is considered balanced if each class has good representation in the dataset, as measured by number and ratio of samples <br> <br> Classes in the training data are imbalanced
+Time-series data consistency|**Passed** <br><br><br><br> **Fixed** |<br> The selected {horizon, lag, rolling window} values were analyzed, and no potential out-of-memory issues were detected. <br> <br>The selected {horizon, lag, rolling window} values were analyzed and will potentially cause your experiment to run out of memory. The lag or rolling window has been turned off.
+
 ## Run experiment and view results
 
 Select **Start** to run your experiment. The experiment preparing process can take up to 10 minutes. Training jobs can take an additional 2-3 minutes more for each pipeline to finish running.