Commit d9db93f

Merge pull request #196120 from gauravrajguru/validation_size
Replace split_ratio with validation_size parameter to split training data
2 parents 8216273 + cf79ca8 commit d9db93f


2 files changed: +1 -2 lines changed


articles/machine-learning/how-to-auto-train-image-models.md

Lines changed: 1 addition & 1 deletion
@@ -166,7 +166,7 @@ training_dataset = training_dataset.register(workspace=ws, name=training_dataset

Automated ML does not impose any constraints on training or validation data size for computer vision tasks. The maximum dataset size is limited only by the storage layer behind the dataset (that is, the blob store). There is no minimum number of images or labels. However, we recommend starting with a minimum of 10-15 samples per label to ensure the output model is sufficiently trained. The higher the total number of labels/classes, the more samples you need per label.

- Training data is required and is passed in using the `training_data` parameter. You can optionally specify another TabularDataset as a validation dataset to be used for your model with the `validation_data` parameter of the AutoMLImageConfig. If no validation dataset is specified, 20% of your training data will be used for validation by default, unless you pass the `split_ratio` argument with a different value.
+ Training data is required and is passed in using the `training_data` parameter. You can optionally specify another TabularDataset as a validation dataset to be used for your model with the `validation_data` parameter of the AutoMLImageConfig. If no validation dataset is specified, 20% of your training data will be used for validation by default, unless you pass the `validation_size` argument with a different value.

For example:
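The AutoMLImageConfig snippet that follows "For example:" in the article isn't included in this diff. As a rough sketch of how the renamed setting could be used, the code below passes `validation_size` as a fixed value through the GridParameterSampling pattern the article uses for image hyperparameters; the compute target, dataset variable, and the yolov5 model choice are placeholders rather than content from this commit.

```python
from azureml.train.automl import AutoMLImageConfig
from azureml.train.hyperdrive import GridParameterSampling, choice
from azureml.automl.core.shared.constants import ImageTask

# Sketch only: no validation_data is supplied, so 30% of the training data is
# held out for validation instead of the default 20%.
automl_image_config = AutoMLImageConfig(
    task=ImageTask.IMAGE_OBJECT_DETECTION,
    compute_target=compute_target,      # placeholder AmlCompute target
    training_data=training_dataset,     # TabularDataset registered earlier in the article
    hyperparameter_sampling=GridParameterSampling(
        {"model_name": choice("yolov5"), "validation_size": choice(0.3)}
    ),
    iterations=1,
)
```

Passing an explicit `validation_data` dataset instead would make `validation_size` unnecessary.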

articles/machine-learning/reference-automl-images-hyperparameters.md

Lines changed: 0 additions & 1 deletion
@@ -68,7 +68,6 @@ The following table describes the hyperparameters that are model agnostic.
|`beta2` | Value of `beta2` when optimizer is `adam` or `adamw`.<br> Must be a float in the range [0, 1]. | 0.999 |
|`amsgrad` | Enable `amsgrad` when optimizer is `adam` or `adamw`.<br> Must be 0 or 1. | 0 |
|`evaluation_frequency`| Frequency to evaluate validation dataset to get metric scores. <br> Must be a positive integer. | 1 |
- |`split_ratio`| If validation data is not defined, this specifies the split ratio for splitting train data into random train and validation subsets. <br> Must be a float in the range [0, 1].| 0.2 |
|`checkpoint_frequency`| Frequency to store model checkpoints. <br> Must be a positive integer. | Checkpoint at epoch with best primary metric on validation.|
|`checkpoint_run_id`| The run id of the experiment that has a pretrained checkpoint for incremental training.| no default |
|`checkpoint_dataset_id`| FileDataset id containing pretrained checkpoint(s) for incremental training. Make sure to pass `checkpoint_filename` along with `checkpoint_dataset_id`.| no default |
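
Not part of this commit, but as a companion sketch: a few of the model-agnostic hyperparameters from the table above combined into a random sampling space. The model name and all values are illustrative assumptions, not recommendations.

```python
from azureml.train.hyperdrive import RandomParameterSampling, choice

# Illustrative sampling space built from hyperparameters listed in the table above;
# model_name is an assumed extra entry, and the values are examples only.
parameter_space = {
    "model_name": choice("yolov5"),        # assumed model selection
    "evaluation_frequency": choice(1, 2),  # compute validation metrics every 1 or 2 epochs
    "checkpoint_frequency": choice(1),     # store a checkpoint every epoch
    "amsgrad": choice(0, 1),               # only takes effect when optimizer is adam/adamw
}

hyperparameter_sampling = RandomParameterSampling(parameter_space)
```

A space like this would be handed to AutoMLImageConfig through its `hyperparameter_sampling` argument, as in the sketch under the first file above.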
