You write code using the Python SDK in this article. You learn how to do the following tasks:
> [!div class="checklist"]
> * Download, transform, and clean data using Azure Open Datasets
```python
from datetime import datetime
from dateutil.relativedelta import relativedelta
```
Begin by creating a dataframe to hold the taxi data. When you work in a non-Spark environment, Open Datasets only allows downloading one month of data at a time with certain classes, to avoid `MemoryError` with large datasets.

To download taxi data, iteratively fetch one month at a time, and before appending it to `green_taxi_df`, randomly sample 2,000 records from each month to avoid bloating the dataframe. Then preview the data.
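The loop below is a minimal sketch of that approach. It assumes the `NycTlcGreen` class from the `azureml-opendatasets` package is available in your environment, and the 2015 date range is only a placeholder:

```python
import pandas as pd
from datetime import datetime
from dateutil.relativedelta import relativedelta
from azureml.opendatasets import NycTlcGreen

green_taxi_df = pd.DataFrame([])
start = datetime.strptime("1/1/2015", "%m/%d/%Y")
end = datetime.strptime("1/31/2015", "%m/%d/%Y")

# Fetch one month at a time; keep only a 2,000-record random sample of each
# month so the combined dataframe stays small.
for sample_month in range(12):
    temp_df_green = NycTlcGreen(
        start_date=start + relativedelta(months=sample_month),
        end_date=end + relativedelta(months=sample_month),
    ).to_pandas_dataframe()
    green_taxi_df = pd.concat([green_taxi_df, temp_df_green.sample(2000)])

green_taxi_df.head(10)
```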
Remove some of the columns that you won't need for training or other feature building. Automated machine learning automatically handles time-based features such as **lpepPickupDatetime**.
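As a minimal sketch of that cleanup, using a few hypothetical column names (adjust the list to whichever columns your dataframe doesn't need):

```python
# Hypothetical column list; replace with the columns you want to drop.
columns_to_remove = ["lpepDropoffDatetime", "storeAndFwdFlag", "mtaTax"]
green_taxi_df = green_taxi_df.drop(columns=columns_to_remove, errors="ignore")
green_taxi_df.describe()
```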
From the summary statistics, you see that several fields have outliers or values that reduce model accuracy. First, filter the lat/long fields to be within the bounds of the Manhattan area. This filters out longer taxi trips or trips that are outliers with respect to other features.
Additionally, filter the `tripDistance` field to be greater than zero but less than 31 miles (the haversine distance between the two lat/long pairs). This eliminates long outlier trips that have inconsistent trip cost.
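A sketch of both filters using pandas `query`; the Manhattan bounding-box coordinates and the column names are illustrative, so adjust them to match your data:

```python
# Approximate bounding box for Manhattan (illustrative values).
final_df = green_taxi_df.query("pickupLatitude >= 40.53 and pickupLatitude <= 40.88")
final_df = final_df.query("pickupLongitude >= -74.09 and pickupLongitude <= -73.72")
final_df = final_df.query("dropoffLatitude >= 40.53 and dropoffLatitude <= 40.88")
final_df = final_df.query("dropoffLongitude >= -74.09 and dropoffLongitude <= -73.72")

# Keep only trips with a positive distance shorter than 31 miles.
final_df = final_df.query("tripDistance > 0 and tripDistance < 31")
```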
To automatically train a model, take the following steps:
### Define training settings
Define the experiment parameters and model settings for training. View the full list of [settings](how-to-configure-auto-train.md). Submitting the experiment with these default settings takes approximately 5-20 minutes, but if you want a shorter run time, reduce the `experiment_timeout_hours` parameter.
|Property| Value in this article |Description|
|----|----|---|
|**iteration_timeout_minutes**|10|Time limit in minutes for each iteration. Increase this value for larger datasets that need more time for each iteration.|
|**experiment_timeout_hours**|0.3|Maximum amount of time in hours that all iterations combined can take before the experiment terminates.|
|**enable_early_stopping**|True|Flag to enable early termination if the score isn't improving in the short term.|
|**primary_metric**| spearman_correlation | Metric that you want to optimize. The best-fit model is chosen based on this metric.|
|**featurization**| auto | By using **auto**, the experiment can preprocess the input data (handling missing data, converting text to numeric, etc.)|
|**verbosity**| logging.INFO | Controls the level of logging.|
|**n_cross_validations**|5|Number of cross-validation splits to perform when validation data isn't specified.|
```python
import logging
```
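As a sketch of how the settings from the table can be collected into a dictionary and handed to an AutoML configuration, assuming the `AutoMLConfig` class from `azureml.train.automl`, the filtered `final_df` dataframe from earlier, and `totalAmount` as the label column to predict:

```python
automl_settings = {
    "iteration_timeout_minutes": 10,
    "experiment_timeout_hours": 0.3,
    "enable_early_stopping": True,
    "primary_metric": "spearman_correlation",
    "featurization": "auto",
    "verbosity": logging.INFO,
    "n_cross_validations": 5,
}

# Assumed: AutoMLConfig from azureml.train.automl, a prepared training
# dataframe (final_df), and "totalAmount" as the target column.
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    task="regression",
    training_data=final_df,
    label_column_name="totalAmount",
    **automl_settings,
)
```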
The traditional machine learning model development process is highly resource-intensive.
## Clean up resources
Don't complete this section if you plan to run other Azure Machine Learning tutorials.