articles/machine-learning/how-to-train-tensorflow.md
ms.subservice: core
ms.author: balapv
author: balapv
ms.reviewer: mopeakande
ms.date: 10/03/2022
ms.topic: how-to
ms.custom: sdkv2, event-tier1-build-2022
#Customer intent: As a TensorFlow developer, I need to combine open-source with a cloud platform to train, evaluate, and deploy my deep learning models at scale.
First, you'll need to connect to your AzureML workspace.

We're using `DefaultAzureCredential` to get access to the workspace. This credential should be capable of handling most Azure SDK authentication scenarios.

If `DefaultAzureCredential` doesn't work for you, see the [`azure-identity` reference documentation](/python/api/azure-identity/azure.identity) or [Set up authentication](how-to-setup-authentication.md?tabs=sdk) for more available credentials.
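As a sketch of this connection step (the subscription, resource group, and workspace names are placeholders, not values from this article):

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

# Authenticate with DefaultAzureCredential and get a handle to the workspace.
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential=credential,
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<AML_WORKSPACE_NAME>",
)
```

The `ml_client` handle is used for every workspace operation that follows, such as submitting jobs and creating endpoints.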
For more information about the MNIST dataset, visit [Yan LeCun's website](http://yann.lecun.com/exdb/mnist/).

### Prepare the training script
In this article, we've provided the training script *tf_mnist.py*.

During the pipeline run, you'll use MLFlow to log the parameters and metrics. To learn how to enable MLFlow tracking, see [Track ML experiments and models with MLflow](how-to-use-mlflow-cli-runs.md).

In the training script `tf_mnist.py`, we create a simple deep neural network (DNN). This DNN has:

- An input layer with 28 * 28 = 784 neurons. Each neuron represents an image pixel.
- Two hidden layers. The first hidden layer has 300 neurons and the second hidden layer has 100 neurons.
- An output layer with 10 neurons. Each neuron represents a targeted label from 0 to 9.

:::image type="content" source="media/how-to-train-tensorflow/neural_network.png" alt-text="Diagram showing a deep neural network with 784 neurons at the input layer, two hidden layers, and 10 neurons at the output layer.":::
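As a sketch (not the exact contents of *tf_mnist.py*), the architecture described above could be written in Keras as follows; the optimizer and activations are assumptions for illustration:

```python
import tensorflow as tf

# Sketch of the 784-300-100-10 DNN described above.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),             # 28 * 28 = 784 input pixels
    tf.keras.layers.Dense(300, activation="relu"),   # first hidden layer
    tf.keras.layers.Dense(100, activation="relu"),   # second hidden layer
    tf.keras.layers.Dense(10, activation="softmax"), # one neuron per digit 0-9
])
model.compile(
    optimizer="sgd",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```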
### Build the training job

Now that you have all the assets required to run your job, it's time to build it using the AzureML Python SDK v2. For this example, we'll be creating a `command`.

An AzureML `command` is a resource that specifies all the details needed to execute your training code in the cloud. These details include the inputs and outputs, the type of hardware to use, the software to install, and how to run your code. The `command` contains information to execute a single command.

You'll use the general purpose `command` to run the training script:

- The inputs for this command include the data location, batch size, number of neurons in the first and second layers, and the learning rate. We've passed in the web path directly as an input.
- For the parameter values:
  - Provide the compute cluster `gpu_compute_target = "gpu-cluster"` that you created for running this command.
  - Provide the curated environment `curated_env_name` that you declared earlier.
  - Configure the command line action itself. In this case, the command is `python tf_mnist.py`. You can access the inputs and outputs in the command via the `${{ ... }}` notation.
  - Configure metadata such as the display name and experiment name, where an experiment is a container for all the iterations one does on a certain project. All the jobs submitted under the same experiment name are listed next to each other in AzureML studio.
- In this example, you'll use the `UserIdentity` to run the command. Using a user identity means that the command will use your identity to run the job and access the data from the blob.
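The points above can be sketched as a `command` job. The data path, hyperparameter values, source directory, and names below are illustrative assumptions, not values taken from this article:

```python
from azure.ai.ml import command, Input
from azure.ai.ml.entities import UserIdentityConfiguration

gpu_compute_target = "gpu-cluster"
curated_env_name = "AzureML-tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu@latest"
web_path = "<URL-to-the-MNIST-data>"  # web path passed in directly as an input

job = command(
    inputs=dict(
        data_folder=Input(type="uri_folder", path=web_path),
        batch_size=64,
        first_layer_neurons=256,
        second_layer_neurons=128,
        learning_rate=0.01,
    ),
    compute=gpu_compute_target,
    environment=curated_env_name,
    code="./src/",  # folder containing tf_mnist.py
    command=(
        "python tf_mnist.py --data-folder ${{inputs.data_folder}} "
        "--batch-size ${{inputs.batch_size}} "
        "--first-layer-neurons ${{inputs.first_layer_neurons}} "
        "--second-layer-neurons ${{inputs.second_layer_neurons}} "
        "--learning-rate ${{inputs.learning_rate}}"
    ),
    identity=UserIdentityConfiguration(),  # run the job under your own identity
    display_name="tensorflow-mnist-dnn",
    experiment_name="tf-dnn-image-classify",
)
```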
### Submit the job
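Assuming the `command` object built above is named `job` and `ml_client` is your workspace handle, submission might look like:

```python
# Submit the command job to the workspace; returns the in-progress job object.
returned_job = ml_client.jobs.create_or_update(job)
```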
As the job is executed, it goes through the following stages:

## Tune model hyperparameters

Now that you've seen how to do a TensorFlow training run using the SDK, let's see if you can further improve the accuracy of your model. You can tune and optimize your model's hyperparameters using Azure Machine Learning's [`sweep`](/python/api/azure-ai-ml/azure.ai.ml.sweep) capabilities.

To tune the model's hyperparameters, define the parameter space in which to search during training. You'll do this by replacing some of the parameters (`batch_size`, `first_layer_neurons`, `second_layer_neurons`, and `learning_rate`) passed to the training job with special inputs from the `azure.ml.sweep` package. Then, you'll configure a sweep on the command job, using some sweep-specific parameters, such as the primary metric to watch and the sampling algorithm to use.

In the following code, we use random sampling to try different configuration sets of hyperparameters in an attempt to maximize our primary metric, `validation_acc`.

We also define an early termination policy, the `BanditPolicy`. This policy checks the job every two iterations. If the primary metric, `validation_acc`, falls outside the top ten percent range, AzureML terminates the job. This saves the model from continuing to explore hyperparameters that show no promise of helping to reach the target metric.
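A sketch of the sweep setup described above, assuming the `job` object from the earlier `command` section; the specific choice values and trial limits are illustrative assumptions:

```python
from azure.ai.ml.sweep import Choice, LogUniform, BanditPolicy

# Replace the fixed hyperparameters with search-space inputs.
job_for_sweep = job(
    batch_size=Choice(values=[32, 64, 128]),
    first_layer_neurons=Choice(values=[64, 128, 256, 512]),
    second_layer_neurons=Choice(values=[64, 128, 256]),
    learning_rate=LogUniform(min_value=-6, max_value=-1),
)

# Configure random sampling to maximize validation_acc, with a BanditPolicy
# that evaluates every two iterations and a ten percent slack factor.
sweep_job = job_for_sweep.sweep(
    compute="gpu-cluster",
    sampling_algorithm="random",
    primary_metric="validation_acc",
    goal="Maximize",
    max_total_trials=8,
    max_concurrent_trials=4,
    early_termination_policy=BanditPolicy(slack_factor=0.1, evaluation_interval=2),
)
```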
After you've registered your model, you can deploy it as an online endpoint.

To deploy a machine learning service, you'll typically need:

- The model assets that you want to deploy. These assets include the model's file and metadata that you already registered in your training job.
- Some code to run as a service. The code executes the model on a given input request (an entry script). This entry script receives data submitted to a deployed web service and passes it to the model. After the model processes the data, the script returns the model's response to the client. The script is specific to your model and must understand the data that the model expects and returns. When you use an MLFlow model, AzureML automatically creates this script for you.

For more information about deployment, see [Deploy and score a machine learning model with managed online endpoint using Python SDK v2](how-to-deploy-managed-online-endpoint-sdk-v2.md).

Once you've created the endpoint, you can retrieve it as follows:
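A minimal sketch of retrieving the endpoint, assuming a variable `online_endpoint_name` holds the name you chose when creating it:

```python
# Fetch the endpoint object by name from the workspace.
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)
```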
### Deploy the model to the endpoint

After you've created the endpoint, you can deploy the model with the entry script. An endpoint can have multiple deployments. The endpoint can then direct traffic to these deployments, using rules.

In the following code, you'll create a single deployment that handles 100% of the incoming traffic. We've specified an arbitrary color name (*tff-blue*) for the deployment. You could also use any other name such as *tff-green* or *tff-red* for the deployment.

The code to deploy the model to the endpoint does the following:

- Deploys the best version of the model that you registered earlier.
- Scores the model, using the `core.py` file.
- Uses the same curated environment (that you declared earlier) to perform inferencing.
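The steps above can be sketched as follows. The model name and version, source code path, and instance type are assumptions for illustration, not values from this article:

```python
from azure.ai.ml.entities import ManagedOnlineDeployment, CodeConfiguration

# Look up the registered model to deploy (name and version are assumed).
model = ml_client.models.get(name="tf-dnn-image-classify", version="1")

# Create the tff-blue deployment, scoring with core.py in the same
# curated environment declared earlier.
blue_deployment = ManagedOnlineDeployment(
    name="tff-blue",
    endpoint_name=online_endpoint_name,
    model=model,
    code_configuration=CodeConfiguration(code="./src", scoring_script="core.py"),
    environment=curated_env_name,
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

# Route 100% of incoming traffic to the tff-blue deployment.
endpoint.traffic = {"tff-blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```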