In this article, learn how to run your [TensorFlow](https://www.tensorflow.org/overview) training scripts at scale using Azure Machine Learning Python SDK v2.
25
25
26
-
This example code in this article train a TensorFlow model to classify handwritten digits using a deep neural network (DNN), register the model, and deploy it to an online endpoint.
26
+
The example code in this article trains a TensorFlow model to classify handwritten digits, using a deep neural network (DNN); registers the model; and deploys it to an online endpoint.
27
27
28
28
Whether you're developing a TensorFlow model from the ground-up or you're bringing an existing model into the cloud, you can use Azure Machine Learning to scale out open-source training jobs using elastic cloud compute resources. You can build, deploy, version, and monitor production-grade models with Azure Machine Learning.
29
29
@@ -32,7 +32,7 @@ Whether you're developing a TensorFlow model from the ground-up or you're bringi
32
32
To benefit from this article, you'll need to:
33
33
34
34
- Access an Azure subscription. If you don't have one already, [create a free account](https://azure.microsoft.com/free/).
35
-
- Run the code in this article using either an Azure Machine Learning compute instance, or your own Jupyter notebook.
35
+
- Run the code in this article using either an Azure Machine Learning compute instance or your own Jupyter notebook.
36
36
- Azure Machine Learning compute instance - no downloads or installation necessary
37
37
- Complete the [Quickstart: Get started with Azure Machine Learning](quickstart-create-resources.md) to create a dedicated notebook server pre-loaded with the SDK and the sample repository.
38
38
- In the samples deep learning folder on the notebook server, find a completed and expanded notebook by navigating to this directory: **v2 > sdk > jobs > single-step > tensorflow > train-hyperparameter-tune-deploy-with-tensorflow**.
@@ -63,7 +63,7 @@ If `DefaultAzureCredential` doesn't work for you, see [`azure-identity reference
If you prefer to use a browser to sign in and authenticate, you should remove the comments in the following code and use it instead.
66
+
If you prefer to use a browser to sign in and authenticate, you should uncomment the following code and use it instead.
67
67
68
68
```python
69
69
# Handle to the workspace
@@ -76,7 +76,7 @@ If you prefer to use a browser to sign in and authenticate, you should remove th
76
76
77
77
Next, get a handle to the workspace by providing your Subscription ID, Resource Group name, and workspace name. To find these parameters:
78
78
79
-
1. Look in the upper-right corner of the Azure Machine Learning studio toolbar for your workspace name.
79
+
1. Look for your workspace name in the upper-right corner of the Azure Machine Learning studio toolbar.
80
80
2. Select your workspace name to show your Resource Group and Subscription ID.
81
81
3. Copy the values for Resource Group and Subscription ID into the code.
82
82
@@ -91,7 +91,7 @@ The result of running this script is a workspace handle that you'll use to manag
91
91
92
92
AzureML needs a compute resource to run a job. This resource can be single or multi-node machines with Linux or Windows OS, or a specific compute fabric like Spark.
93
93
94
-
In the following example script, we provision a Linux [`compute cluster`](/azure/machine-learning/how-to-create-attach-compute-cluster?tabs=python). You can see the [`Azure Machine Learning pricing`](https://azure.microsoft.com/pricing/details/machine-learning/) page for the full list of VM sizes and prices. Since we need a GPU cluster for this example, let's pick a *STANDARD_NC6* model and create an Azure ML compute.
94
+
In the following example script, we provision a Linux [`compute cluster`](/azure/machine-learning/how-to-create-attach-compute-cluster?tabs=python). You can see the [`Azure Machine Learning pricing`](https://azure.microsoft.com/pricing/details/machine-learning/) page for the full list of VM sizes and prices. Since we need a GPU cluster for this example, let's pick a *STANDARD_NC6* model and create an AzureML compute.
@@ -130,8 +130,8 @@ During the pipeline run, you'll use MLFlow to log the parameters and metrics. To
130
130
131
131
In the training script `tf_mnist.py`, we create a simple deep neural network (DNN). This DNN has:
132
132
133
-
- An input layer with 28 * 28 = 784 neurons. Each neuron represents an image pixel;
134
-
- Two hidden layers. The first hidden layer has 300 neurons and the second hidden layer has 100 neurons; and
133
+
- An input layer with 28 * 28 = 784 neurons. Each neuron represents an image pixel.
134
+
- Two hidden layers. The first hidden layer has 300 neurons and the second hidden layer has 100 neurons.
135
135
- An output layer with 10 neurons. Each neuron represents a targeted label from 0 to 9.
136
136
137
137
:::image type="content" source="media/how-to-train-tensorflow/neural-network.png" alt-text="Diagram showing a deep neural network with 784 neurons at the input layer, two hidden layers, and 10 neurons at the output layer.":::
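
As a rough sanity check on the architecture described above, the following sketch counts the trainable parameters of a fully connected network with these layer sizes. It assumes plain dense layers with one bias per neuron. The helper name `count_dense_parameters` is hypothetical and isn't taken from `tf_mnist.py`.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    Assumes every layer is dense and has one bias term per neuron.
    """
    total = 0
    # Pair each layer with the next one: (784, 300), (300, 100), (100, 10).
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


# Input layer (28 * 28 pixels), two hidden layers, and the output layer.
layer_sizes = [28 * 28, 300, 100, 10]
total_parameters = count_dense_parameters(layer_sizes)
print(total_parameters)  # 266610
```

Most of the parameters sit between the input layer and the first hidden layer (784 * 300 + 300 = 235,500), so the size of the first hidden layer dominates the model size.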
@@ -149,8 +149,7 @@ You'll use the general purpose `command` to run the training script and perform
- The inputs for this command include the data location, batch size, number of neurons in the first and second layer, and learning rate.
153
-
- We've passed in the web path directly as an input.
152
+
- The inputs for this command include the data location, batch size, number of neurons in the first and second layer, and learning rate. Notice that we've passed in the web path directly as an input.
154
153
155
154
- For the parameter values:
156
155
- provide the compute cluster `gpu_compute_target = "gpu-cluster"` that you created for running this command;
@@ -162,7 +161,7 @@ You'll use the general purpose `command` to run the training script and perform
162
161
163
162
### Submit the job
164
163
165
-
It's now time to submit the job to run in AzureML. This time you'll use `create_or_update` on `ml_client.jobs`.
164
+
It's now time to submit the job to run in AzureML. This time, you'll use `create_or_update` on `ml_client.jobs`.
@@ -176,7 +175,7 @@ As the job is executed, it goes through the following stages:
176
175
177
176
- **Preparing**: A docker image is created according to the environment defined. The image is uploaded to the workspace's container registry and cached for later runs. Logs are also streamed to the run history and can be viewed to monitor progress. If a curated environment is specified, the cached image backing that curated environment will be used.
178
177
179
-
- **Scaling**: The cluster attempts to scale up if the cluster requires more nodes to execute the run than are currently available.
178
+
- **Scaling**: The cluster attempts to scale up if it requires more nodes to execute the run than are currently available.
180
179
181
180
- **Running**: All scripts in the script folder *src* are uploaded to the compute target, data stores are mounted or copied, and the script is executed. Outputs from *stdout* and the *./logs* folder are streamed to the run history and can be used to monitor the run.
182
181
@@ -225,7 +224,7 @@ For more information about deployment, see [Deploy and score a machine learning
225
224
226
225
### Create a new online endpoint
227
226
228
-
As a first step, you need to create your online endpoint. The endpoint name must be unique in the entire Azure region. For this article, you'll create a unique name using a universally unique identifier (UUID).
227
+
As a first step to deploying your model, you need to create your online endpoint. The endpoint name must be unique in the entire Azure region. For this article, you'll create a unique name using a universally unique identifier (UUID).
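
One way to build such a name is to append part of a UUID to a readable prefix. The following is a minimal sketch that uses only the Python standard library. The prefix `tff-dnn-endpoint-` and the helper name `make_endpoint_name` are assumptions chosen for illustration.

```python
import uuid


def make_endpoint_name(prefix="tff-dnn-endpoint-"):
    """Return the prefix followed by the first 8 hex characters of a random UUID."""
    # str(uuid.uuid4()) looks like 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx',
    # so the first 8 characters are always hexadecimal digits.
    suffix = str(uuid.uuid4())[:8]
    return prefix + suffix


online_endpoint_name = make_endpoint_name()
print(online_endpoint_name)
```

A short random suffix makes a name collision within the region unlikely, but it doesn't guarantee uniqueness. If endpoint creation fails because the name is taken, generate a new name and try again.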
@@ -237,14 +236,14 @@ Once you've created the endpoint, you can retrieve it as follows:
237
236
238
237
### Deploy the model to the endpoint
239
238
240
-
After you've created the endpoint, you can deploy the model with the entry script. An endpoint can have multiple deployments. The endpoint can then direct traffic to these deployments, using rules.
239
+
After you've created the endpoint, you can deploy the model with the entry script. An endpoint can have multiple deployments. Using rules, the endpoint can then direct traffic to these deployments.
241
240
242
241
In the following code, you'll create a single deployment that handles 100% of the incoming traffic. We've specified an arbitrary color name (*tff-blue*) for the deployment. You could also use any other name such as *tff-green* or *tff-red* for the deployment.
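
To make the traffic split concrete, the sketch below models it as a plain Python dictionary that maps deployment names to percentages. It is an illustration only, not an SDK call, and the helper `is_valid_traffic_split` is hypothetical. It assumes that the percentages across all deployments of a live endpoint should add up to 100.

```python
def is_valid_traffic_split(traffic):
    """Check that every share is a non-negative integer and the shares total 100."""
    shares = list(traffic.values())
    if any(not isinstance(share, int) or share < 0 for share in shares):
        return False
    return sum(shares) == 100


# A single deployment that receives all incoming traffic, as in this article.
single_deployment = {"tff-blue": 100}
print(is_valid_traffic_split(single_deployment))  # True

# A hypothetical gradual rollout that shifts 10% of traffic to a second deployment.
gradual_rollout = {"tff-blue": 90, "tff-green": 10}
print(is_valid_traffic_split(gradual_rollout))  # True
```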
243
242
The code to deploy the model to the endpoint does the following:
244
243
245
-
- Deploys the best version of the model that you registered earlier;
246
-
- Scores the model, using the `core.py` file; and
247
-
- Uses the same curated environment (that you declared earlier) to perform inferencing.
244
+
- deploys the best version of the model that you registered earlier;
245
+
- scores the model, using the `core.py` file; and
246
+
- uses the same curated environment (that you declared earlier) to perform inferencing.