
Commit 14dba5f

acrolinx issues
1 parent 36c927f commit 14dba5f

2 files changed, +20 -20 lines changed

articles/synapse-analytics/machine-learning/tutorial-horovod-tensorflow.md

Lines changed: 13 additions & 13 deletions
@@ -13,7 +13,7 @@ ms.author: midesa
[Horovod](https://github.com/horovod/horovod) is a distributed training framework for libraries like TensorFlow and PyTorch. With Horovod, users can scale up an existing training script to run on hundreds of GPUs in just a few lines of code.

-Within Azure Synapse Analytics, users can quickly get started with Horovod using the default Apache Spark 3 runtime. For Spark ML pipeline applications using TensorFlow, users can use ```HorovodRunner```. This notebook uses an Apache Spark dataframe to perform distributed training of a distributed neural network (DNN) model on MNIST dataset. This tutorial leverages TensorFlow and the ```HorovodRunner``` to run the training process.
+Within Azure Synapse Analytics, users can quickly get started with Horovod using the default Apache Spark 3 runtime. For Spark ML pipeline applications using TensorFlow, users can use ```HorovodRunner```. This notebook uses an Apache Spark dataframe to perform distributed training of a distributed neural network (DNN) model on MNIST dataset. This tutorial uses TensorFlow and the ```HorovodRunner``` to run the training process.

## Prerequisites
@@ -27,9 +27,9 @@ Within Azure Synapse Analytics, users can quickly get started with Horovod using
## Configure the Apache Spark session

-At the start of the session, we will need to configure a few Apache Spark settings. In most cases, we only need to set the ```numExecutors``` and ```spark.rapids.memory.gpu.reserve```. For very large models, users may also need to configure the ```spark.kryoserializer.buffer.max``` setting. For TensorFlow models, users will need to set the ```spark.executorEnv.TF_FORCE_GPU_ALLOW_GROWTH``` to be true.
+At the start of the session, we need to configure a few Apache Spark settings. In most cases, we only need to set the ```numExecutors``` and ```spark.rapids.memory.gpu.reserve```. For very large models, users may also need to configure the ```spark.kryoserializer.buffer.max``` setting. For TensorFlow models, users need to set the ```spark.executorEnv.TF_FORCE_GPU_ALLOW_GROWTH``` to be true.

-In the example below, you can see how the Spark configurations can be passed with the ```%%configure``` command. The detailed meaning of each parameter is explained in the [Apache Spark configuration documentation](https://spark.apache.org/docs/latest/configuration.html). The values provided below are the suggested, best practice values for Azure Synapse GPU-large pools.
+In the example, you can see how the Spark configurations can be passed with the ```%%configure``` command. The detailed meaning of each parameter is explained in the [Apache Spark configuration documentation](https://spark.apache.org/docs/latest/configuration.html). The values provided are the suggested, best practice values for Azure Synapse GPU-large pools.

```spark
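For orientation, here is a minimal sketch of the shape such a ```%%configure``` cell takes, using only the settings named above; the values are placeholders, not the article's recommended settings for GPU-large pools.

```spark
%%configure -f
{
    "numExecutors": 3,
    "conf": {
        "spark.rapids.memory.gpu.reserve": "10g",
        "spark.executorEnv.TF_FORCE_GPU_ALLOW_GROWTH": "true",
        "spark.kryoserializer.buffer.max": "2000m"
    }
}
```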
@@ -48,7 +48,7 @@ In the example below, you can see how the Spark configurations can be passed wit
}
```

-For this tutorial, we will use the following configurations:
+For this tutorial, we use the following configurations:

```python
@@ -67,9 +67,9 @@ For this tutorial, we will use the following configurations:
## Setup primary storage account

-We will need the Azure Data Lake Storage (ADLS) account for storing intermediate and model data. If you are using an alternative storage account, be sure to set up the [linked service](../../data-factory/concepts-linked-services.md) to automatically authenticate and read from the account.
+We need the Azure Data Lake Storage (ADLS) account for storing intermediate and model data. If you are using an alternative storage account, be sure to set up the [linked service](../../data-factory/concepts-linked-services.md) to automatically authenticate and read from the account.

-In this example, we will read from the primary Azure Synapse Analytics storage account. To do this, you will need to modify the following properties below: ```remote_url```.
+In this example, we read data from the primary Azure Synapse Analytics storage account. To read the results you need to modify the following properties: ```remote_url```.

```python
# Specify training parameters
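As a point of reference, ```remote_url``` is an ABFS(S) URI; a hedged sketch of its shape, with every name a placeholder:

```python
# Placeholder values; substitute your own container, storage account, and path
remote_url = "abfss://<container>@<storage_account>.dfs.core.windows.net/<path>"
```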
@@ -84,7 +84,7 @@ remote_url = "<<abfss path to storage account>>
## Prepare dataset

-Next, we will prepare the dataset for training. In this tutorial, we will use the MNIST dataset from [Azure Open Datasets](../../open-datasets/dataset-mnist.md?tabs=azureml-opendatasets).
+Next, we prepare the dataset for training. In this tutorial, we use the MNIST dataset from [Azure Open Datasets](../../open-datasets/dataset-mnist.md?tabs=azureml-opendatasets).

```python
def get_dataset(rank=0, size=1):
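To illustrate the ```rank```/```size``` signature, here is a minimal sketch of a sharded loader; it substitutes the built-in Keras MNIST loader for the Azure Open Datasets source used in the article, and every helper name is illustrative.

```python
from tensorflow import keras

def get_dataset(rank=0, size=1):
    # Each Horovod worker loads only its shard: every size-th sample, offset by rank.
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train = x_train[rank::size].reshape(-1, 28, 28, 1).astype('float32') / 255.0
    y_train = y_train[rank::size]
    x_test = x_test[rank::size].reshape(-1, 28, 28, 1).astype('float32') / 255.0
    y_test = y_test[rank::size]
    return (x_train, y_train), (x_test, y_test)
```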
@@ -131,7 +131,7 @@ def get_dataset(rank=0, size=1):
## Define DNN model

-Once we have finished processing our dataset, we can now define our TensorFlow model. The same code could also be used to train a single-node TensorFlow model.
+Once our dataset is processed, we can define our TensorFlow model. The same code could also be used to train a single-node TensorFlow model.

```python
# Define the TensorFlow model without any Horovod-specific parameters
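A minimal sketch of what such a model factory can look like: a small Keras convolutional network with no Horovod-specific code. The layer choices are illustrative, not the article's exact architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

def get_model():
    # Plain Keras model; nothing here is Horovod-specific.
    model = keras.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax'),
    ])
    return model
```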
@@ -158,7 +158,7 @@ def get_model():
## Define a training function for a single node

-First, we will train our TensorFlow model on the driver node of the Apache Spark pool. Once we have finished the training process, we will evaluate the model and print the loss and accuracy scores.
+First, we train our TensorFlow model on the driver node of the Apache Spark pool. Once the training process is complete, we evaluate the model and print the loss and accuracy scores.

```python
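A hedged sketch of a single-node run that reuses the hypothetical ```get_dataset``` and ```get_model``` sketches above; the hyperparameters are illustrative.

```python
from tensorflow import keras

# Train on the driver only, then report loss and accuracy on the test set.
(x_train, y_train), (x_test, y_test) = get_dataset()

model = get_model()
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=128, epochs=3, verbose=2)

loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"loss: {loss:.4f}, accuracy: {accuracy:.4f}")
```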
@@ -208,7 +208,7 @@ Next, we will take a look at how the same code could be re-run using ```HorovodR
### Define training function

-To do this, we will first define a training function for ```HorovodRunner```.
+To train a model, we first define a training function for ```HorovodRunner```.

```python
# Define training function for Horovod runner
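For orientation, a sketch of the usual Horovod/Keras pattern such a function follows; it reuses the hypothetical ```get_dataset``` and ```get_model``` sketches above together with standard ```horovod.tensorflow.keras``` calls, and the hyperparameters are placeholders.

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

def train_hvd(learning_rate=0.1):
    # Initialize Horovod and pin each worker process to a single GPU.
    hvd.init()
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    if gpus:
        tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

    # Each worker reads only its own shard of the data.
    (x_train, y_train), (x_test, y_test) = get_dataset(hvd.rank(), hvd.size())

    model = get_model()

    # Scale the learning rate by the number of workers and wrap the optimizer.
    optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate * hvd.size())
    optimizer = hvd.DistributedOptimizer(optimizer)

    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # Broadcast initial weights from rank 0 so all workers start from the same state.
    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

    model.fit(x_train, y_train, batch_size=128, epochs=3,
              callbacks=callbacks, verbose=2 if hvd.rank() == 0 else 0)
    return model
```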
@@ -289,7 +289,7 @@ def train_hvd(learning_rate=0.1):
### Run training

-Once we have defined the model, we will run the training process.
+Once the model is defined, we can run the training process.

```python
# Run training
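A sketch of how ```HorovodRunner``` is typically invoked; the import path and the ```np``` value are assumptions to check against your pool's runtime.

```python
# HorovodRunner launches the training function on the Spark pool's workers.
# The sparkdl import path is an assumption; verify where HorovodRunner lives in your runtime.
from sparkdl import HorovodRunner

hr = HorovodRunner(np=2)  # np: number of parallel worker processes (placeholder value)
hr.run(train_hvd, learning_rate=0.1)
```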
@@ -307,7 +307,7 @@ best_model_bytes = \
### Save checkpoints to ADLS storage

-The code below shows how to save the checkpoints to the Azure Data Lake Storage (ADLS) account.
+The code shows how to save the checkpoints to the Azure Data Lake Storage (ADLS) account.

```python
import tempfile
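A hedged sketch of the copy step: write a local checkpoint, then push it to ADLS with ```mssparkutils```. Here ```best_model_bytes``` and ```remote_url``` come from the earlier steps, and the file names are placeholders.

```python
import os
import tempfile

from notebookutils import mssparkutils  # Synapse file-system utilities

# Write the serialized model to a local temp file first.
local_dir = tempfile.mkdtemp()
local_ckpt = os.path.join(local_dir, 'model_checkpoint.h5')
with open(local_ckpt, 'wb') as f:
    f.write(best_model_bytes)

# Copy the local checkpoint to the ADLS location referenced by remote_url.
adls_ckpt_file = remote_url + '/checkpoints/model_checkpoint.h5'
mssparkutils.fs.cp('file:' + local_ckpt, adls_ckpt_file)
print(adls_ckpt_file)
```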
@@ -330,7 +330,7 @@ print(adls_ckpt_file)
### Evaluate Horovod trained model

-Once we have finished training our model, we can then take a look at the loss and accuracy for the final model.
+Once the model training is complete, we can then take a look at the loss and accuracy for the final model.

```python
import tensorflow as tf
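A minimal sketch of such an evaluation, reloading the checkpoint written in the previous sketch; the paths and helper names are illustrative.

```python
import tensorflow as tf

# Reload the checkpoint saved above; compile=False avoids deserializing the Horovod optimizer.
saved_model = tf.keras.models.load_model(local_ckpt, compile=False)
saved_model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

(_, _), (x_test, y_test) = get_dataset()
loss, accuracy = saved_model.evaluate(x_test, y_test, verbose=0)
print(f"loss: {loss:.4f}, accuracy: {accuracy:.4f}")
```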

articles/synapse-analytics/machine-learning/tutorial-load-data-petastorm.md

Lines changed: 7 additions & 7 deletions
@@ -11,7 +11,7 @@ ms.author: midesa
# Load data with Petastorm (Preview)

-Petastorm is an open source data access library which enables single-node or distributed training of deep learning models. This library enables training directly from datasets in Apache Parquet format and datasets that have already been loaded as an Apache Spark DataFrame. Petastorm supports popular training frameworks such as Tensorflow and PyTorch.
+Petastorm is an open source data access library, which enables single-node or distributed training of deep learning models. This library enables training directly from datasets in Apache Parquet format and datasets that are loaded as an Apache Spark DataFrame. Petastorm supports popular training frameworks such as Tensorflow and PyTorch.

For more information about Petastorm, you can visit the [Petastorm GitHub page](https://github.com/uber/petastorm) or the [Petastorm API documentation](https://petastorm.readthedocs.io/en/latest).
@@ -28,7 +28,7 @@ For more information about Petastorm, you can visit the [Petastorm GitHub page](
## Configure the Apache Spark session

-At the start of the session, we will need to configure a few Apache Spark settings. In most cases, we only need to set the ```numExecutors``` and ```spark.rapids.memory.gpu.reserve```. In the example below, you can see how the Spark configurations can be passed with the ```%%configure``` command. The detailed meaning of each parameter is explained in the [Apache Spark configuration documentation](https://spark.apache.org/docs/latest/configuration.html).
+At the start of the session, we need to configure a few Apache Spark settings. In most cases, we only need to set the ```numExecutors``` and ```spark.rapids.memory.gpu.reserve```. In the example, you can see how the Spark configurations can be passed with the ```%%configure``` command. The detailed meaning of each parameter is explained in the [Apache Spark configuration documentation](https://spark.apache.org/docs/latest/configuration.html).

```python
%%configure -f
@@ -44,7 +44,7 @@ At the start of the session, we will need to configure a few Apache Spark settin
A dataset created using Petastorm is stored in an Apache Parquet format. On top of a Parquet schema, Petastorm also stores higher-level schema information that makes multidimensional arrays into a native part of a Petastorm dataset.

-In the sample below, we create a dataset using PySpark. We write the dataset to an Azure Data Lake Storage Gen2 account.
+In the sample, we create a dataset using PySpark. We write the dataset to an Azure Data Lake Storage Gen2 account.

```python
import numpy as np
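For reference, a hedged sketch modeled on Petastorm's hello-world example: define a ```Unischema```, then write rows through ```materialize_dataset``` so the Petastorm metadata lands next to the Parquet files. The schema, sizes, and ```output_url``` are placeholders.

```python
import numpy as np
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType

from petastorm.codecs import NdarrayCodec, ScalarCodec
from petastorm.etl.dataset_metadata import materialize_dataset
from petastorm.unischema import Unischema, UnischemaField, dict_to_spark_row

# Placeholder output location on an ADLS Gen2 account
output_url = 'abfs://<container_name>/<data directory path>/'

# A toy schema: an integer id plus a multidimensional image array
HelloWorldSchema = Unischema('HelloWorldSchema', [
    UnischemaField('id', np.int32, (), ScalarCodec(IntegerType()), False),
    UnischemaField('image', np.uint8, (128, 256, 3), NdarrayCodec(), False),
])

def row_generator(x):
    return {'id': x,
            'image': np.random.randint(0, 255, dtype=np.uint8, size=(128, 256, 3))}

def generate_petastorm_dataset(output_url):
    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext
    # materialize_dataset writes the extra Petastorm metadata alongside the Parquet output.
    with materialize_dataset(spark, output_url, HelloWorldSchema, row_group_size_mb=256):
        rows_rdd = sc.parallelize(range(10)) \
            .map(row_generator) \
            .map(lambda x: dict_to_spark_row(HelloWorldSchema, x))
        spark.createDataFrame(rows_rdd, HelloWorldSchema.as_spark_schema()) \
            .write \
            .mode('overwrite') \
            .parquet(output_url)

generate_petastorm_dataset(output_url)
```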
@@ -102,7 +102,7 @@ generate_petastorm_dataset(output_url)
The ```petastorm.reader.Reader``` class is the main entry point for user code that accesses the data from an ML framework such as Tensorflow or Pytorch. You can read a dataset using the ```petastorm.reader.Reader``` class and the ```petastorm.make_reader``` factory method.

-In the example below, you can see how you can pass an ```abfs``` URL protocol.
+In the example, you can see how you can pass an ```abfs``` URL protocol.

```python
from petastorm import make_reader
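A minimal sketch of that read loop; the URL is the article's placeholder.

```python
from petastorm import make_reader

# Iterate over the Petastorm dataset written earlier.
with make_reader('abfs://<container_name>/<data directory path>/') as reader:
    for row in reader:
        print(row)
```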
@@ -115,7 +115,7 @@ with make_reader('abfs://<container_name>/<data directory path>/') as reader:
### Read dataset from secondary storage account

-If you are using an alternative storage account, be sure to set up the [linked service](../../data-factory/concepts-linked-services.md) to automatically authenticate and read from the account. In addition, you will need to modify the following properties below: ```remote_url```, ```account_name```, and ```linked_service_name```.
+If you are using an alternative storage account, be sure to set up the [linked service](../../data-factory/concepts-linked-services.md) to automatically authenticate and read from the account. In addition, you need to modify the following properties: ```remote_url```, ```account_name```, and ```linked_service_name```.

```python
from petastorm import make_reader
@@ -134,7 +134,7 @@ with make_reader('{}/data_directory'.format(remote_url), storage_options = {'sas
### Read dataset in batches

-In the example below, you can see how you can pass an ```abfs``` URL protocol to read data in batches. This example uses the ```make_batch_reader``` class.
+In the example, you can see how you can pass an ```abfs``` URL protocol to read data in batches. This example uses the ```make_batch_reader``` class.

```python
from petastorm import make_batch_reader
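A short sketch of batched reading with ```make_batch_reader```; the URL is the article's placeholder.

```python
from petastorm import make_batch_reader

# Each iteration yields a batch built from a Parquet row group rather than a single row.
with make_batch_reader('abfs://<container_name>/<data directory path>/') as batch_reader:
    for batch in batch_reader:
        print(batch)
```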
@@ -146,7 +146,7 @@ with make_batch_reader('abfs://<container_name>/<data directory path>/', schema_
## PyTorch API

-To read a Petastorm dataset from PyTorch, you can use the adapter ```petastorm.pytorch.DataLoader``` class. This allows for custom PyTorch collating functions and transforms to be supplied.
+To read a Petastorm dataset from PyTorch, you can use the adapter ```petastorm.pytorch.DataLoader``` class. This adapter allows for custom PyTorch collating functions and transforms to be supplied.

In this example, we will show how Petastorm DataLoader can be used to load a Petastorm dataset with the help of make_reader API. This first section creates the definition of a ```Net``` class and ```train``` and ```test``` function.
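For orientation, a minimal sketch of wrapping ```make_reader``` in the ```petastorm.pytorch.DataLoader``` adapter; the URL and batch size are placeholders.

```python
from petastorm import make_reader
from petastorm.pytorch import DataLoader

# The adapter turns Petastorm rows into batches of PyTorch tensors, keyed by field name.
with DataLoader(make_reader('abfs://<container_name>/<data directory path>/'), batch_size=64) as train_loader:
    for batch in train_loader:
        print(batch.keys())
        break
```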