articles/synapse-analytics/machine-learning/tutorial-horovod-tensorflow.md (13 additions, 13 deletions)
@@ -13,7 +13,7 @@ ms.author: midesa
[Horovod](https://github.com/horovod/horovod) is a distributed training framework for libraries like TensorFlow and PyTorch. With Horovod, users can scale up an existing training script to run on hundreds of GPUs in just a few lines of code.
- Within Azure Synapse Analytics, users can quickly get started with Horovod using the default Apache Spark 3 runtime. For Spark ML pipeline applications using TensorFlow, users can use ```HorovodRunner```. This notebook uses an Apache Spark dataframe to perform distributed training of a distributed neural network (DNN) model on MNIST dataset. This tutorial leverages TensorFlow and the ```HorovodRunner``` to run the training process.
+ Within Azure Synapse Analytics, users can quickly get started with Horovod using the default Apache Spark 3 runtime. For Spark ML pipeline applications using TensorFlow, users can use ```HorovodRunner```. This notebook uses an Apache Spark dataframe to perform distributed training of a deep neural network (DNN) model on the MNIST dataset. This tutorial uses TensorFlow and ```HorovodRunner``` to run the training process.
## Prerequisites
@@ -27,9 +27,9 @@ Within Azure Synapse Analytics, users can quickly get started with Horovod using
## Configure the Apache Spark session
- At the start of the session, we will need to configure a few Apache Spark settings. In most cases, we only need to set the ```numExecutors``` and ```spark.rapids.memory.gpu.reserve```. For very large models, users may also need to configure the ```spark.kryoserializer.buffer.max``` setting. For TensorFlow models, users will need to set the ```spark.executorEnv.TF_FORCE_GPU_ALLOW_GROWTH``` to be true.
+ At the start of the session, we need to configure a few Apache Spark settings. In most cases, we only need to set the ```numExecutors``` and ```spark.rapids.memory.gpu.reserve```. For very large models, users may also need to configure the ```spark.kryoserializer.buffer.max``` setting. For TensorFlow models, users need to set ```spark.executorEnv.TF_FORCE_GPU_ALLOW_GROWTH``` to true.

- In the example below, you can see how the Spark configurations can be passed with the ```%%configure``` command. The detailed meaning of each parameter is explained in the [Apache Spark configuration documentation](https://spark.apache.org/docs/latest/configuration.html). The values provided below are the suggested, best practice values for Azure Synapse GPU-large pools.
+ In the example, you can see how the Spark configurations can be passed with the ```%%configure``` command. The detailed meaning of each parameter is explained in the [Apache Spark configuration documentation](https://spark.apache.org/docs/latest/configuration.html). The values provided are the suggested best-practice values for Azure Synapse GPU-large pools.
```spark
@@ -48,7 +48,7 @@ In the example below, you can see how the Spark configurations can be passed wit
}
```
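The body of the configuration cell above is collapsed in this diff. As a rough illustration of the shape of such a ```%%configure``` cell (the values shown are placeholders, not the recommended settings for GPU-large pools):

```python
%%configure -f
{
    "numExecutors": 3,
    "conf": {
        "spark.rapids.memory.gpu.reserve": "2g",
        "spark.kryoserializer.buffer.max": "2000m",
        "spark.executorEnv.TF_FORCE_GPU_ALLOW_GROWTH": "true"
    }
}
```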
- For this tutorial, we will use the following configurations:
+ For this tutorial, we use the following configurations:
```python
@@ -67,9 +67,9 @@ For this tutorial, we will use the following configurations:
## Setup primary storage account
- We will need the Azure Data Lake Storage (ADLS) account for storing intermediate and model data. If you are using an alternative storage account, be sure to set up the [linked service](../../data-factory/concepts-linked-services.md) to automatically authenticate and read from the account.
+ We need an Azure Data Lake Storage (ADLS) account to store intermediate and model data. If you are using an alternative storage account, be sure to set up the [linked service](../../data-factory/concepts-linked-services.md) to automatically authenticate and read from the account.

- In this example, we will read from the primary Azure Synapse Analytics storage account. To do this, you will need to modify the following properties below: ```remote_url```.
+ In this example, we read data from the primary Azure Synapse Analytics storage account. To read the results, you need to modify the following property: ```remote_url```.
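The storage setup code itself isn't shown in this diff; the property typically points at the workspace's primary ADLS Gen2 container, for example (placeholder names, not actual values):

```python
# Placeholder values; substitute your own container and storage account names.
remote_url = "abfss://<container_name>@<storage_account_name>.dfs.core.windows.net"

# Hypothetical example of how the URL might later be used to stage training data
mnist_data_path = remote_url + "/mnist"
```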
- Next, we will prepare the dataset for training. In this tutorial, we will use the MNIST dataset from [Azure Open Datasets](../../open-datasets/dataset-mnist.md?tabs=azureml-opendatasets).
+ Next, we prepare the dataset for training. In this tutorial, we use the MNIST dataset from [Azure Open Datasets](../../open-datasets/dataset-mnist.md?tabs=azureml-opendatasets).
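The data-loading code is collapsed in this diff. As a rough sketch, assuming the ```azureml-opendatasets``` package is available on the Apache Spark pool, pulling MNIST and converting it to a Spark dataframe might look like this (illustrative only, not the tutorial's exact code):

```python
# Sketch only: load MNIST from Azure Open Datasets and convert it to a Spark dataframe.
# Assumes the azureml-opendatasets package is installed on the Apache Spark pool.
from azureml.opendatasets import MNIST

mnist = MNIST.get_tabular_dataset()
mnist_pdf = mnist.to_pandas_dataframe()       # pandas dataframe with pixel columns and labels
mnist_sdf = spark.createDataFrame(mnist_pdf)  # Spark dataframe used for distributed training
```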
- Once we have finished processing our dataset, we can now define our TensorFlow model. The same code could also be used to train a single-node TensorFlow model.
+ Once our dataset is processed, we can define our TensorFlow model. The same code could also be used to train a single-node TensorFlow model.
```python
# Define the TensorFlow model without any Horovod-specific parameters
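# NOTE: illustrative sketch; the tutorial's own get_model() definition may differ.
def get_model():
    from tensorflow.keras import layers, models

    # A small convolutional network for 28x28 grayscale MNIST images with 10 output classes
    model = models.Sequential([
        layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
    return model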
@@ -158,7 +158,7 @@ def get_model():
## Define a training function for a single node
- First, we will train our TensorFlow model on the driver node of the Apache Spark pool. Once we have finished the training process, we will evaluate the model and print the loss and accuracy scores.
+ First, we train our TensorFlow model on the driver node of the Apache Spark pool. Once the training process is complete, we evaluate the model and print the loss and accuracy scores.
```python
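# Illustrative sketch of the single-node step (not the tutorial's exact code):
# train on the driver node, then evaluate and print the loss and accuracy scores.
import tensorflow as tf

def train_single_node(epochs=3, batch_size=128):
    # Placeholder data loading; the tutorial uses the MNIST data prepared above instead
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 28, 28, 1) / 255.0
    x_test = x_test.reshape(-1, 28, 28, 1) / 255.0

    model = get_model()  # get_model() as defined earlier
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size)

    # Evaluate the trained model and print the loss and accuracy scores
    loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
    print(f"Loss: {loss:.4f}, accuracy: {accuracy:.4f}")
    return model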
@@ -208,7 +208,7 @@ Next, we will take a look at how the same code could be re-run using ```HorovodR
### Define training function
- To do this, we will first define a training function for ```HorovodRunner```.
+ To train a model, we first define a training function for ```HorovodRunner```.
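The function body is collapsed in this diff. As a rough sketch of what a Horovod training function typically contains (initialization, GPU pinning, a distributed optimizer, and broadcasting of the initial state), the following is illustrative only and is not the tutorial's exact code; the later step that hands the function to ```HorovodRunner``` is omitted here:

```python
# Illustrative sketch of a Horovod training function (not the tutorial's exact code).
def train_hvd(learning_rate=0.001, batch_size=128, epochs=2):
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    # Initialize Horovod and pin this worker process to a single GPU
    hvd.init()
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

    # Placeholder data loading; the tutorial prepares MNIST from the Spark dataframe instead
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 28, 28, 1) / 255.0

    model = get_model()  # get_model() as defined earlier in the tutorial

    # Scale the learning rate by the number of workers and wrap the optimizer for Horovod
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate * hvd.size())
    optimizer = hvd.DistributedOptimizer(optimizer)
    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # Broadcast initial variables from rank 0 so every worker starts from the same state
    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              callbacks=callbacks,
              verbose=1 if hvd.rank() == 0 else 0)
```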
articles/synapse-analytics/machine-learning/tutorial-load-data-petastorm.md (7 additions, 7 deletions)
@@ -11,7 +11,7 @@ ms.author: midesa
# Load data with Petastorm (Preview)
- Petastorm is an open source data access library which enables single-node or distributed training of deep learning models. This library enables training directly from datasets in Apache Parquet format and datasets that have already been loaded as an Apache Spark DataFrame. Petastorm supports popular training frameworks such as Tensorflow and PyTorch.
+ Petastorm is an open source data access library that enables single-node or distributed training of deep learning models. This library enables training directly from datasets in Apache Parquet format and from datasets that are already loaded as an Apache Spark DataFrame. Petastorm supports popular training frameworks such as TensorFlow and PyTorch.
For more information about Petastorm, you can visit the [Petastorm GitHub page](https://github.com/uber/petastorm) or the [Petastorm API documentation](https://petastorm.readthedocs.io/en/latest).
@@ -28,7 +28,7 @@ For more information about Petastorm, you can visit the [Petastorm GitHub page](
## Configure the Apache Spark session
- At the start of the session, we will need to configure a few Apache Spark settings. In most cases, we only need to set the ```numExecutors``` and ```spark.rapids.memory.gpu.reserve```. In the example below, you can see how the Spark configurations can be passed with the ```%%configure``` command. The detailed meaning of each parameter is explained in the [Apache Spark configuration documentation](https://spark.apache.org/docs/latest/configuration.html).
+ At the start of the session, we need to configure a few Apache Spark settings. In most cases, we only need to set the ```numExecutors``` and ```spark.rapids.memory.gpu.reserve```. In the example, you can see how the Spark configurations can be passed with the ```%%configure``` command. The detailed meaning of each parameter is explained in the [Apache Spark configuration documentation](https://spark.apache.org/docs/latest/configuration.html).
```python
%%configure -f
@@ -44,7 +44,7 @@ At the start of the session, we will need to configure a few Apache Spark settin
A dataset created using Petastorm is stored in an Apache Parquet format. On top of a Parquet schema, Petastorm also stores higher-level schema information that makes multidimensional arrays into a native part of a Petastorm dataset.
- In the sample below, we create a dataset using PySpark. We write the dataset to an Azure Data Lake Storage Gen2 account.
+ In the sample, we create a dataset using PySpark. We write the dataset to an Azure Data Lake Storage Gen2 account.
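The creation code is collapsed in this diff. A minimal sketch of generating a Petastorm dataset with PySpark, assuming the built-in ```spark``` session and the same placeholder storage path used elsewhere in the article, might look like this (illustrative only):

```python
import numpy as np
from pyspark.sql.types import IntegerType
from petastorm.codecs import ScalarCodec, NdarrayCodec
from petastorm.etl.dataset_metadata import materialize_dataset
from petastorm.unischema import Unischema, UnischemaField, dict_to_spark_row

# Illustrative schema: an integer id plus a small image-like array
MySchema = Unischema('MySchema', [
    UnischemaField('id', np.int32, (), ScalarCodec(IntegerType()), False),
    UnischemaField('image', np.uint8, (28, 28), NdarrayCodec(), False),
])

# Placeholder output location in the ADLS Gen2 account
output_url = 'abfs://<container_name>/<data directory path>/'

def row_generator(i):
    # Generate one synthetic row; a real pipeline would map actual records here
    return {'id': i, 'image': np.random.randint(0, 255, (28, 28), dtype=np.uint8)}

# materialize_dataset writes the Petastorm metadata on top of the Parquet output
with materialize_dataset(spark, output_url, MySchema, rowgroup_size_mb=64):
    rows_rdd = spark.sparkContext.parallelize(range(100)) \
        .map(row_generator) \
        .map(lambda d: dict_to_spark_row(MySchema, d))
    spark.createDataFrame(rows_rdd, MySchema.as_spark_schema()) \
        .write.mode('overwrite').parquet(output_url)
```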
The ```petastorm.reader.Reader``` class is the main entry point for user code that accesses the data from an ML framework such as Tensorflow or Pytorch. You can read a dataset using the ```petastorm.reader.Reader``` class and the ```petastorm.make_reader``` factory method.
- In the example below, you can see how you can pass an ```abfs``` URL protocol.
+ In the example, you can see how you can pass an ```abfs``` URL protocol.
```python
from petastorm import make_reader
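# Illustrative usage sketch; the container name and data directory path are placeholders.
with make_reader('abfs://<container_name>/<data directory path>/') as reader:
    # Each item yielded by the reader is one row of the Petastorm dataset
    for row in reader:
        print(row)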
@@ -115,7 +115,7 @@ with make_reader('abfs://<container_name>/<data directory path>/') as reader:
### Read dataset from secondary storage account
- If you are using an alternative storage account, be sure to set up the [linked service](../../data-factory/concepts-linked-services.md) to automatically authenticate and read from the account. In addition, you will need to modify the following properties below: ```remote_url```, ```account_name```, and ```linked_service_name```.
+ If you are using an alternative storage account, be sure to set up the [linked service](../../data-factory/concepts-linked-services.md) to automatically authenticate and read from the account. In addition, you need to modify the following properties: ```remote_url```, ```account_name```, and ```linked_service_name```.
```python
from petastorm import make_reader
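# Illustrative sketch; the values below and the SAS token retrieval are placeholders and
# assumptions, not the tutorial's exact code.
account_name = '<storage_account_name>'
linked_service_name = '<linked_service_name>'
remote_url = 'abfs://<container_name>@{}.dfs.core.windows.net'.format(account_name)

# Obtain a SAS token for the linked service; the exact helper depends on your environment
sas_token = '<sas_token>'

with make_reader('{}/data_directory'.format(remote_url), storage_options={'sas_token': sas_token}) as reader:
    for row in reader:
        print(row)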
@@ -134,7 +134,7 @@ with make_reader('{}/data_directory'.format(remote_url), storage_options = {'sas
### Read dataset in batches
- In the example below, you can see how you can pass an ```abfs``` URL protocol to read data in batches. This example uses the ```make_batch_reader``` class.
+ In the example, you can see how you can pass an ```abfs``` URL protocol to read data in batches. This example uses the ```make_batch_reader``` class.
```python
from petastorm import make_batch_reader
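# Illustrative sketch; reads the Parquet-backed dataset in batches (placeholder path).
with make_batch_reader('abfs://<container_name>/<data directory path>/') as batch_reader:
    # Each item is a batch of rows exposed as columnar arrays rather than a single row
    for batch in batch_reader:
        print(batch)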
@@ -146,7 +146,7 @@ with make_batch_reader('abfs://<container_name>/<data directory path>/', schema_
## PyTorch API
- To read a Petastorm dataset from PyTorch, you can use the adapter ```petastorm.pytorch.DataLoader``` class. This allows for custom PyTorch collating functions and transforms to be supplied.
+ To read a Petastorm dataset from PyTorch, you can use the adapter ```petastorm.pytorch.DataLoader``` class. This adapter allows for custom PyTorch collating functions and transforms to be supplied.
In this example, we will show how Petastorm DataLoader can be used to load a Petastorm dataset with the help of make_reader API. This first section creates the definition of a ```Net``` class and ```train``` and ```test``` function.
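The full ```Net```, ```train```, and ```test``` definitions are not shown in this diff. A minimal sketch of wrapping ```make_reader``` in the Petastorm ```DataLoader```, with a placeholder dataset URL and assumed column names, might look like this (illustrative only):

```python
import torch
from petastorm import make_reader
from petastorm.pytorch import DataLoader

# Illustrative sketch: stream a Petastorm dataset into PyTorch (placeholder URL).
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

with DataLoader(make_reader('abfs://<container_name>/<data directory path>/'), batch_size=64) as loader:
    for batch in loader:
        # Each batch is a dict of column name -> tensor; adapt the keys to your schema
        features = batch['image'].to(device)   # 'image' and 'digit' are assumed column names
        labels = batch['digit'].to(device)
        break  # a single batch is shown here for brevity
```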