Commit b3ad796

Merge pull request #267483 from midesa/main
gpu warning messages and aml tutorial
2 parents ec310b9 + 14dba5f commit b3ad796

10 files changed: +165 -37 lines changed
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
---
title: Access ADLSg2 data from Azure Machine Learning
description: This article provides an overview on how you can access data in your Azure Data Lake Storage Gen 2 (ADLSg2) account directly from Azure Machine Learning.
author: midesa
ms.service: synapse-analytics
ms.topic: tutorial
ms.subservice: machine-learning
ms.date: 02/27/2024
ms.author: midesa
---

# Tutorial: Accessing Azure Synapse ADLS Gen2 Data in Azure Machine Learning
In this tutorial, we'll guide you through the process of accessing data stored in Azure Synapse Azure Data Lake Storage Gen2 (ADLS Gen2) from Azure Machine Learning. This capability is especially valuable when you want to streamline your machine learning workflow by using tools such as Automated ML, integrated model and experiment tracking, or specialized hardware like GPUs available in Azure Machine Learning.

To access ADLS Gen2 data in Azure Machine Learning, we'll create an Azure Machine Learning Datastore that points to the Azure Synapse ADLS Gen2 storage account.

## Prerequisites

- An [Azure Synapse Analytics workspace](../get-started-create-workspace.md). Ensure that it has an Azure Data Lake Storage Gen2 storage account configured as the default storage. For the Data Lake Storage Gen2 file system that you work with, ensure that you're the *Storage Blob Data Contributor*.
- An [Azure Machine Learning workspace](../../machine-learning/quickstart-create-resources.md).

## Install libraries

First, we will install the ```azure-ai-ml``` package.

```python
%pip install azure-ai-ml
```

## Create a Datastore

Azure Machine Learning offers a feature known as a Datastore, which acts as a reference to your existing Azure storage account. In this example, we'll create a Datastore that links to our Azure Synapse ADLS Gen2 storage account. After initializing an ```MLClient``` object, you can provide the connection details for your ADLS Gen2 account and then execute the code to create or update the Datastore.
```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AzureDataLakeGen2Datastore
from azure.identity import DefaultAzureCredential

# Authenticate and load the workspace details from your config.json file
ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# Provide the connection details to your Azure Synapse ADLS Gen2 storage account
store = AzureDataLakeGen2Datastore(
    name="",          # name to give the new Datastore
    description="",   # description of the Datastore
    account_name="",  # ADLS Gen2 storage account name
    filesystem=""     # container (file system) to reference
)

ml_client.create_or_update(store)
```

You can learn more about creating and managing Azure Machine Learning datastores using this [tutorial on Azure Machine Learning data stores](../../machine-learning/concept-data.md).
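
Alternatively, once the Datastore exists, you can reference files in it directly through an ```azureml://``` datastore URI instead of a mount. The following is a minimal sketch, assuming the ```azureml-fsspec``` package is installed on your compute; every bracketed identifier is a placeholder for your own values.

```python
import pandas as pd  # the azureml-fsspec package enables pandas to resolve azureml:// URIs

# Placeholder identifiers -- substitute your own subscription, resource group,
# workspace, Datastore name, and file path
uri = (
    "azureml://subscriptions/<subscription-id>/resourcegroups/<resource-group>"
    "/workspaces/<workspace-name>/datastores/<datastore-name>/paths/<folder>/<file>.csv"
)

df = pd.read_csv(uri)
print(df.head(5))
```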

## Mount your ADLS Gen2 Storage Account

Once you have set up your Datastore, you can access this data by creating a **mount** to your ADLS Gen2 account. In Azure Machine Learning, creating a mount to your ADLS Gen2 account establishes a direct link between your workspace and the storage account, enabling seamless access to the data stored within. Essentially, a mount acts as a pathway that allows Azure Machine Learning to interact with the files and folders in your ADLS Gen2 account as if they were part of the local filesystem within your workspace.

Once the storage account is mounted, you can read, write, and manipulate data stored in ADLS Gen2 using familiar filesystem operations directly within your Azure Machine Learning environment, simplifying data preprocessing, model training, and experimentation tasks.

To do this:

1. Start your compute engine.
2. Select **Data Actions** and then select **Mount**.

![Screenshot of Azure Machine Learning option to select data actions.](./media/tutorial-access-data-from-aml/data-actions.png)

3. From here, you should see and select your ADLS Gen2 storage account name. It may take a few moments for your mount to be created.
4. Once your mount is ready, you can select **Data actions** and then **Consume**. Under **Data**, you can then select the mount that you want to consume data from.

Now, you can use your preferred libraries to directly read data from your mounted Azure Data Lake Storage account.

## Read data from your storage account

```python
import os

import pandas as pd

# List the files in the mounted path
print(os.listdir("/home/azureuser/cloudfiles/data/datastore/{name of mount}"))

# Get the path of your file and load the data using your preferred libraries
df = pd.read_csv("/home/azureuser/cloudfiles/data/datastore/{name of mount}/{file name}")
print(df.head(5))
```
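
Because the mount behaves like a local filesystem, you can also write results back to ADLS Gen2 through the same path. The following is a minimal sketch, assuming your mount was created with write access and reusing the ```df``` DataFrame loaded above; the ```processed``` folder and file name are hypothetical examples.

```python
import os

# Hypothetical output folder under the same mount -- replace {name of mount} with your mount name
output_dir = "/home/azureuser/cloudfiles/data/datastore/{name of mount}/processed"
os.makedirs(output_dir, exist_ok=True)

# Persist the DataFrame back to ADLS Gen2 through the mounted path
df.to_csv(os.path.join(output_dir, "output.csv"), index=False)
```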

## Next steps

- [Create and manage GPUs in Azure Machine Learning](../../machine-learning/how-to-train-distributed-gpu.md)
- [Create Automated ML jobs in Azure Machine Learning](../../machine-learning/concept-automated-ml.md)

articles/synapse-analytics/machine-learning/concept-deep-learning.md

Lines changed: 7 additions & 2 deletions
@@ -5,21 +5,26 @@ author: midesa
ms.service: synapse-analytics
ms.topic: conceptual
ms.subservice: machine-learning
-ms.date: 04/19/2022
+ms.date: 02/27/2024
ms.author: midesa
---

# Deep learning (Preview)

Apache Spark in Azure Synapse Analytics enables machine learning with big data, providing the ability to obtain valuable insight from large amounts of structured, unstructured, and fast-moving data. There are several options when training machine learning models using Apache Spark in Azure Synapse Analytics: Apache Spark MLlib, Azure Machine Learning, and various other open-source libraries.

+> [!WARNING]
+> - The GPU-accelerated preview is limited to the [Azure Synapse 3.1 (unsupported)](../spark/apache-spark-3-runtime.md) and [Apache Spark 3.2 (EOLA)](../spark/apache-spark-32-runtime.md) runtimes.
+> - Azure Synapse Runtime for Apache Spark 3.1 reached its end of life (EOL) on January 26, 2023. Official support ended on January 26, 2024, and support tickets, bug fixes, and security updates are no longer addressed beyond that date.
+> - Azure Synapse Runtime for Apache Spark 3.2 reached its end of life (EOL) on July 8, 2023. No further bug or feature fixes are made, but security fixes may be backported based on risk assessment; the runtime will be retired and disabled as of July 8, 2024.
+

## GPU-enabled Apache Spark pools

To simplify the process for creating and managing pools, Azure Synapse takes care of pre-installing low-level libraries and setting up all the complex networking requirements between compute nodes. This integration allows users to get started with GPU-accelerated pools within just a few minutes. To learn more, see the quickstart on how to [create a GPU-accelerated pool](../quickstart-create-apache-gpu-pool-portal.md).

> [!NOTE]
> - GPU-accelerated pools can be created in workspaces located in East US, Australia East, and North Europe.
-> - GPU-accelerated pools are only available with the Apache Spark 3.1 and 3.2 runtime.
+> - GPU-accelerated pools are only available with the Apache Spark 3.1 (unsupported) and 3.2 runtimes.
> - You might need to request a [limit increase](../spark/apache-spark-rapids-gpu.md#quotas-and-resource-constraints-in-azure-synapse-gpu-enabled-pools) in order to create GPU-enabled clusters.

## GPU ML Environment

articles/synapse-analytics/machine-learning/tutorial-horovod-pytorch.md

Lines changed: 14 additions & 8 deletions
@@ -4,7 +4,7 @@ description: Tutorial on how to run distributed training with the Horovod Estima
ms.service: synapse-analytics
ms.subservice: machine-learning
ms.topic: tutorial
-ms.date: 04/19/2022
+ms.date: 02/27/2024
author: midesa
ms.author: midesa
---
@@ -13,18 +13,24 @@ ms.author: midesa

[Horovod](https://github.com/horovod/horovod) is a distributed training framework for libraries like TensorFlow and PyTorch. With Horovod, users can scale up an existing training script to run on hundreds of GPUs in just a few lines of code.

-Within Azure Synapse Analytics, users can quickly get started with Horovod using the default Apache Spark 3 runtime.For Spark ML pipeline applications using PyTorch, users can use the horovod.spark estimator API. This notebook uses an Apache Spark dataframe to perform distributed training of a distributed neural network (DNN) model on MNIST dataset. This tutorial leverages PyTorch and the Horovod Estimator to run the training process.
+Within Azure Synapse Analytics, users can quickly get started with Horovod using the default Apache Spark 3 runtime. For Spark ML pipeline applications using PyTorch, users can use the horovod.spark estimator API. This notebook uses an Apache Spark dataframe to perform distributed training of a deep neural network (DNN) model on the MNIST dataset. This tutorial uses PyTorch and the Horovod Estimator to run the training process.

## Prerequisites

- [Azure Synapse Analytics workspace](../get-started-create-workspace.md) with an Azure Data Lake Storage Gen2 storage account configured as the default storage. You need to be the *Storage Blob Data Contributor* of the Data Lake Storage Gen2 file system that you work with.
- Create a GPU-enabled Apache Spark pool in your Azure Synapse Analytics workspace. For details, see [Create a GPU-enabled Apache Spark pool in Azure Synapse](../spark/apache-spark-gpu-concept.md). For this tutorial, we suggest using the GPU-Large cluster size with 3 nodes.

+> [!WARNING]
+> - The GPU-accelerated preview is limited to the [Azure Synapse 3.1 (unsupported)](../spark/apache-spark-3-runtime.md) and [Apache Spark 3.2 (EOLA)](../spark/apache-spark-32-runtime.md) runtimes.
+> - Azure Synapse Runtime for Apache Spark 3.1 reached its end of life (EOL) on January 26, 2023. Official support ended on January 26, 2024, and support tickets, bug fixes, and security updates are no longer addressed beyond that date.
+> - Azure Synapse Runtime for Apache Spark 3.2 reached its end of life (EOL) on July 8, 2023. No further bug or feature fixes are made, but security fixes may be backported based on risk assessment; the runtime will be retired and disabled as of July 8, 2024.
+

## Configure the Apache Spark session

-At the start of the session, we will need to configure a few Apache Spark settings. In most cases, we only needs to set the numExecutors and spark.rapids.memory.gpu.reserve. For very large models, users may also need to configure the ```spark.kryoserializer.buffer.max``` setting. For Tensorflow models, users will need to set the ```spark.executorEnv.TF_FORCE_GPU_ALLOW_GROWTH``` to be true.
+At the start of the session, we need to configure a few Apache Spark settings. In most cases, we only need to set ```numExecutors``` and ```spark.rapids.memory.gpu.reserve```. For large models, users may also need to configure the ```spark.kryoserializer.buffer.max``` setting. For TensorFlow models, users need to set ```spark.executorEnv.TF_FORCE_GPU_ALLOW_GROWTH``` to true.

-In the example below, you can see how the Spark configurations can be passed with the ```%%configure``` command. The detailed meaning of each parameter is explained in the [Apache Spark configuration documentation](https://spark.apache.org/docs/latest/configuration.html). The values provided below are the suggested, best practice values for Azure Synapse GPU-large pools.
+In the following example, you can see how the Spark configurations can be passed with the ```%%configure``` command. The detailed meaning of each parameter is explained in the [Apache Spark configuration documentation](https://spark.apache.org/docs/latest/configuration.html). The values provided are the suggested best-practice values for Azure Synapse GPU-large pools.

```spark
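
The ```%%configure``` cell itself is cut off at this point in the diff. For reference, a representative cell for a GPU-large pool might look like the following sketch; the specific memory and executor values are illustrative assumptions, not the tutorial's exact settings.

```spark
%%configure -f
{
    "numExecutors": 3,
    "conf": {
        "spark.rapids.memory.gpu.reserve": "10g",
        "spark.executorEnv.TF_FORCE_GPU_ALLOW_GROWTH": "true",
        "spark.kryoserializer.buffer.max": "2000m"
    }
}
```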
@@ -61,7 +67,7 @@ For this tutorial, we will use the following configurations:

## Import dependencies

-In this tutorial, we will leverage PySpark to read and process the dataset. We will then use PyTorch and Horovod to build the distributed neural network (DNN) model and run the training process. To get started, we will need to import the following dependencies:
+In this tutorial, we use PySpark to read and process the dataset. Then, we use PyTorch and Horovod to build the distributed neural network (DNN) model and run the training process. To get started, we need to import the following dependencies:

```python
# base libs
@@ -94,7 +100,7 @@ from azure.synapse.ml.horovodutils import AdlsStore

## Connect to alternative storage account

-We will need the Azure Data Lake Storage (ADLS) account for storing intermediate and model data. If you are using an alternative storage account, be sure to set up the [linked service](../../data-factory/concepts-linked-services.md) to automatically authenticate and read from the account. In addition, you will need to modify the following properties below: ```remote_url```, ```account_name```, and ```linked_service_name```.
+We need the Azure Data Lake Storage (ADLS) account for storing intermediate and model data. If you are using an alternative storage account, be sure to set up the [linked service](../../data-factory/concepts-linked-services.md) to automatically authenticate and read from the account. In addition, you need to modify the following properties: ```remote_url```, ```account_name```, and ```linked_service_name```.

```python
num_proc = 3 # equal to numExecutors
@@ -164,7 +170,7 @@ train_df.count()

## Define DNN model

-Once we have finished processing our dataset, we can now define our PyTorch model. The same code could also be used to train a single-node PyTorch model.
+Once we are finished processing our dataset, we can now define our PyTorch model. The same code could also be used to train a single-node PyTorch model.

```python
# Define the PyTorch model without any Horovod-specific parameters
@@ -227,7 +233,7 @@ torch_model = torch_estimator.fit(train_df).setOutputCols(['label_prob'])

## Evaluate trained model

-Once the training process has finished, we can then evaluate the model on the test dataset.
+Once the training process completes, we can then evaluate the model on the test dataset.

```python
# Evaluate the model on the held-out test DataFrame
