articles/synapse-analytics/machine-learning/quickstart-gallery-sample-notebook.md (+4 −22)

```diff
@@ -4,9 +4,9 @@ description: Learn how to use a sample notebook from the Synapse Analytics galle
 ms.service: synapse-analytics
 ms.subservice: machine-learning
 ms.topic: quickstart
-ms.date: 06/11/2021
-author: WilliamDAssafMSFT
-ms.author: wiassaf
+ms.date: 02/29/2024
+author: midesa
+ms.author: midesa
 ms.custom: mode-other
 ---
```
```diff
@@ -27,7 +27,7 @@ This notebook demonstrates the basic steps used in creating a model: **data impo
 1. Open your workspace and select **Learn** from the home page.
 1. In the **Knowledge center**, select **Browse gallery**.
 1. In the gallery, select **Notebooks**.
-1. Find and select the notebook "Data Exploration and ML Modeling - NYC taxi predict using Spark MLib".
+1. Find and select a notebook from the gallery.

 :::image type="content" source="media\quickstart-gallery-sample-notebook\gallery-select-ml-notebook.png" alt-text="Select the machine learning sample notebook in the gallery.":::
```
```diff
@@ -38,24 +38,6 @@ This notebook demonstrates the basic steps used in creating a model: **data impo
 1. In the **Attach to** menu in the open notebook, select your Apache Spark pool.

-## Run the notebook
-
-The notebook is divided into multiple cells that each perform a specific function.
-You can run each cell manually in sequence, or select **Run all** to run all the cells.
-
-Here are descriptions for each of the cells in the notebook:
-
-1. Import PySpark functions that the notebook uses.
-1. **Ingest Data** - Ingest data from the Azure Open Dataset **NycTlcYellow** into a local dataframe for processing. The code extracts data within a specific time period - you can modify the start and end dates to get different data.
-1. Downsample the dataset to make development faster. You can modify this step to change the sample size or the sampling seed.
-1. **Exploratory Data Analysis** - Display charts to view the data. This can give you an idea of what data prep might be needed before creating the model.
-1. **Data Prep and Featurization** - Filter out outlier data discovered through visualization and create some useful derived variables.
-1. **Data Prep and Featurization Part 2** - Drop unneeded columns and create some additional features.
-1. **Encoding** - Convert string variables to numbers that the Logistic Regression model expects.
-1. **Generation of Testing and Training Data Sets** - Split the data into separate testing and training data sets. You can modify the fraction and randomizing seed used to split the data.
-1. **Train the Model** - Train a Logistic Regression model and display its "Area under ROC" metric to see how well the model is working. This step also saves the trained model in case you want to use it elsewhere.
-1. **Evaluate and Visualize** - Plot the model's ROC curve to further evaluate the model.
-
 ## Save the notebook

 To save your notebook, select **Publish** on the workspace command bar.
```
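The removed cell descriptions above mention training a Logistic Regression model and judging it by its "Area under ROC" metric (the notebook itself uses Spark MLlib for this). Purely as an illustration of what that metric measures, and independent of Spark, here is a minimal pure-Python sketch; the function name `area_under_roc` is ours, not from the notebook:

```python
def area_under_roc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the probability that a randomly chosen positive example is scored
    above a randomly chosen negative one (ties count as half)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Three of the four positive/negative pairs are ranked correctly here.
print(area_under_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why the notebook uses it to gauge how well the trained model separates the two classes.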
articles/synapse-analytics/machine-learning/quickstart-integrate-azure-machine-learning.md (+6 −5)

```diff
@@ -5,7 +5,7 @@ ms.service: synapse-analytics
 ms.subservice: machine-learning
 ms.topic: quickstart
 ms.reviewer: sngun, garye
-ms.date: 12/16/2021
+ms.date: 02/29/2024
 author: nelgson
 ms.author: negust
 ms.custom: mode-other
```
```diff
@@ -14,8 +14,9 @@ ms.custom: mode-other
 # Quickstart: Create a new Azure Machine Learning linked service in Synapse

 > **IMPORTANT, PLEASE NOTE THE BELOW LIMITATIONS:**
-> - **The Azure ML integration is not currently supported in Synapse Workspaces with Data Exfiltration Protection.** If you are **not** using data exfiltration protection and want to connect to Azure ML using private endpoints, you can set up a managed AzureML private endpoint in your Synapse workspace. [Read more about managed private endpoints](../security/how-to-create-managed-private-endpoints.md)
+> - **The Azure Machine Learning integration is not currently supported in Synapse Workspaces with Data Exfiltration Protection.** If you are **not** using data exfiltration protection and want to connect to Azure Machine Learning using private endpoints, you can set up a managed Azure Machine Learning private endpoint in your Synapse workspace. [Read more about managed private endpoints](../security/how-to-create-managed-private-endpoints.md)
 > - **AzureML linked service is not supported with self hosted integration runtimes.** This applies to Synapse workspaces with and without Data Exfiltration Protection.
+> - **The Azure Synapse Spark 3.3 and 3.4 runtimes do not support using the Azure Machine Learning Linked Service to authenticate to the Azure Machine Learning MLFlow tracking URI.** To learn more about the limitations on these runtimes, see [Azure Synapse Runtime for Apache Spark 3.3](../spark/apache-spark-33-runtime.md) and [Azure Synapse Runtime for Apache Spark 3.4](../spark/apache-spark-34-runtime.md)

 In this quickstart, you'll link an Azure Synapse Analytics workspace to an Azure Machine Learning workspace. Linking these workspaces allows you to leverage Azure Machine Learning from various experiences in Synapse.
```
```diff
@@ -46,13 +47,13 @@ In the following sections, you'll find guidance on how to create an Azure Machin

 This section will guide you on how to create an Azure Machine Learning linked service in Azure Synapse, using the [Azure Synapse workspace Managed Identity](../../data-factory/data-factory-service-identity.md?context=/azure/synapse-analytics/context/context&tabs=synapse-analytics)

-### Give MSI permission to the Azure ML workspace
+### Give MSI permission to the Azure Machine Learning workspace

 1. Navigate to your Azure Machine Learning workspace resource in the Azure portal and select **Access Control**

 1. Create a role assignment and add your Synapse workspace Managed Service identity (MSI) as a *contributor* of the Azure Machine Learning workspace. Note that this will require being an owner of the resource group that the Azure Machine Learning workspace belongs to. If you have trouble finding your Synapse workspace MSI, search for the name of the Synapse workspace.

-### Create an Azure ML linked service
+### Create an Azure Machine Learning linked service

 1. In the Synapse workspace where you want to create the new Azure Machine Learning linked service, go to **Manage** > **Linked services**, and create a new linked service with type "Azure Machine Learning".
```
```diff
@@ -94,7 +95,7 @@ This step will create a new Service Principal. If you want to use an existing Se
 ### Create an Azure Machine Learning linked service

 1. In the Synapse workspace where you want to create the new Azure Machine Learning linked service, go to **Manage** -> **Linked services**, create a new linked service with type "Azure Machine Learning".
```
articles/synapse-analytics/spark/apache-spark-data-visualization-tutorial.md (+12 −9)
```diff
@@ -5,7 +5,7 @@ author: midesa
 ms.service: synapse-analytics
 ms.topic: conceptual
 ms.subservice: machine-learning
-ms.date: 10/20/2020
+ms.date: 02/29/2024
 ms.author: midesa
 ---
```
```diff
@@ -35,14 +35,17 @@ Create an Apache Spark Pool by following the [Create an Apache Spark pool tutori
 3. Because the raw data is in a Parquet format, you can use the Spark context to pull the file into memory as a DataFrame directly. Create a Spark DataFrame by retrieving the data via the Open Datasets API. Here, we use the Spark DataFrame *schema on read* properties to infer the datatypes and schema.
 4. After the data is read, we'll want to do some initial filtering to clean the dataset. We might remove unneeded columns and add columns that extract important information. In addition, we'll filter out anomalies within the dataset.
 2. The downside to simple filtering is that, from a statistical perspective, it might introduce bias into the data. Another approach is to use the sampling built into Spark.
```
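The hunk above recommends Spark's built-in sampling (`DataFrame.sample(fraction=..., seed=...)`), which keeps each row independently with probability `fraction` and is reproducible for a fixed seed. As a hedged illustration of those semantics in plain Python (not Spark; `bernoulli_sample` is a hypothetical helper of ours):

```python
import random

def bernoulli_sample(rows, fraction, seed):
    """Per-row Bernoulli sampling, analogous in spirit to Spark's
    DataFrame.sample: each row is kept independently with probability
    `fraction`, so the result size is only approximately fraction * len(rows),
    and a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]

trips = list(range(1_000))  # stand-in for 1,000 taxi-trip rows
sample_a = bernoulli_sample(trips, fraction=0.1, seed=123)
sample_b = bernoulli_sample(trips, fraction=0.1, seed=123)
assert sample_a == sample_b          # same seed → identical sample
print(len(sample_a))                 # roughly 100 of the 1,000 rows
```

Unlike a hand-written filter, a random sample of this kind shrinks the data for faster development without systematically excluding any slice of it, which is the bias concern the text raises.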