articles/synapse-analytics/get-started-analyze-spark.md (12 additions, 7 deletions)
---
title: 'Quickstart: Get started analyzing with Spark'
description: In this tutorial, you'll learn to analyze some sample data with Apache Spark in Azure Synapse Analytics.
author: whhender
ms.author: whhender
ms.reviewer: whhender
ms.service: azure-synapse-analytics
ms.subservice: spark
ms.topic: quickstart
ms.date: 11/15/2024
---

# Quickstart: Analyze with Apache Spark

In this tutorial, you'll learn the basic steps to load and analyze data with Apache Spark for Azure Synapse.

## Prerequisites

Make sure you have [placed the sample data in the primary storage account](get-started-create-workspace.md#place-sample-data-into-the-primary-storage-account).

## Create a serverless Apache Spark pool

1. In Synapse Studio, on the left-side pane, select **Manage** > **Apache Spark pools**.
1. Select **New**.
1. For **Apache Spark pool name**, enter **Spark1**.
1. For **Node size**, select **Small**.
1. For **Number of nodes**, set the minimum to 3 and the maximum to 3.
1. Select **Review + create** > **Create**. Your Apache Spark pool will be ready in a few seconds.

## Understand serverless Apache Spark pools

A serverless Spark pool is a way of indicating how a user wants to work with Spark. When you start using a pool, a Spark session is created if needed. The pool controls how many Spark resources that session uses and how long the session lasts before it automatically pauses. You pay for the Spark resources used during that session, not for the pool itself. In this way, a Spark pool lets you use Apache Spark without managing clusters, similar to how a serverless SQL pool works.

…

Data is available via the dataframe named **df**. Load it into a Spark database named **nyctaxi**:

```python
spark.sql("CREATE DATABASE IF NOT EXISTS nyctaxi")
```
articles/synapse-analytics/machine-learning/tutorial-text-analytics-use-mmlspark.md (23 additions, 14 deletions)
---
description: Learn how to use text analytics in Azure Synapse Analytics.
ms.service: azure-synapse-analytics
ms.subservice: machine-learning
ms.topic: tutorial
ms.date: 11/19/2024
author: ruixinxu
ms.author: ruxu
# customer intent: As a Synapse Analytics user, I want to be able to analyze my text using Azure AI services.
---

# Tutorial: Text Analytics with Azure AI services

In this tutorial, you learn how to use [Text Analytics](/azure/ai-services/language-service/) to analyze unstructured text on Azure Synapse Analytics. [Text Analytics](/azure/ai-services/language-service/) is an [Azure AI service](/azure/ai-services/) that enables you to perform text mining and text analysis with Natural Language Processing (NLP) features.

This tutorial demonstrates using text analytics with [SynapseML](https://github.com/microsoft/SynapseML) to:

…

If you don't have an Azure subscription, [create a free account before you begin].

- An [Azure Synapse Analytics workspace](../get-started-create-workspace.md) with an Azure Data Lake Storage Gen2 storage account configured as the default storage. You need to be the *Storage Blob Data Contributor* of the Data Lake Storage Gen2 file system that you work with.
- A Spark pool in your Azure Synapse Analytics workspace. For details, see [Create a Spark pool in Azure Synapse](../quickstart-create-sql-pool-studio.md).
- Preconfiguration steps described in the tutorial [Configure Azure AI services in Azure Synapse](tutorial-configure-cognitive-services-synapse.md).

## Get started

Open Synapse Studio and create a new notebook. To get started, import [SynapseML](https://github.com/microsoft/SynapseML).

```python
import synapse.ml
from synapse.ml.services import *
from pyspark.sql.functions import col
```

## Configure text analytics

Use the linked text analytics you configured in the [preconfiguration steps](tutorial-configure-cognitive-services-synapse.md).

```python
linked_service_name = "<Your linked service for text analytics>"
```

## Text Sentiment

Text Sentiment Analysis provides a way to detect sentiment labels (such as "negative", "neutral", and "positive") and confidence scores at the sentence and document level. See [Supported languages in Text Analytics API](/azure/ai-services/language-service/language-detection/overview?tabs=sentiment-analysis) for the list of enabled languages.

```python
# Create a dataframe that's tied to its column names
df = spark.createDataFrame([
    ("I am so happy today, it's sunny!", "en-US"),
    ("I am frustrated by this rush hour traffic", "en-US"),
    ("The Azure AI services on spark aint bad", "en-US"),
], ["text", "language"])
```
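The transform step that produces the `results` DataFrame used below is elided in this excerpt. As a sketch only, assuming SynapseML's `TextSentiment` transformer with the setter-style configuration its service transformers use (this runs in a Synapse notebook with the linked service configured, not locally), it looks roughly like:

```python
# Sketch only: assumes SynapseML's TextSentiment transformer and a
# configured linked service; runs in a Synapse notebook, not locally.
model = (TextSentiment()
    .setLinkedService(linked_service_name)
    .setTextCol("text")
    .setLanguageCol("language")
    .setOutputCol("sentiment")
    .setErrorCol("error"))

results = model.transform(df)
```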
```python
display(results
    .select("text", "sentiment"))
```

### Expected results

|text|sentiment|
|---|---|
|I am so happy today, it's sunny!|positive|
|I am frustrated by this rush hour traffic|negative|
|The Azure AI services on spark aint bad|positive|

---

## Personally Identifiable Information (PII) V3.1

The PII feature is part of NER, and it can identify and redact sensitive entities in text that are associated with an individual person, such as phone numbers, email addresses, mailing addresses, and passport numbers. See [Supported languages in Text Analytics API](/azure/ai-services/language-service/language-detection/overview?tabs=pii) for the list of enabled languages.
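The PII call itself isn't shown in this excerpt. As an illustrative sketch only, assuming SynapseML exposes a `PII` transformer with the same setter-style API as the sentiment example (the input rows here are invented, and this runs only in a Synapse notebook with the linked service configured):

```python
# Sketch only: assumes a SynapseML PII transformer and a configured
# linked service; runs in a Synapse notebook, not locally.
df = spark.createDataFrame([
    ("1", "en", "My SSN is 859-98-0987"),    # invented sample text
    ("2", "en", "Call me at 312-555-0176"),  # invented sample text
], ["id", "language", "text"])

pii = (PII()
    .setLinkedService(linked_service_name)
    .setLanguageCol("language")
    .setTextCol("text")
    .setOutputCol("pii"))

display(pii.transform(df).select("text", "pii"))
```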

---

## Clean up resources

To ensure the Spark instance is shut down, end any connected sessions (notebooks). The pool shuts down when the **idle time** specified in the Apache Spark pool is reached. You can also select **stop session** from the status bar at the upper right of the notebook.

## Related content

* [Check out Synapse sample notebooks](https://github.com/Azure-Samples/Synapse/tree/main/MachineLearning)
Managed private endpoints are private endpoints created in a Managed Virtual Network associated with your Azure Synapse workspace. Managed private endpoints establish a private link to Azure resources. Azure Synapse manages these private endpoints on your behalf. You can create Managed private endpoints from your Azure Synapse workspace to access Azure services (such as Azure Storage or Azure Cosmos DB) and Azure hosted customer/partner services.
Learn more about [private links and private endpoints](../../private-link/index.

> [!NOTE]
> When creating an Azure Synapse workspace, you can choose to associate a Managed Virtual Network with it. If you choose to have a Managed Virtual Network associated with your workspace, you can also choose to limit outbound traffic from your workspace to only approved targets. You must create Managed private endpoints to these targets.
A private endpoint connection is created in a "Pending" state when you create a Managed private endpoint in Azure Synapse, and an approval workflow starts. The private link resource owner is responsible for approving or rejecting the connection. If the owner approves the connection, the private link is established; if not, it isn't. Either way, the Managed private endpoint is updated with the status of the connection. Only a Managed private endpoint in an approved state can be used to send traffic to the private link resource that is linked to it.
## Managed private endpoints for dedicated SQL pool and serverless SQL pool

Dedicated SQL pool and serverless SQL pool are analytic capabilities in your Azure Synapse workspace. These capabilities use multitenant infrastructure that isn't deployed into the [Managed workspace Virtual Network](./synapse-workspace-managed-vnet.md).
When a workspace is created, Azure Synapse creates two Managed private endpoints in the workspace, one for dedicated SQL pool and one for serverless SQL pool.
These two Managed private endpoints are listed in Synapse Studio. Select **Manage** in the left navigation, then select **Managed private endpoints** to see them in the Studio.
The Managed private endpoint that targets SQL pool is called *synapse-ws-sql--\<…*.
These two Managed private endpoints are automatically created for you when you create your Azure Synapse workspace. You aren't charged for these two Managed private endpoints.
## Supported data sources
Azure Synapse Spark supports over 25 data sources to connect to using managed private endpoints. Users need to specify the resource identifier, which can be found in the **Properties** settings page of their data source in the Azure portal.
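As an illustrative example of the resource identifier format (the ARM resource-ID shape is standard across Azure; the subscription, resource group, and account names below are placeholders), a storage account's resource ID looks like:

```
/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>
```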
…

To learn more about how to manage session-scoped packages, see the following articles:
- [R session packages](./apache-spark-manage-session-packages.md#session-scoped-r-packages-preview): Within your session, you can install packages across all nodes within your Spark pool by using `install.packages` or `devtools`.
## Automate the library management process through Azure PowerShell cmdlets and REST APIs
If your team wants to manage libraries without visiting the package management UIs, you can manage the workspace packages and pool-level package updates through Azure PowerShell cmdlets or REST APIs for Azure Synapse Analytics.
For more information, see the following articles:
- [Manage your Spark pool libraries through REST APIs](apache-spark-manage-packages-outside-ui.md#manage-packages-through-rest-apis)
- [Manage your Spark pool libraries through Azure PowerShell cmdlets](apache-spark-manage-packages-outside-ui.md#manage-packages-through-azure-powershell-cmdlets)
## Related content
- [View the default libraries and supported Apache Spark versions](apache-spark-version-support.md)