
Commit 6fba4d9

Merge pull request #290822 from whhender/november-2024-synapse-freshness
November 2024 synapse freshness
2 parents e0e2706 + daf0671 commit 6fba4d9

13 files changed: +121 −119 lines changed
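The +121/−119 totals above are the per-file additions and deletions summed across the commit. As an illustration, a minimal sketch of how such counts are tallied from unified diff text (a simplified version of what `git diff --stat` reports; content-removal lines that themselves begin with `---` would be miscounted by this sketch):

```python
def diff_stats(diff_text: str) -> tuple[int, int]:
    """Count (added, removed) lines in a unified diff.

    File headers (+++/---) and hunk headers (@@) are not content lines.
    """
    added = removed = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---", "@@")):
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed
```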
Image files changed (not rendered): −1.81 KB, −6.13 KB, 4.14 KB

articles/synapse-analytics/cicd/source-control.md

Lines changed: 49 additions & 48 deletions
Large diffs are not rendered by default.

articles/synapse-analytics/get-started-analyze-spark.md

Lines changed: 12 additions & 7 deletions
@@ -1,29 +1,33 @@
 ---
 title: 'Quickstart: Get started analyzing with Spark'
-description: In this tutorial, you'll learn to analyze data with Apache Spark.
+description: In this tutorial, you'll learn to analyze some sample data with Apache Spark in Azure Synapse Analytics.
 author: whhender
 ms.author: whhender
 ms.reviewer: whhender
 ms.service: azure-synapse-analytics
 ms.subservice: spark
-ms.topic: tutorial
-ms.date: 11/18/2022
+ms.topic: quickstart
+ms.date: 11/15/2024
 ---
 
-# Analyze with Apache Spark
+# Quickstart: Analyze with Apache Spark
 
 In this tutorial, you'll learn the basic steps to load and analyze data with Apache Spark for Azure Synapse.
 
+## Prerequisites
+
+Make sure you have [placed the sample data in the primary storage account](get-started-create-workspace.md#place-sample-data-into-the-primary-storage-account).
+
 ## Create a serverless Apache Spark pool
 
 1. In Synapse Studio, on the left-side pane, select **Manage** > **Apache Spark pools**.
-1. Select **New**
+1. Select **New**
 1. For **Apache Spark pool name** enter **Spark1**.
 1. For **Node size** enter **Small**.
 1. For **Number of nodes** Set the minimum to 3 and the maximum to 3
 1. Select **Review + create** > **Create**. Your Apache Spark pool will be ready in a few seconds.
 
-## Understanding serverless Apache Spark pools
+## Understand serverless Apache Spark pools
 
 A serverless Spark pool is a way of indicating how a user wants to work with Spark. When you start using a pool, a Spark session is created if needed. The pool controls how many Spark resources will be used by that session and how long the session will last before it automatically pauses. You pay for spark resources used during that session and not for the pool itself. This way a Spark pool lets you use Apache Spark without managing clusters. This is similar to how a serverless SQL pool works.

@@ -63,6 +67,7 @@ Data is available via the dataframe named **df**. Load it into a Spark database
 spark.sql("CREATE DATABASE IF NOT EXISTS nyctaxi")
 df.write.mode("overwrite").saveAsTable("nyctaxi.trip")
 ```
+
 ## Analyze the NYC Taxi data using Spark and notebooks
 
 1. Create a new code cell and enter the following code.

@@ -93,7 +98,7 @@ Data is available via the dataframe named **df**. Load it into a Spark database
 
 1. In the cell results, select **Chart** to see the data visualized.
 
-## Next steps
+## Next step
 
 > [!div class="nextstepaction"]
 > [Analyze data with dedicated SQL pool](get-started-analyze-sql-pool.md)
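The `CREATE DATABASE IF NOT EXISTS` plus `mode("overwrite")` pair in the hunk above is what makes the tutorial's load step safe to rerun. The same idempotent-write idea, sketched in miniature with stdlib `sqlite3` rather than Spark (the table name and columns are illustrative, not the tutorial's actual NYC Taxi schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")

def save_trips(rows):
    # Overwrite semantics: drop-and-recreate, analogous to
    # df.write.mode("overwrite").saveAsTable("nyctaxi.trip")
    con.execute("DROP TABLE IF EXISTS trip")
    con.execute("CREATE TABLE trip (passenger_count INTEGER, trip_distance REAL)")
    con.executemany("INSERT INTO trip VALUES (?, ?)", rows)

save_trips([(1, 2.5), (2, 0.9)])
save_trips([(1, 2.5), (2, 0.9)])  # rerunning does not duplicate rows
```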

articles/synapse-analytics/get-started-pipelines.md

Lines changed: 11 additions & 11 deletions
@@ -7,43 +7,43 @@ ms.reviewer: whhender
 ms.service: azure-synapse-analytics
 ms.subservice: pipeline
 ms.topic: tutorial
-ms.date: 12/31/2020
+ms.date: 11/20/2024
 ---
 
-# Integrate with pipelines
+# Tutorial: Integrate with pipelines
 
 In this tutorial, you'll learn how to integrate pipelines and activities using Synapse Studio.
 
 ## Create a pipeline and add a notebook activity
 
 1. In Synapse Studio, go to the **Integrate** hub.
-1. Select **+** > **Pipeline** to create a new pipeline. Click on the new pipeline object to open the Pipeline designer.
+1. Select **+** > **Pipeline** to create a new pipeline. Select the new pipeline object to open the Pipeline designer.
 1. Under **Activities**, expand the **Synapse** folder, and drag a **Notebook** object into the designer.
 1. Select the **Settings** tab of the Notebook activity properties. Use the drop-down list to select a notebook from your current Synapse workspace.
 
 ## Schedule the pipeline to run every hour
 
 1. In the pipeline, select **Add trigger** > **New/edit**.
 1. In **Choose trigger**, select **New**, and set the **Recurrence** to "every 1 hour".
-1. Select **OK**.
-1. Select **Publish All**.
+1. Select **OK**.
+1. Select **Publish All**.
 
 ## Forcing a pipeline to run immediately
 
-Once the pipeline is published, you may want to run it immediately without waiting for an hour to pass.
+Once the pipeline is published, you might want to run it immediately without waiting for an hour to pass.
 
 1. Open the pipeline.
-1. Click **Add trigger** > **Trigger now**.
-1. Select **OK**.
+1. Select **Add trigger** > **Trigger now**.
+1. Select **OK**.
 
 ## Monitor pipeline execution
 
 1. Go to the **Monitor** hub.
 1. Select **Pipeline runs** to monitor pipeline execution progress.
-1. In this view you can switch between tabular **List** display a graphical **Gantt** chart.
-1. Click on a pipeline name to see the status of activities in that pipeline.
+1. In this view you can switch between tabular **List** display a graphical **Gantt** chart.
+1. Select a pipeline name to see the status of activities in that pipeline.
 
-## Next steps
+## Next step
 
 > [!div class="nextstepaction"]
 > [Visualize data with Power BI](get-started-visualize-power-bi.md)
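Behind the **New/edit** trigger dialog in the tutorial above, an hourly schedule is stored as trigger JSON in the workspace. A rough sketch of its shape, assuming the standard Synapse/Data Factory `ScheduleTrigger` format (the trigger and pipeline names and the start time here are illustrative, not from the commit):

```json
{
  "name": "HourlyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Hour",
        "interval": 1,
        "startTime": "2024-11-20T00:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "Pipeline 1",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```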

articles/synapse-analytics/machine-learning/tutorial-text-analytics-use-mmlspark.md

Lines changed: 23 additions & 14 deletions
@@ -4,14 +4,15 @@ description: Learn how to use text analytics in Azure Synapse Analytics.
 ms.service: azure-synapse-analytics
 ms.subservice: machine-learning
 ms.topic: tutorial
-ms.date: 11/02/2021
+ms.date: 11/19/2024
 author: ruixinxu
 ms.author: ruxu
+# customer intent: As a Synapse Analytics user, I want to be able to analyze my text using Azure AI services.
 ---
 
 # Tutorial: Text Analytics with Azure AI services
 
-[Text Analytics](/azure/ai-services/language-service/) is an [Azure AI services](/azure/ai-services/) that enables you to perform text mining and text analysis with Natural Language Processing (NLP) features. In this tutorial, you'll learn how to use [Text Analytics](/azure/ai-services/language-service/) to analyze unstructured text on Azure Synapse Analytics.
+In this tutorial, you learn how to use [Text Analytics](/azure/ai-services/language-service/) to analyze unstructured text on Azure Synapse Analytics. [Text Analytics](/azure/ai-services/language-service/) is an [Azure AI services](/azure/ai-services/) that enables you to perform text mining and text analysis with Natural Language Processing (NLP) features.
 
 This tutorial demonstrates using text analytics with [SynapseML](https://github.com/microsoft/SynapseML) to:
 

@@ -29,34 +30,35 @@ If you don't have an Azure subscription, [create a free account before you begin
 
 - [Azure Synapse Analytics workspace](../get-started-create-workspace.md) with an Azure Data Lake Storage Gen2 storage account configured as the default storage. You need to be the *Storage Blob Data Contributor* of the Data Lake Storage Gen2 file system that you work with.
 - Spark pool in your Azure Synapse Analytics workspace. For details, see [Create a Spark pool in Azure Synapse](../quickstart-create-sql-pool-studio.md).
-- Pre-configuration steps described in the tutorial [Configure Azure AI services in Azure Synapse](tutorial-configure-cognitive-services-synapse.md).
-
+- Preconfiguration steps described in the tutorial [Configure Azure AI services in Azure Synapse](tutorial-configure-cognitive-services-synapse.md).
 
 ## Get started
-Open Synapse Studio and create a new notebook. To get started, import [SynapseML](https://github.com/microsoft/SynapseML).
+
+Open Synapse Studio and create a new notebook. To get started, import [SynapseML](https://github.com/microsoft/SynapseML).
 
 ```python
 import synapse.ml
-from synapse.ml.cognitive import *
+from synapse.ml.services import *
 from pyspark.sql.functions import col
 ```
 
 ## Configure text analytics
 
-Use the linked text analytics you configured in the [pre-configuration steps](tutorial-configure-cognitive-services-synapse.md) .
+Use the linked text analytics you configured in the [preconfiguration steps](tutorial-configure-cognitive-services-synapse.md).
 
 ```python
-ai_service_name = "<Your linked service for text analytics>"
+linked_service_name = "<Your linked service for text analytics>"
 ```
 
 ## Text Sentiment
-The Text Sentiment Analysis provides a way for detecting the sentiment labels (such as "negative", "neutral" and "positive") and confidence scores at the sentence and document-level. See the [Supported languages in Text Analytics API](/azure/ai-services/language-service/language-detection/overview?tabs=sentiment-analysis) for the list of enabled languages.
+
+The Text Sentiment Analysis provides a way for detecting the sentiment labels (such as "negative", "neutral", and "positive") and confidence scores at the sentence and document-level. See the [Supported languages in Text Analytics API](/azure/ai-services/language-service/language-detection/overview?tabs=sentiment-analysis) for the list of enabled languages.
 
 ```python
 
 # Create a dataframe that's tied to it's column names
 df = spark.createDataFrame([
-("I am so happy today, its sunny!", "en-US"),
+("I am so happy today, it's sunny!", "en-US"),
 ("I am frustrated by this rush hour traffic", "en-US"),
 ("The Azure AI services on spark aint bad", "en-US"),
 ], ["text", "language"])

@@ -77,13 +79,14 @@ display(results
 .select("text", "sentiment"))
 
 ```
+
 ### Expected results
 
 |text|sentiment|
 |---|---|
-|I am so happy today, its sunny!|positive|
-|I am frustrated by this rush hour traffic|negative|
-|The Azure AI services on spark aint bad|positive|
+|I'm so happy today, it's sunny!|positive|
+|I'm frustrated by this rush hour traffic|negative|
+|The Azure AI services on spark aint bad|neutral|
 
 ---
 

@@ -186,12 +189,15 @@ ner = (NER()
 
 display(ner.transform(df).select("text", col("replies").getItem("document").getItem("entities").alias("entities")))
 ```
+
 ### Expected results
+
 ![Expected results for named entity recognition v3.1](./media/tutorial-text-analytics-use-mmlspark/expected-output-ner-v-31.png)
 
 ---
 
 ## Personally Identifiable Information (PII) V3.1
+
 The PII feature is part of NER and it can identify and redact sensitive entities in text that are associated with an individual person such as: phone number, email address, mailing address, passport number. See the [Supported languages in Text Analytics API](/azure/ai-services/language-service/language-detection/overview?tabs=pii) for the list of enabled languages.
 
 ```python

@@ -209,17 +215,20 @@ pii = (PII()
 
 display(pii.transform(df).select("text", col("replies").getItem("document").getItem("entities").alias("entities")))
 ```
+
 ### Expected results
+
 ![Expected results for personal identifiable information v3.1](./media/tutorial-text-analytics-use-mmlspark/expected-output-pii-v-31.png)
 
 ---
 
 ## Clean up resources
+
 To ensure the Spark instance is shut down, end any connected sessions(notebooks). The pool shuts down when the **idle time** specified in the Apache Spark pool is reached. You can also select **stop session** from the status bar at the upper right of the notebook.
 
 ![Screenshot showing the Stop session button on the status bar.](./media/tutorial-build-applications-use-mmlspark/stop-session.png)
 
-## Next steps
+## Related content
 
 * [Check out Synapse sample notebooks](https://github.com/Azure-Samples/Synapse/tree/main/MachineLearning)
 * [SynapseML GitHub Repo](https://github.com/microsoft/SynapseML)
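The diff above renames the tutorial's import from `synapse.ml.cognitive` to `synapse.ml.services`, reflecting a later SynapseML namespace. A notebook that must run against both generations can fall back at import time; here is a generic sketch of that pattern (the SynapseML module names come from the diff, while the helper name and the stdlib demonstration are ours):

```python
from importlib import import_module

def import_first(*names):
    """Return the first module in `names` that imports successfully."""
    for name in names:
        try:
            return import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {names} could be imported")

# In a Synapse notebook this would be:
#   services = import_first("synapse.ml.services", "synapse.ml.cognitive")
# For demonstration outside Synapse, fall back from a missing module to stdlib json:
json_mod = import_first("no_such_module_abc", "json")
```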

articles/synapse-analytics/security/synapse-workspace-managed-private-endpoints.md

Lines changed: 7 additions & 13 deletions
@@ -3,18 +3,14 @@ title: Managed private endpoints
 description: An article that explains Managed private endpoints in Azure Synapse Analytics
 author: ashinMSFT
 ms.service: azure-synapse-analytics
-ms.topic: overview
+ms.topic: concept-article
 ms.subservice: security
-ms.date: 01/12/2020
+ms.date: 11/15/2024
 ms.author: seshin
 ms.reviewer: whhender
 ---
 
-# Synapse Managed private endpoints
-
-This article will explain Managed private endpoints in Azure Synapse Analytics.
-
-## Managed private endpoints
+# Azure Synapse Analytics managed private endpoints
 
 Managed private endpoints are private endpoints created in a Managed Virtual Network associated with your Azure Synapse workspace. Managed private endpoints establish a private link to Azure resources. Azure Synapse manages these private endpoints on your behalf. You can create Managed private endpoints from your Azure Synapse workspace to access Azure services (such as Azure Storage or Azure Cosmos DB) and Azure hosted customer/partner services.
 

@@ -28,14 +24,13 @@ Learn more about [private links and private endpoints](../../private-link/index.
 >[!NOTE]
 >When creating an Azure Synapse workspace, you can choose to associate a Managed Virtual Network to it. If you choose to have a Managed Virtual Network associated to your workspace, you can also choose to limit outbound traffic from your workspace to only approved targets. You must create Managed private endpoints to these targets.
 
-
 A private endpoint connection is created in a "Pending" state when you create a Managed private endpoint in Azure Synapse. An approval workflow is started. The private link resource owner is responsible to approve or reject the connection. If the owner approves the connection, the private link is established. But, if the owner doesn't approve the connection, then the private link won't be established. In either case, the Managed private endpoint will be updated with the status of the connection. Only a Managed private endpoint in an approved state can be used to send traffic to the private link resource that is linked to the Managed private endpoint.
 
 ## Managed private endpoints for dedicated SQL pool and serverless SQL pool
 
-Dedicated SQL pool and serverless SQL pool are analytic capabilities in your Azure Synapse workspace. These capabilities use multi-tenant infrastructure that isn't deployed into the [Managed workspace Virtual Network](./synapse-workspace-managed-vnet.md).
+Dedicated SQL pool and serverless SQL pool are analytic capabilities in your Azure Synapse workspace. These capabilities use multitenant infrastructure that isn't deployed into the [Managed workspace Virtual Network](./synapse-workspace-managed-vnet.md).
 
-When a workspace is created, Azure Synapse creates two Managed private endpoints in the workspace, one for dedicated SQL pool and one for serverless SQL pool.
+When a workspace is created, Azure Synapse creates two Managed private endpoints in the workspace, one for dedicated SQL pool and one for serverless SQL pool.
 
 These two Managed private endpoints are listed in Synapse Studio. Select **Manage** in the left navigation, then select **Managed private endpoints** to see them in the Studio.
 

@@ -45,7 +40,6 @@ The Managed private endpoint that targets SQL pool is called *synapse-ws-sql--\<
 
 These two Managed private endpoints are automatically created for you when you create your Azure Synapse workspace. You aren't charged for these two Managed private endpoints.
 
-
 ## Supported data sources
 
 Azure Synapse Spark supports over 25 data sources to connect to using managed private endpoints. Users need to specify the resource identifier, which can be found in the **Properties** settings page of their data source in the Azure portal.

@@ -81,6 +75,6 @@ Azure Synapse Spark supports over 25 data sources to connect to using managed pr
 | Azure App Services | /subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Web/sites/{app-service-name}
 
 
-## Next steps
+## Get started
 
-To learn more, advance to the [Create Managed private endpoints to your data sources](./how-to-create-managed-private-endpoints.md) article.
+To learn more, advance to the [create managed private endpoints to your data sources](./how-to-create-managed-private-endpoints.md) article.
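The supported-data-sources table in this file (only the Azure App Services row appears in the final hunk) asks users to supply ARM resource identifiers. Composing one from its parts is plain string templating; a small sketch using the App Services format shown in the diff (the subscription, resource group, and site values below are illustrative):

```python
def app_service_resource_id(subscription_id: str, resource_group: str, site_name: str) -> str:
    """Compose an ARM resource ID in the Azure App Services format from the table above."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Web/sites/{site_name}"
    )

example = app_service_resource_id("0000-sub", "my-rg", "my-app")
```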

articles/synapse-analytics/spark/apache-spark-azure-portal-add-libraries.md

Lines changed: 3 additions & 4 deletions
@@ -3,9 +3,9 @@ title: Manage Apache Spark packages
 description: Learn how to add and manage libraries used by Apache Spark in Azure Synapse Analytics.
 author: shuaijunye
 ms.service: azure-synapse-analytics
-ms.reviewer: whhender, whhender, eskot
+ms.reviewer: whhender, eskot
 ms.topic: how-to
-ms.date: 04/15/2023
+ms.date: 11/15/2024
 ms.author: shuaijunye
 ms.subservice: spark
 ms.custom: kr2b-contr-experiment, devx-track-azurepowershell

@@ -120,7 +120,6 @@ To learn more about how to manage session-scoped packages, see the following art
 
 - [R session packages](./apache-spark-manage-session-packages.md#session-scoped-r-packages-preview): Within your session, you can install packages across all nodes within your Spark pool by using `install.packages` or `devtools`.
 
-
 ## Automate the library management process through Azure PowerShell cmdlets and REST APIs
 
 If your team wants to manage libraries without visiting the package management UIs, you have the option to manage the workspace packages and pool-level package updates through Azure PowerShell cmdlets or REST APIs for Azure Synapse Analytics.

@@ -130,7 +129,7 @@ For more information, see the following articles:
 - [Manage your Spark pool libraries through REST APIs](apache-spark-manage-packages-outside-ui.md#manage-packages-through-rest-apis)
 - [Manage your Spark pool libraries through Azure PowerShell cmdlets](apache-spark-manage-packages-outside-ui.md#manage-packages-through-azure-powershell-cmdlets)
 
-## Next steps
+## Related content
 
 - [View the default libraries and supported Apache Spark versions](apache-spark-version-support.md)
 - [Troubleshoot library installation errors](apache-spark-troubleshoot-library-errors.md)
