Commit 1c73774

Freshness and format updates
1 parent ddd57f4 commit 1c73774

1 file changed, +12 -7 lines changed
articles/synapse-analytics/get-started-analyze-spark.md

Lines changed: 12 additions & 7 deletions
@@ -1,29 +1,33 @@
 ---
 title: 'Quickstart: Get started analyzing with Spark'
-description: In this tutorial, you'll learn to analyze data with Apache Spark.
+description: In this tutorial, you'll learn to analyze some sample data with Apache Spark in Azure Synapse Analytics.
 author: whhender
 ms.author: whhender
 ms.reviewer: whhender
 ms.service: azure-synapse-analytics
 ms.subservice: spark
-ms.topic: tutorial
-ms.date: 11/18/2022
+ms.topic: quickstart
+ms.date: 11/15/2024
 ---
 
-# Analyze with Apache Spark
+# Quickstart: Analyze with Apache Spark
 
 In this tutorial, you'll learn the basic steps to load and analyze data with Apache Spark for Azure Synapse.
 
+## Prerequisites
+
+Make sure you have [placed the sample data in the primary storage account](get-started-create-workspace.md#place-sample-data-into-the-primary-storage-account).
+
 ## Create a serverless Apache Spark pool
 
 1. In Synapse Studio, on the left-side pane, select **Manage** > **Apache Spark pools**.
-1. Select **New**
+1. Select **New**
 1. For **Apache Spark pool name** enter **Spark1**.
 1. For **Node size** enter **Small**.
 1. For **Number of nodes** Set the minimum to 3 and the maximum to 3
 1. Select **Review + create** > **Create**. Your Apache Spark pool will be ready in a few seconds.
 
-## Understanding serverless Apache Spark pools
+## Understand serverless Apache Spark pools
 
 A serverless Spark pool is a way of indicating how a user wants to work with Spark. When you start using a pool, a Spark session is created if needed. The pool controls how many Spark resources will be used by that session and how long the session will last before it automatically pauses. You pay for spark resources used during that session and not for the pool itself. This way a Spark pool lets you use Apache Spark without managing clusters. This is similar to how a serverless SQL pool works.
 
@@ -63,6 +67,7 @@ Data is available via the dataframe named **df**. Load it into a Spark database
 spark.sql("CREATE DATABASE IF NOT EXISTS nyctaxi")
 df.write.mode("overwrite").saveAsTable("nyctaxi.trip")
 ```
+
 ## Analyze the NYC Taxi data using Spark and notebooks
 
 1. Create a new code cell and enter the following code.
@@ -93,7 +98,7 @@ Data is available via the dataframe named **df**. Load it into a Spark database
 
 1. In the cell results, select **Chart** to see the data visualized.
 
-## Next steps
+## Next step
 
 > [!div class="nextstepaction"]
 > [Analyze data with dedicated SQL pool](get-started-analyze-sql-pool.md)
