articles/synapse-analytics/spark/apache-spark-overview.md
6 additions & 13 deletions
@@ -11,15 +11,15 @@ ms.reviewer: euang
 ms.custom: kr2b-contr-experiment
 ---

-# Apache Spark in Azure Synapse Analytics
+# What is Apache Spark in Azure Synapse Analytics?

 Apache Spark is a parallel processing framework that supports in-memory processing to boost the performance of big data analytic applications. Apache Spark in Azure Synapse Analytics is one of Microsoft's implementations of Apache Spark in the cloud. Azure Synapse makes it easy to create and configure a serverless Apache Spark pool in Azure. Spark pools in Azure Synapse are compatible with Azure Storage and Azure Data Lake Generation 2 Storage. So you can use Spark pools to process your data stored in Azure.



 ## What is Apache Spark

-Apache Spark provides primitives for in-memory cluster computing. A Spark job can load and cache data into memory and query it repeatedly. In-memory computing is much faster than disk-based applications. Spark also integrates with multiple programming languages to let you manipulate distributed data sets like local collections. There's no need to structure everything as map and reduce operations. You can learn more from the [Apache Spark for Synapse video](https://www.youtube.com/watch?v=bTdu3PjXN3o).
+Apache Spark provides primitives for in-memory cluster computing. A Spark job can load and cache data into memory and query it repeatedly. In-memory computing is faster than disk-based applications. Spark also integrates with multiple programming languages to let you manipulate distributed data sets like local collections. There's no need to structure everything as map and reduce operations. You can learn more from the [Apache Spark for Synapse video](https://www.youtube.com/watch?v=bTdu3PjXN3o).

@@ -63,14 +63,13 @@ Apache Spark includes many language features to support preparation and processi

 - Machine Learning

-Apache Spark comes with [MLlib](https://spark.apache.org/mllib/), a machine learning library built on top of Spark that you can use from a Spark pool in Azure Synapse Analytics. Spark pools in Azure Synapse Analytics also include Anaconda, a Python distribution with a variety of packages for data science including machine learning. When combined with built-in support for notebooks, you have an environment for creating machine learning applications.
+Apache Spark comes with [MLlib](https://spark.apache.org/mllib/), a machine learning library built on top of Spark that you can use from a Spark pool in Azure Synapse Analytics. Spark pools in Azure Synapse Analytics also include Anaconda, a Python distribution with various packages for data science including machine learning. When combined with built-in support for notebooks, you have an environment for creating machine learning applications.

 - Streaming Data

-Synapse Spark supports Spark structured streaming as long as you are running supported version of Azure Synapse Spark runtime release. All jobs are supported to live for seven days. This applies to both batch and streaming jobs, and generally, customers automate restart process using Azure Functions.
+Synapse Spark supports Spark structured streaming as long as you're running supported version of Azure Synapse Spark runtime release. All jobs are supported to live for seven days. This applies to both batch and streaming jobs, and generally, customers automate restart process using Azure Functions.

-
-## Where do I start
+## Related content

 Use the following articles to learn more about Apache Spark in Azure Synapse Analytics:

@@ -79,10 +78,4 @@ Use the following articles to learn more about Apache Spark in Azure Synapse Ana
 - [Tutorial: Machine learning using Apache Spark](./apache-spark-machine-learning-mllib-notebook.md)

 > [!NOTE]
-> Some of the official Apache Spark documentation relies on using the Spark console, which is not available on Azure Synapse Spark. Use the notebook or IntelliJ experiences instead.
-
-## Next steps
-
-This overview provided a basic understanding of Apache Spark in Azure Synapse Analytics. Advance to the next article to learn how to create a Spark pool in Azure Synapse Analytics:
-
-- [Create a Spark pool in Azure Synapse](../quickstart-create-apache-spark-pool-portal.md)
+> Some of the official Apache Spark documentation relies on using the Spark console, which is not available on Azure Synapse Spark. Use the notebook or IntelliJ experiences instead.