Commit 0d512a1

committed
Style compliance and acrolinx
1 parent c9d111c commit 0d512a1

1 file changed

+6
-13
lines changed

articles/synapse-analytics/spark/apache-spark-overview.md

Lines changed: 6 additions & 13 deletions
@@ -11,15 +11,15 @@ ms.reviewer: euang
1111
ms.custom: kr2b-contr-experiment
1212
---
1313

14-
# Apache Spark in Azure Synapse Analytics
14+
# What is Apache Spark in Azure Synapse Analytics?
1515

1616
Apache Spark is a parallel processing framework that supports in-memory processing to boost the performance of big data analytic applications. Apache Spark in Azure Synapse Analytics is one of Microsoft's implementations of Apache Spark in the cloud. Azure Synapse makes it easy to create and configure a serverless Apache Spark pool in Azure. Spark pools in Azure Synapse are compatible with Azure Storage and Azure Data Lake Generation 2 Storage. So you can use Spark pools to process your data stored in Azure.
1717

1818
![Diagram shows Spark SQL, Spark MLlib, and GraphX linked to the Spark core engine, above a YARN layer over storage services.](./media/apache-spark-overview/spark-overview.png)
1919

2020
## What is Apache Spark
2121

22-
Apache Spark provides primitives for in-memory cluster computing. A Spark job can load and cache data into memory and query it repeatedly. In-memory computing is much faster than disk-based applications. Spark also integrates with multiple programming languages to let you manipulate distributed data sets like local collections. There's no need to structure everything as map and reduce operations. You can learn more from the [Apache Spark for Synapse video](https://www.youtube.com/watch?v=bTdu3PjXN3o).
22+
Apache Spark provides primitives for in-memory cluster computing. A Spark job can load and cache data into memory and query it repeatedly. In-memory computing is faster than disk-based applications. Spark also integrates with multiple programming languages to let you manipulate distributed data sets like local collections. There's no need to structure everything as map and reduce operations. You can learn more from the [Apache Spark for Synapse video](https://www.youtube.com/watch?v=bTdu3PjXN3o).
2323

2424
![Diagram shows Traditional MapReduce, with disk-based apps and Spark, with cache-based operations.](./media/apache-spark-overview/map-reduce-vs-spark.png)
2525

@@ -63,14 +63,13 @@ Apache Spark includes many language features to support preparation and processi
6363

6464
- Machine Learning
6565

66-
Apache Spark comes with [MLlib](https://spark.apache.org/mllib/), a machine learning library built on top of Spark that you can use from a Spark pool in Azure Synapse Analytics. Spark pools in Azure Synapse Analytics also include Anaconda, a Python distribution with a variety of packages for data science including machine learning. When combined with built-in support for notebooks, you have an environment for creating machine learning applications.
66+
Apache Spark comes with [MLlib](https://spark.apache.org/mllib/), a machine learning library built on top of Spark that you can use from a Spark pool in Azure Synapse Analytics. Spark pools in Azure Synapse Analytics also include Anaconda, a Python distribution with various packages for data science including machine learning. When combined with built-in support for notebooks, you have an environment for creating machine learning applications.
6767

6868
- Streaming Data
6969

70-
Synapse Spark supports Spark structured streaming as long as you are running supported version of Azure Synapse Spark runtime release. All jobs are supported to live for seven days. This applies to both batch and streaming jobs, and generally, customers automate restart process using Azure Functions.
70+
Synapse Spark supports Spark structured streaming as long as you're running a supported version of the Azure Synapse Spark runtime. All jobs are supported to run for seven days. This applies to both batch and streaming jobs, and customers generally automate the restart process using Azure Functions.
7171

72-
73-
## Where do I start
72+
## Related content
7473

7574
Use the following articles to learn more about Apache Spark in Azure Synapse Analytics:
7675

@@ -79,10 +78,4 @@ Use the following articles to learn more about Apache Spark in Azure Synapse Ana
7978
- [Tutorial: Machine learning using Apache Spark](./apache-spark-machine-learning-mllib-notebook.md)
8079

8180
> [!NOTE]
82-
> Some of the official Apache Spark documentation relies on using the Spark console, which is not available on Azure Synapse Spark. Use the notebook or IntelliJ experiences instead.
83-
84-
## Next steps
85-
86-
This overview provided a basic understanding of Apache Spark in Azure Synapse Analytics. Advance to the next article to learn how to create a Spark pool in Azure Synapse Analytics:
87-
88-
- [Create a Spark pool in Azure Synapse](../quickstart-create-apache-spark-pool-portal.md)
81+
> Some of the official Apache Spark documentation relies on using the Spark console, which is not available on Azure Synapse Spark. Use the notebook or IntelliJ experiences instead.
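
The Streaming Data paragraph in this diff says that jobs are supported to live for seven days and that customers generally automate the restart with Azure Functions. The following is a minimal sketch of the timing check such automation might perform. It is not part of the article or of any Azure SDK: the function names and the six-hour safety margin are hypothetical, and the code that actually stops and resubmits the job is left out.

```python
from datetime import datetime, timedelta, timezone

# Seven-day supported job lifetime, taken from the article text.
MAX_JOB_LIFETIME = timedelta(days=7)

# Hypothetical safety margin: restart this long before the limit is reached.
RESTART_MARGIN = timedelta(hours=6)


def next_restart_time(started_at: datetime,
                      margin: timedelta = RESTART_MARGIN) -> datetime:
    """Return the latest moment at which the job should be restarted."""
    return started_at + MAX_JOB_LIFETIME - margin


def restart_due(started_at: datetime, now: datetime,
                margin: timedelta = RESTART_MARGIN) -> bool:
    """Return True once `now` has reached the planned restart time."""
    return now >= next_restart_time(started_at, margin)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # Three days in: well inside the supported lifetime.
    print(restart_due(start, start + timedelta(days=3)))
    # Six days and twenty hours in: inside the six-hour margin.
    print(restart_due(start, start + timedelta(days=6, hours=20)))
```

A scheduled function could call `restart_due` with the job's recorded start time on each run and trigger a restart only when it returns `True`. Restarting before the limit, rather than at it, leaves room for the new job to start before the old one ends.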
