
Commit 5aa0ab3

Merge pull request #112643 from dagiro/freshness_c40
freshness_c40
2 parents 2ff8104 + 4c86dae

File tree

1 file changed: +6 -7 lines changed


articles/hdinsight/spark/apache-spark-settings.md

Lines changed: 6 additions & 7 deletions
```diff
@@ -7,14 +7,14 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: conceptual
 ms.custom: hdinsightactive
-ms.date: 04/15/2020
+ms.date: 04/24/2020
 ---
 
 # Configure Apache Spark settings
 
-An HDInsight Spark cluster includes an installation of the [Apache Spark](https://spark.apache.org/) library. Each HDInsight cluster includes default configuration parameters for all its installed services, including Spark. A key aspect of managing an HDInsight Apache Hadoop cluster is monitoring workload, including Spark Jobs. To best run Spark jobs, consider the physical cluster configuration when determining the cluster's logical configuration.
+An HDInsight Spark cluster includes an installation of the Apache Spark library. Each HDInsight cluster includes default configuration parameters for all its installed services, including Spark. A key aspect of managing an HDInsight Apache Hadoop cluster is monitoring workload, including Spark Jobs. To best run Spark jobs, consider the physical cluster configuration when determining the cluster's logical configuration.
 
-The default HDInsight Apache Spark cluster includes the following nodes: three [Apache ZooKeeper](https://zookeeper.apache.org/) nodes, two head nodes, and one or more worker nodes:
+The default HDInsight Apache Spark cluster includes the following nodes: three Apache ZooKeeper nodes, two head nodes, and one or more worker nodes:
 
 ![Spark HDInsight Architecture](./media/apache-spark-settings/spark-hdinsight-arch.png)
 
```

```diff
@@ -97,7 +97,7 @@ Depending on your Spark workload, you may determine that a non-default Spark con
 |---|---|
 |--num-executors|Sets the number of executors.|
 |--executor-cores|Sets the number of cores for each executor. We recommend using middle-sized executors, as other processes also consume some portion of the available memory.|
-|--executor-memory|Controls the memory size (heap size) of each executor on [Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html), and you'll need to leave some memory for execution overhead.|
+|--executor-memory|Controls the memory size (heap size) of each executor on Apache Hadoop YARN, and you'll need to leave some memory for execution overhead.|
 
 Here is an example of two worker nodes with different configuration values:
 
```
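For context, the three flags in this hunk belong to `spark-submit`. A minimal sketch of how they might be passed when submitting a job to YARN; the class name, jar path, and sizing values are hypothetical placeholders, not the article's elided two-node example:

```bash
# Hypothetical sizing: leave headroom for YARN overhead, as the
# --executor-memory row above advises.
spark-submit \
  --master yarn \
  --num-executors 6 \
  --executor-cores 4 \
  --executor-memory 8G \
  --class com.example.SparkApp \
  ./spark-app.jar
```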

```diff
@@ -124,8 +124,8 @@ Spark clusters in HDInsight include a number of components by default. Each of t
 |---|---|
 |Spark Core|Spark Core, Spark SQL, Spark streaming APIs, GraphX, and Apache Spark MLlib.|
 |Anaconda|A python package manager.|
-|[Apache Livy](https://livy.incubator.apache.org/)|The Apache Spark REST API, used to submit remote jobs to an HDInsight Spark cluster.|
-|[Jupyter](https://jupyter.org/) and [Apache Zeppelin](https://zeppelin.apache.org/) notebooks|Interactive browser-based UI for interacting with your Spark cluster.|
+|Apache Livy|The Apache Spark REST API, used to submit remote jobs to an HDInsight Spark cluster.|
+|Jupyter and Apache Zeppelin notebooks|Interactive browser-based UI for interacting with your Spark cluster.|
 |ODBC driver|Connects Spark clusters in HDInsight to business intelligence (BI) tools such as Microsoft Power BI and Tableau.|
 
 For applications running in the Jupyter notebook, use the `%%configure` command to make configuration changes from within the notebook itself. These configuration changes will be applied to the Spark jobs run from your notebook instance. Make such changes at the beginning of the application, before you run your first code cell. The changed configuration is applied to the Livy session when it gets created.
```
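The `%%configure` magic named in the last context line takes a JSON body of Livy session settings. A minimal sketch of a first notebook cell, assuming the sparkmagic kernel used by HDInsight Jupyter notebooks; the numbers are illustrative, not recommendations:

```
%%configure -f
{"executorMemory": "3072M", "executorCores": 4, "numExecutors": 10}
```

The `-f` option makes sparkmagic drop and recreate the Livy session so the settings take effect even if a session already exists; it can be omitted when this is genuinely the first cell run.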
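Likewise, the Apache Livy row in this hunk describes a REST API for remote job submission. A hedged sketch of submitting a batch job with curl, where the cluster name, user, and jar path are placeholders (HDInsight exposes Livy under the cluster's HTTPS gateway at `/livy`):

```bash
# Placeholder cluster, credentials, and jar path; POST /livy/batches
# submits a Spark batch job through the Livy REST API.
curl -u admin \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{"file": "wasbs://container@account.blob.core.windows.net/spark-app.jar", "className": "com.example.SparkApp"}' \
  "https://mycluster.azurehdinsight.net/livy/batches"
```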
```diff
@@ -148,6 +148,5 @@ Monitor core configuration settings to ensure your Spark jobs run in a predictab
 
 * [Apache Hadoop components and versions available with HDInsight?](../hdinsight-component-versioning.md)
 * [Manage resources for an Apache Spark cluster on HDInsight](apache-spark-resource-manager.md)
-* [Set up clusters in HDInsight with Apache Hadoop, Apache Spark, Apache Kafka, and more](../hdinsight-hadoop-provision-linux-clusters.md)
 * [Apache Spark Configuration](https://spark.apache.org/docs/latest/configuration.html)
 * [Running Apache Spark on Apache Hadoop YARN](https://spark.apache.org/docs/latest/running-on-yarn.html)
```
