Merge pull request #39302 from anusricorp/anusricorp-patch-1

ktoliver · web-flow · commit 5ee32362a411 · 2019-09-23T08:04:31.000-07:00
Update the Configuration for Spark.
diff --git a/articles/hdinsight/hdinsight-apache-kafka-spark-structured-streaming.md b/articles/hdinsight/hdinsight-apache-kafka-spark-structured-streaming.md
@@ -183,12 +183,15 @@ This example demonstrates how to use Spark Structured Streaming with Kafka on HD
 
 4. Load packages used by the Notebook by entering the following information in a Notebook cell. Run the command by using **CTRL + ENTER**.
 
+Spark streaming uses micro-batching: data arrives in batches, and executors run on each batch. If an executor's idle timeout is shorter than the time it takes to process a batch, executors are constantly added and removed. If the executor's idle timeout is longer than the batch duration, the executor is never removed. Hence **we recommend that you disable dynamic allocation by setting `spark.dynamicAllocation.enabled` to false when running streaming applications.**
+
     ```
     %%configure -f
     {
         "conf": {
             "spark.jars.packages": "org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0",
-            "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.11"
+            "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.11",
+            "spark.dynamicAllocation.enabled": false
         }
     }
     ```
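
The reasoning in the added paragraph can be sketched as a small Python model. This is only an illustration of the rule of thumb stated in the diff, not Spark's actual executor-allocation logic. The function name `allocation_behavior` and its return labels are hypothetical, and the sketch simplifies by treating "time to process the batch" and "batch duration" as a single number. The 60-second value used in the examples is Spark's documented default for `spark.dynamicAllocation.executorIdleTimeout`.

```python
def allocation_behavior(idle_timeout_s: float, batch_time_s: float) -> str:
    """Classify dynamic-allocation behavior per the rule of thumb in the diff.

    Hypothetical helper for illustration only; it does not reproduce
    Spark's scheduler. The equal case is not covered by the source text,
    so it is labeled "borderline" here as an assumption.
    """
    if idle_timeout_s < batch_time_s:
        # Timeout shorter than the batch time: executors are repeatedly
        # added and removed.
        return "churn"
    if idle_timeout_s > batch_time_s:
        # Timeout longer than the batch duration: executors are never
        # removed, so dynamic allocation brings no benefit.
        return "never_released"
    return "borderline"


if __name__ == "__main__":
    print(allocation_behavior(60, 300))  # churn
    print(allocation_behavior(60, 10))   # never_released
```

Under this model neither case benefits from dynamic allocation, which is consistent with the recommendation to set `spark.dynamicAllocation.enabled` to false for streaming workloads.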
