Commit 5ee3236

Merge pull request #39302 from anusricorp/anusricorp-patch-1
Update the Configuration for Spark.
2 parents b37e2d2 + 5c6ef09 commit 5ee3236

1 file changed (+4, -1 lines changed)

articles/hdinsight/hdinsight-apache-kafka-spark-structured-streaming.md

Lines changed: 4 additions & 1 deletion
@@ -183,12 +183,15 @@ This example demonstrates how to use Spark Structured Streaming with Kafka on HD
 
 4. Load packages used by the Notebook by entering the following information in a Notebook cell. Run the command by using **CTRL + ENTER**.
 
+Spark streaming uses micro-batching, which means data arrives in batches and executors run on each batch. If the executor idle timeout is shorter than the time it takes to process a batch, executors are constantly added and removed. If the executor idle timeout is longer than the batch duration, the executors are never removed. Hence **we recommend that you disable dynamic allocation by setting spark.dynamicAllocation.enabled to false when running streaming applications.**
+
 ```
 %%configure -f
 {
     "conf": {
         "spark.jars.packages": "org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0",
-        "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.11"
+        "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.11",
+        "spark.dynamicAllocation.enabled": false
     }
 }
 ```
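
As a rough illustration of the change in this commit, the following Python sketch builds the same `conf` object and checks that dynamic allocation is switched off before the JSON is pasted into a `%%configure -f` cell. The helper names `build_configure_body` and `dynamic_allocation_disabled` are hypothetical and are not part of Spark or HDInsight. The configuration keys and values are copied from the diff above, and the sketch never contacts a cluster.

```python
import json

# Configuration keys and values copied from the diff; nothing here talks to a cluster.
STREAMING_CONF = {
    "spark.jars.packages": "org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0",
    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.11",
    "spark.dynamicAllocation.enabled": False,
}


def build_configure_body(conf):
    """Return the JSON text that follows `%%configure -f` (hypothetical helper)."""
    return json.dumps({"conf": conf}, indent=4)


def dynamic_allocation_disabled(body):
    """Check that a JSON body turns dynamic allocation off (hypothetical helper)."""
    conf = json.loads(body).get("conf", {})
    # Accept either the JSON boolean false or the string "false".
    return str(conf.get("spark.dynamicAllocation.enabled")).lower() == "false"


if __name__ == "__main__":
    body = build_configure_body(STREAMING_CONF)
    print("%%configure -f")
    print(body)
```

A check like `dynamic_allocation_disabled(body)` could run before a streaming notebook is shared, so that a missing `spark.dynamicAllocation.enabled` entry is caught early rather than showing up later as executor churn.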
