articles/hdinsight/hdinsight-apache-kafka-spark-structured-streaming.md
7 additions & 7 deletions
@@ -7,7 +7,7 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: tutorial
 ms.custom: hdinsightactive,seodec18
-ms.date: 03/11/2020
+ms.date: 04/22/2020
 
 #Customer intent: As a developer, I want to learn how to use Spark Structured Streaming with Kafka on HDInsight.
 ---
@@ -32,7 +32,7 @@ When you're done with the steps in this document, remember to delete the cluster
 
 * Familiarity with using [Jupyter Notebooks](https://jupyter.org/) with Spark on HDInsight. For more information, see the [Load data and run queries with Apache Spark on HDInsight](spark/apache-spark-load-data-run-query.md) document.
 
-* Familiarity with the [Scala](https://www.scala-lang.org/) programming language. The code used in this tutorial is written in Scala.
+* Familiarity with the Scala programming language. The code used in this tutorial is written in Scala.
 
 * Familiarity with creating Kafka topics. For more information, see the [Apache Kafka on HDInsight quickstart](kafka/apache-kafka-get-started.md) document.
 
@@ -45,7 +45,7 @@ When you're done with the steps in this document, remember to delete the cluster
 
 ## Structured Streaming with Apache Kafka
 
-Spark Structured Streaming is a stream processing engine built on the Spark SQL engine. When using Structured Streaming, you can write streaming queries the same way that you write batch queries.
+Spark Structured Streaming is a stream processing engine built on the Spark SQL engine. When using Structured Streaming, you can write streaming queries the same way you write batch queries.
 
 The following code snippets demonstrate reading from Kafka and storing to file. The first one is a batch operation, while the second one is a streaming operation:
 
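The snippets the context line above refers to are elided from this hunk. A minimal sketch of the batch/streaming symmetry might look like the following; the broker string, topic name, and output paths are placeholders, and it assumes the `spark-sql-kafka-0-10` package is on the classpath (on an HDInsight Jupyter notebook, `spark` already exists and the `SparkSession` lines can be skipped):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("kafka-example").getOrCreate()

// Batch: read whatever is currently in the topic and write it out once.
val batchDF = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "KAFKABROKERS")  // placeholder
  .option("subscribe", "tripdata")                    // placeholder topic
  .load()
batchDF.write.format("parquet").save("/example/batchtripdata")

// Streaming: the same query shape, but readStream/writeStream keep running
// and pick up new records as they arrive.
val streamDF = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "KAFKABROKERS")
  .option("subscribe", "tripdata")
  .load()
streamDF.writeStream
  .format("parquet")
  .option("path", "/example/streamingtripdata")
  .option("checkpointLocation", "/streamcheckpoint")  // required for file sinks
  .start()
```

The only differences between the two are `read` vs `readStream`, `write` vs `writeStream`, and the checkpoint location, which is the point the changed sentence makes.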
@@ -179,7 +179,7 @@ This example demonstrates how to use Spark Structured Streaming with Kafka on HD
 
 1. Select **New > Spark** to create a notebook.
 
-1. Spark streaming has microbatching, which means data comes as batches and executers run on the batches of data. If the executor has idle timeout less than the time it takes to process the batch, then the executors would be constantly added and removed. If the executors idle timeout is greater than the batch duration, the executor never gets removed. Hence **we recommend that you disable dynamic allocation by setting spark.dynamicAllocation.enabled to false when running streaming applications.**
+1. Spark streaming has microbatching, which means data comes as batches and executors run on the batches of data. If the executor's idle timeout is less than the time it takes to process a batch, executors are constantly added and removed. If the executor's idle timeout is greater than the batch duration, the executor never gets removed. So **we recommend that you disable dynamic allocation by setting spark.dynamicAllocation.enabled to false when running streaming applications.**
 
    Load packages used by the Notebook by entering the following information in a Notebook cell. Run the command by using **CTRL + ENTER**.
 
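One way to apply the setting this hunk recommends, from a Jupyter notebook on an HDInsight Spark cluster, is the sparkmagic `%%configure` cell magic run before any Spark code (a sketch; the `-f` flag forces the Livy session to restart with the new configuration, so it should be the first cell executed):

```
%%configure -f
{ "conf": { "spark.dynamicAllocation.enabled": "false" } }
```

This is a session-configuration fragment rather than Scala; the same setting can also be applied cluster-wide through Ambari's Spark configuration.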
@@ -274,7 +274,7 @@ This example demonstrates how to use Spark Structured Streaming with Kafka on HD
    println("Schema declared")
    ```
 
-1. Select data and start the stream. The following command demonstrates how to retrieve data from kafka using a batch query, and then write the results out to HDFS on the Spark cluster. In this example, the `select` retrieves the message (value field) from Kafka and applies the schema to it. The data is then written to HDFS (WASB or ADL) in parquet format. Enter the command in your next Jupyter cell.
+1. Select data and start the stream. The following command demonstrates how to retrieve data from Kafka using a batch query, and then write the results out to HDFS on the Spark cluster. In this example, the `select` retrieves the message (value field) from Kafka and applies the schema to it. The data is then written to HDFS (WASB or ADL) in Parquet format. Enter the command in your next Jupyter cell.
 
    ```scala
    // Read a batch from Kafka
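The body of that batch-query cell is elided from the hunk. A minimal sketch of the pattern the changed sentence describes, with a hypothetical schema and placeholder broker, topic, and path names (not the article's exact code), could look like:

```scala
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

// Illustrative schema; the article declares its own in an earlier cell.
val schema = new StructType()
  .add("vendorid", StringType)
  .add("passenger_count", IntegerType)

// Batch read: pulls the records currently in the topic.
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "KAFKABROKERS")  // placeholder
  .option("subscribe", "tripdata")                    // placeholder topic
  .load()

// Kafka delivers `value` as bytes: cast to string, apply the schema,
// then write the result to HDFS (WASB/ADL) as Parquet.
df.select(from_json(col("value").cast("string"), schema).alias("trip"))
  .write
  .format("parquet")
  .save("/example/batchtripdata")
```

Because this is `spark.read` rather than `spark.readStream`, the query runs once and completes instead of staying resident.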
@@ -313,7 +313,7 @@ This example demonstrates how to use Spark Structured Streaming with Kafka on HD
 
 ## Clean up resources
 
-To clean up the resources created by this tutorial, you can delete the resource group. Deleting the resource group also deletes the associated HDInsight cluster, and any other resources associated with the resource group.
+To clean up the resources created by this tutorial, you can delete the resource group. Deleting the resource group also deletes the associated HDInsight cluster and any other resources associated with the resource group.
 
 To remove the resource group using the Azure portal:
 
@@ -328,7 +328,7 @@ To remove the resource group using the Azure portal:
 
 ## Next steps
 
-In this tutorial, you learned how to use [Apache Spark Structured Streaming](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html) to write and read data from [Apache Kafka](./kafka/apache-kafka-introduction.md) on HDInsight. Use the following link to learn how to use [Apache Storm](./storm/apache-storm-overview.md) with Kafka.
+In this tutorial, you learned how to use Apache Spark Structured Streaming to write and read data from Apache Kafka on HDInsight. Use the following link to learn how to use Apache Storm with Kafka.
 
 > [!div class="nextstepaction"]
 > [Use Apache Storm with Apache Kafka](hdinsight-apache-storm-with-kafka.md)