Commit 41404a4

Merge pull request #112324 from dagiro/freshness_c26
freshness_c26
2 parents 1030607 + 3b98e24 commit 41404a4

File tree

1 file changed

+7
-7
lines changed

articles/hdinsight/hdinsight-apache-kafka-spark-structured-streaming.md

Lines changed: 7 additions & 7 deletions
@@ -7,7 +7,7 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: tutorial
 ms.custom: hdinsightactive,seodec18
-ms.date: 03/11/2020
+ms.date: 04/22/2020
 
 #Customer intent: As a developer, I want to learn how to use Spark Structured Streaming with Kafka on HDInsight.
 ---
@@ -32,7 +32,7 @@ When you're done with the steps in this document, remember to delete the cluster
 
 * Familiarity with using [Jupyter Notebooks](https://jupyter.org/) with Spark on HDInsight. For more information, see the [Load data and run queries with Apache Spark on HDInsight](spark/apache-spark-load-data-run-query.md) document.
 
-* Familiarity with the [Scala](https://www.scala-lang.org/) programming language. The code used in this tutorial is written in Scala.
+* Familiarity with the Scala programming language. The code used in this tutorial is written in Scala.
 
 * Familiarity with creating Kafka topics. For more information, see the [Apache Kafka on HDInsight quickstart](kafka/apache-kafka-get-started.md) document.
 
@@ -45,7 +45,7 @@ When you're done with the steps in this document, remember to delete the cluster
 
 ## Structured Streaming with Apache Kafka
 
-Spark Structured Streaming is a stream processing engine built on the Spark SQL engine. When using Structured Streaming, you can write streaming queries the same way that you write batch queries.
+Spark Structured Streaming is a stream processing engine built on the Spark SQL engine. When using Structured Streaming, you can write streaming queries the same way you write batch queries.
 
 The following code snippets demonstrate reading from Kafka and storing to file. The first one is a batch operation, while the second one is a streaming operation:

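The snippets the paragraph above refers to fall outside this hunk. As a hedged sketch of the batch-versus-streaming contrast (the broker address, topic name, and output paths below are hypothetical, and the `spark-sql-kafka-0-10` package must be on the classpath), the two operations might look like:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("kafka-sketch").getOrCreate()

val kafkaBrokers = "broker1:9092" // hypothetical broker address
val kafkaTopic = "tripdata"       // hypothetical topic name

// Batch: read whatever is in the topic right now, write once, and finish.
spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", kafkaBrokers)
  .option("subscribe", kafkaTopic)
  .load()
  .write
  .format("parquet")
  .save("/example/batchtripdata")

// Streaming: the same query shape, but readStream/writeStream keep running
// and continuously append new Kafka records to the output directory.
spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", kafkaBrokers)
  .option("subscribe", kafkaTopic)
  .load()
  .writeStream
  .format("parquet")
  .option("checkpointLocation", "/example/checkpoint") // required by streaming sinks
  .start("/example/streamtripdata")
```

Note how the only structural difference is `read`/`write` versus `readStream`/`writeStream` (plus the checkpoint location a streaming sink requires), which is the point the tutorial is making.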
@@ -179,7 +179,7 @@ This example demonstrates how to use Spark Structured Streaming with Kafka on HD
 
 1. Select **New > Spark** to create a notebook.
 
-1. Spark streaming has microbatching, which means data comes as batches and executers run on the batches of data. If the executor has idle timeout less than the time it takes to process the batch, then the executors would be constantly added and removed. If the executors idle timeout is greater than the batch duration, the executor never gets removed. Hence **we recommend that you disable dynamic allocation by setting spark.dynamicAllocation.enabled to false when running streaming applications.**
+1. Spark Streaming uses microbatching: data arrives in batches, and executors run on each batch. If an executor's idle timeout is shorter than the time it takes to process a batch, executors are constantly added and removed. If the idle timeout is longer than the batch duration, executors are never removed. So **we recommend that you disable dynamic allocation by setting spark.dynamicAllocation.enabled to false when running streaming applications.**
 
    Load packages used by the Notebook by entering the following information in a Notebook cell. Run the command by using **CTRL + ENTER**.
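One way to apply the dynamic-allocation recommendation above from a Jupyter notebook is the sparkmagic `%%configure` magic, run before any Spark code so the session is created with the setting already in place. The cell below is a sketch under that assumption; other session settings are left untouched:

```
%%configure -f
{
    "conf": {
        "spark.dynamicAllocation.enabled": "false"
    }
}
```

The `-f` flag forces the current Spark session to be dropped and re-created with the new configuration.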
@@ -274,7 +274,7 @@ This example demonstrates how to use Spark Structured Streaming with Kafka on HD
     println("Schema declared")
     ```
 
-1. Select data and start the stream. The following command demonstrates how to retrieve data from kafka using a batch query, and then write the results out to HDFS on the Spark cluster. In this example, the `select` retrieves the message (value field) from Kafka and applies the schema to it. The data is then written to HDFS (WASB or ADL) in parquet format. Enter the command in your next Jupyter cell.
+1. Select data and start the stream. The following command demonstrates how to retrieve data from Kafka using a batch query, and then write the results out to HDFS on the Spark cluster. In this example, the `select` retrieves the message (value field) from Kafka and applies the schema to it. The data is then written to HDFS (WASB or ADL) in parquet format. Enter the command in your next Jupyter cell.
 
    ```scala
    // Read a batch from Kafka
@@ -313,7 +313,7 @@ This example demonstrates how to use Spark Structured Streaming with Kafka on HD
 
 ## Clean up resources
 
-To clean up the resources created by this tutorial, you can delete the resource group. Deleting the resource group also deletes the associated HDInsight cluster, and any other resources associated with the resource group.
+To clean up the resources created by this tutorial, you can delete the resource group. Deleting the resource group also deletes the associated HDInsight cluster and any other resources associated with the resource group.
 
 To remove the resource group using the Azure portal:
 
@@ -328,7 +328,7 @@ To remove the resource group using the Azure portal:
 
 ## Next steps
 
-In this tutorial, you learned how to use [Apache Spark Structured Streaming](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html) to write and read data from [Apache Kafka](./kafka/apache-kafka-introduction.md) on HDInsight. Use the following link to learn how to use [Apache Storm](./storm/apache-storm-overview.md) with Kafka.
+In this tutorial, you learned how to use Apache Spark Structured Streaming to write and read data from Apache Kafka on HDInsight. Use the following link to learn how to use Apache Storm with Kafka.
 
 > [!div class="nextstepaction"]
 > [Use Apache Storm with Apache Kafka](hdinsight-apache-storm-with-kafka.md)
