articles/hdinsight-aks/flink/assign-kafka-topic-event-message-to-azure-data-lake-storage-gen2.md (8 additions, 8 deletions)
@@ -1,9 +1,9 @@
---
title: Write event messages into Azure Data Lake Storage Gen2 with Apache Flink® DataStream API
-description: Learn how to write event messages into Azure Data Lake Storage Gen2 with Apache Flink® DataStream API
+description: Learn how to write event messages into Azure Data Lake Storage Gen2 with Apache Flink® DataStream API.
ms.service: hdinsight-aks
ms.topic: how-to
-ms.date: 10/27/2023
+ms.date: 03/14/2024
---
# Write event messages into Azure Data Lake Storage Gen2 with Apache Flink® DataStream API
@@ -22,11 +22,11 @@ Apache Flink uses file systems to consume and persistently store data, both for
## Apache Flink FileSystem connector
-This filesystem connector provides the same guarantees for both BATCH and STREAMING and is designed to provide exactly once semantics for STREAMING execution. For more information, see [Flink DataStream Filesystem](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/filesystem)
+This filesystem connector provides the same guarantees for both BATCH and STREAMING and is designed to provide exactly once semantics for STREAMING execution. For more information, see [Flink DataStream Filesystem](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/filesystem).
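To make this concrete, here's a minimal DataStream sketch of an exactly-once `FileSink` writing to ADLS Gen2. The storage account, container, and output directory are placeholders, and the rolling-policy values are illustrative assumptions rather than settings from this article:

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.configuration.MemorySize;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

import java.time.Duration;

public class FileSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The FileSink commits in-progress files on checkpoints, so enable checkpointing.
        env.enableCheckpointing(10_000);

        FileSink<String> sink = FileSink
                .forRowFormat(
                        // Placeholder path; assumes the cluster is already configured to reach the account.
                        new Path("abfs://<container>@<account>.dfs.core.windows.net/flink/output"),
                        new SimpleStringEncoder<String>("UTF-8"))
                .withRollingPolicy(DefaultRollingPolicy.builder()
                        .withRolloverInterval(Duration.ofMinutes(2))
                        .withInactivityInterval(Duration.ofMinutes(1))
                        .withMaxPartSize(MemorySize.ofMebiBytes(128))
                        .build())
                .build();

        env.fromElements("event-1", "event-2", "event-3").sinkTo(sink);
        env.execute("FileSink sketch");
    }
}
```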
## Apache Kafka Connector
-Flink provides an Apache Kafka connector for reading data from and writing data to Kafka topics with exactly once guarantees. For more information, see [Apache Kafka Connector](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/kafka)
+Flink provides an Apache Kafka connector for reading data from and writing data to Kafka topics with exactly once guarantees. For more information, see [Apache Kafka Connector](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/kafka).
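As a companion sketch, a minimal `KafkaSource` that reads string records; the broker address and consumer group are placeholders, and `click_events` is the topic this article streams into ADLS Gen2:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("<kafka-broker>:9092") // placeholder
                .setTopics("click_events")
                .setGroupId("flink-adls-demo")              // assumed group id
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        DataStream<String> stream =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source");
        stream.print(); // replace with a FileSink to persist the stream
        env.execute("KafkaSource sketch");
    }
}
```

Wiring a source like this to the `FileSink` sketched above gives the Kafka-to-ADLS Gen2 flow that the article's `KafkaSinkToGen2` job demonstrates.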
## Build the project for Apache Flink
@@ -36,7 +36,7 @@ Flink provides an Apache Kafka connector for reading data from and writing data
@@ -163,17 +163,17 @@ public class KafkaSinkToGen2 {
**Submit the job on Flink Dashboard UI**
-We are using Maven to package a jar onto local and submitting to Flink, and using Kafka to sink into ADLS Gen2
+We use Maven to package a jar locally and submit it to Flink, and we use Kafka to sink into ADLS Gen2.
:::image type="content" source="./media/assign-kafka-topic-event-message-to-azure-data-lake-storage-gen2/submit-the-job-flink-ui.png" alt-text="Screenshot showing jar submission to Flink dashboard.":::
-:::image type="content" source="./media/assign-kafka-topic-event-message-to-azure-data-lake-storage-gen2/submit-the-job-flink-ui-2.png" alt-text="Screenshot showing job running on Flink dashboard.":::
+:::image type="content" source="./media/assign-kafka-topic-event-message-to-azure-data-lake-storage-gen2/submit-the-job-flink-ui-2.png" alt-text="Screenshot showing job running on Flink dashboard.":::
**Validate streaming data on ADLS Gen2**
We are seeing the `click_events` streaming into ADLS Gen2
articles/hdinsight-aks/flink/flink-catalog-delta-hive.md (11 additions, 9 deletions)
@@ -3,7 +3,7 @@ title: Table API and SQL - Use Delta Catalog type with Hive with Apache Flink®
description: Learn about how to create Delta Catalog with Apache Flink® on Azure HDInsight on AKS
ms.service: hdinsight-aks
ms.topic: how-to
-ms.date: 08/29/2023
+ms.date: 03/14/2024
---
# Create Delta Catalog with Apache Flink® on Azure HDInsight on AKS
@@ -23,11 +23,12 @@ In this article, we learn how Apache Flink SQL/TableAPI is used to implement a D
Once you launch the Secure Shell (SSH), let us start downloading the dependencies required on the SSH node, to illustrate the Delta table managed in the Hive catalog.
-In this article, you can learn how you can enrich the real time events by joining a stream from Kafka with table on ADLS Gen2 using Flink Streaming. We use Flink Streaming API to join events from HDInsight Kafka with attributes from ADLS Gen2, further we use attributes-joined events to sink into another Kafka topic.
+In this article, you learn how to enrich real-time events by joining a stream from Kafka with a table on ADLS Gen2 using Flink Streaming. We use the Flink Streaming API to join events from HDInsight Kafka with attributes from ADLS Gen2. Further, we use the attributes-joined events to sink into another Kafka topic.
## Prerequisites
* [Flink cluster on HDInsight on AKS](../flink/flink-create-cluster-portal.md)
* [Kafka cluster on HDInsight](../../hdinsight/kafka/apache-kafka-get-started.md)
-* You're required to ensure the network settings are taken care as described on [Using Kafka on HDInsight](../flink/process-and-consume-data.md); that's to make sure HDInsight on AKS and HDInsight clusters are in the same VNet
+* Ensure the network settings are taken care of as described in [Using Kafka on HDInsight](../flink/process-and-consume-data.md); this makes sure HDInsight on AKS and HDInsight clusters are in the same VNet.
* For this demonstration, we're using a Windows VM as the Maven project development environment in the same VNet as HDInsight on AKS.
## Kafka topic preparation
@@ -45,7 +45,7 @@ We're creating a topic called `user_events`.
## Prepare file on ADLS Gen2
-We are creating a file called `item attributes` in our storage
+We're creating a file called `item attributes` in our storage.
- The purpose is to read a batch of `item attributes` from a file on ADLS Gen2. Each item has the following fields:
@@ -59,7 +59,7 @@ We are creating a file called `item attributes` in our storage
## Develop the Apache Flink job
-In this step we perform the following activities
+In this step, we perform the following activities:
- Enrich the `user_events` topic from Kafka by joining with `item attributes` from a file on ADLS Gen2.
- We push the outcome of this step, as enriched user activity events, into a Kafka topic (see the sketch after this list).
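A rough sketch of those two activities follows. It isn't the article's `KafkaJoinGen2Demo` class, whose body is elided from this diff; it assumes a map-side join, comma-separated layouts for both the events and the attribute file, placeholder broker and storage addresses, and an at-least-once sink:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class KafkaJoinSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The at-least-once Kafka sink flushes pending records on checkpoints.
        env.enableCheckpointing(10_000);

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("<kafka-broker>:9092") // placeholder
                .setTopics("user_events")
                .setGroupId("enrichment-sketch")            // assumed group id
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("<kafka-broker>:9092") // placeholder
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("user_events_enriched")   // assumed output topic name
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "user_events source")
                .map(new EnrichWithItemAttributes())
                .sinkTo(sink);

        env.execute("Kafka join sketch");
    }

    /** Loads the attribute file once per task, then enriches each event with it. */
    static class EnrichWithItemAttributes extends RichMapFunction<String, String> {
        private transient Map<String, String> attributesByItemId;

        @Override
        public void open(Configuration parameters) throws Exception {
            attributesByItemId = new HashMap<>();
            // Placeholder path; assumes the cluster can already read the storage account.
            Path file = new Path("abfs://<container>@<account>.dfs.core.windows.net/data/item_attributes.csv");
            FileSystem fs = file.getFileSystem();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Assumed layout: itemId,<attributes...>
                    int comma = line.indexOf(',');
                    if (comma > 0) {
                        attributesByItemId.put(line.substring(0, comma), line.substring(comma + 1));
                    }
                }
            }
        }

        @Override
        public String map(String event) {
            // Assumed layout: the first comma-separated field of an event is the itemId.
            int comma = event.indexOf(',');
            String itemId = comma > 0 ? event.substring(0, comma) : event;
            return event + "," + attributesByItemId.getOrDefault(itemId, "unknown");
        }
    }
}
```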
@@ -81,7 +81,7 @@ In this step we perform the following activities
@@ -254,7 +259,7 @@ public class KafkaJoinGen2Demo {
-## Package jar and submit to Apache Flink
+## Package the jar and submit to Apache Flink
We're submitting the packaged jar to Flink:
@@ -265,13 +270,13 @@ We're submitting the packaged jar to Flink:
### Produce real-time `user_events` topic on Kafka
-We are able to produce real-time user behavior event `user_events` in Kafka.
+We're able to produce real-time user behavior events to the `user_events` topic in Kafka.
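For reference, a minimal sketch of such a producer; the broker address, event layout, and send rate are assumptions, not the article's actual generator:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class UserEventsProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "<kafka-broker>:9092"); // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100; i++) {
                // Assumed event layout: itemId,userId,action,timestamp
                String event = "item-" + (i % 10) + ",user-" + i + ",click," + System.currentTimeMillis();
                producer.send(new ProducerRecord<>("user_events", event));
                Thread.sleep(500); // throttle to mimic a live stream
            }
        }
    }
}
```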
:::image type="content" source="./media/join-stream-kafka-table-filesystem/step-5-kafka-3-2.png" alt-text="Screenshot showing a real-time user behavior event on Kafka 3.2." border="true" lightbox="./media/join-stream-kafka-table-filesystem/step-5-kafka-3-2.png":::
### Consume the `itemAttributes` joining with `user_events` on Kafka
-We are now using `itemAttributes` on filesystem join user activity events `user_events`.
+We're now joining user activity events from `user_events` with `itemAttributes` from the filesystem.
:::image type="content" source="./media/join-stream-kafka-table-filesystem/step-6-kafka-3-2.png" alt-text="Screenshot showing Consume the item attributes-joined user activity events on Kafka 3.2." border="true" lightbox="./media/join-stream-kafka-table-filesystem/step-6-kafka-3-2.png":::
Move the jar `flink-table-planner_2.12-1.16.0-0.0.18.jar` from the webssh pod's `/opt` directory to `/lib`, and move the jar `flink-table-planner-loader-1.16.0-0.0.18.jar` out of `/lib`. Refer to the [issue](https://issues.apache.org/jira/browse/FLINK-25128) for more details. Perform the following steps to move the planner jar.