articles/hdinsight-aks/flink/use-apache-nifi-with-datastream-api.md (11 additions, 11 deletions)
@@ -1,9 +1,9 @@
 ---
 title: Use Apache NiFi with HDInsight on AKS clusters running Apache Flink® to publish into ADLS Gen2
-description: Learn how to use Apache NiFi to consume processed Apache Kafka® topic from Apache Flink® on HDInsight on AKS clusters and publish into ADLS Gen2
+description: Learn how to use Apache NiFi to consume processed Apache Kafka® topic from Apache Flink® on HDInsight on AKS clusters and publish into ADLS Gen2.
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 03/23/2024
+ms.date: 03/25/2024
 ---

 # Use Apache NiFi to consume processed Apache Kafka® topics from Apache Flink® and publish into ADLS Gen2
@@ -14,21 +14,21 @@ Apache NiFi is a software project from the Apache Software Foundation designed t

 For more information, see [Apache NiFi](https://nifi.apache.org).

-In this document, we process streaming data using HDInsight Kafka and perform some transformations on HDInsight Apache Flink on AKS, consume these topics and write the contents into ADLS Gen2 on Apache NiFi.
+In this document, we process streaming data using HDInsight Kafka, perform some transformations on HDInsight Apache Flink on AKS, consume these topics, and write the contents into ADLS Gen2 with Apache NiFi.

 By combining the low latency streaming features of Apache Flink and the dataflow capabilities of Apache NiFi, you can process events at high volume. This combination helps you trigger, enrich, and filter events to enhance the overall user experience. Both technologies complement each other with their strengths in event streaming and correlation.

 ## Prerequisites

 * [Flink cluster on HDInsight on AKS](../flink/flink-create-cluster-portal.md)
 * [Kafka cluster on HDInsight](../../hdinsight/kafka/apache-kafka-get-started.md)
-* You're required to ensure the network settings are taken care as described on [Using Kafka on HDInsight](../flink/process-and-consume-data.md); that's to make sure HDInsight on AKS and HDInsight clusters are in the same VNet
+* Make sure the network settings are configured as described in [Using Kafka on HDInsight](../flink/process-and-consume-data.md), so that the HDInsight on AKS and HDInsight clusters are in the same VNet.
 * For this demonstration, we're using a Windows VM as the Maven project development environment, in the same VNet as HDInsight on AKS.
 * For this demonstration, we're using an Ubuntu VM in the same VNet as HDInsight on AKS, with Apache NiFi 1.22.0 installed on it.

 ## Prepare HDInsight Kafka topic

-For purposes of this demonstration, we're using a HDInsight Kafka Cluster, let us prepare HDInsight Kafka topic for the demo.
+For the purposes of this demonstration, we're using an HDInsight Kafka cluster. Let us prepare an HDInsight Kafka topic for the demo.

 > [!NOTE]
 > Set up an HDInsight cluster with [Apache Kafka](../../hdinsight/kafka/apache-kafka-get-started.md) and replace the broker list with your own before you get started, for both Kafka 2.4 and 3.2.
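The broker list the note asks you to replace is typically passed to the job as a plain comma-separated `host:port` string. A minimal sketch of parsing and validating such a list before handing it to a Kafka client (plain Java, no Kafka dependency; the `wn0-kafka` host names are placeholders, not real broker FQDNs):

```java
import java.util.ArrayList;
import java.util.List;

public class BrokerList {
    // Parse "host1:9092,host2:9092" into host/port pairs, rejecting malformed entries
    // early instead of letting the Kafka client fail later with a vague error.
    static List<String[]> parse(String brokers) {
        List<String[]> out = new ArrayList<>();
        for (String entry : brokers.split(",")) {
            String[] parts = entry.trim().split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected host:port, got: " + entry);
            }
            int port = Integer.parseInt(parts[1]); // throws on a non-numeric port
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException("port out of range: " + port);
            }
            out.add(new String[] { parts[0], parts[1] });
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder broker names; substitute your own HDInsight broker list.
        List<String[]> brokers = parse("wn0-kafka:9092,wn1-kafka:9092");
        System.out.println(brokers.size() + " brokers, first host: " + brokers.get(0)[0]);
    }
}
```

The validated string is what you would then set as `bootstrap.servers` in the producer/consumer properties.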
@@ -73,7 +73,7 @@ Here, we configure NiFi properties in order to be accessed outside the localhost

 ## Process streaming data from Kafka cluster on HDInsight with Flink cluster on HDInsight on AKS

-Let us develop the source code on Maven, to build the jar.
+Let us develop the source code with Maven, and build the jar.

 **SinkToKafka.java**
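The job code itself is elided from this diff. For context, a click source like the `ClickSource` shown in the next hunk ultimately just fabricates event records in a loop; inside Flink, `SourceFunction.run()` would emit each one via `ctx.collect(...)`. A plain-Java sketch under that assumption (no Flink dependency; the field names and sample values are illustrative, not the article's actual POJO):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ClickSourceSketch {
    // Illustrative event shape; the article's real Event class may differ.
    static class Event {
        final String user;
        final String url;
        final long timestamp;
        Event(String user, String url, long timestamp) {
            this.user = user;
            this.url = url;
            this.timestamp = timestamp;
        }
        @Override public String toString() { return user + "," + url + "," + timestamp; }
    }

    // Generate a bounded batch of synthetic click events with ascending timestamps.
    static List<Event> generate(int n, long seed) {
        String[] users = { "Mary", "Alice", "Bob" };
        String[] urls = { "./home", "./cart", "./prod?id=1" };
        Random rnd = new Random(seed);
        List<Event> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(new Event(users[rnd.nextInt(users.length)],
                              urls[rnd.nextInt(urls.length)],
                              1_000L * i)); // 1-second spacing between events
        }
        return out;
    }

    public static void main(String[] args) {
        for (Event e : generate(3, 42L)) System.out.println(e);
    }
}
```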
@@ -182,7 +182,7 @@ public class ClickSource implements SourceFunction<Event> {
 ```
 **Maven pom.xml**

-You can replace 2.4.1 with 3.2.0 in case you're using Kafka 3.2.0 on HDInsight, where applicable on the pom.xml
+You can replace 2.4.1 with 3.2.0 if you're using Kafka 3.2.0 on HDInsight, where applicable in the pom.xml.

 ```xml
 <?xml version="1.0" encoding="UTF-8"?>
@@ -261,7 +261,7 @@ You can replace 2.4.1 with 3.2.0 in case you're using Kafka 3.2.0 on HDInsight,

 ## Submit streaming job to Flink cluster on HDInsight on AKS

-Now, lets submit streaming job as mentioned in the previous step into Flink cluster
+Now, let's submit the streaming job mentioned in the previous step to the Flink cluster.

 :::image type="content" source="./media/use-apache-nifi-with-datastream-api/step-5-flink-ui-job-submission.png" alt-text="Screenshot showing how to submit the streaming job from the Flink UI." border="true" lightbox="./media/use-apache-nifi-with-datastream-api/step-5-flink-ui-job-submission.png":::
 > In this example, we use Azure User Managed Identity to provide credentials for ADLS Gen2.

-In this demonstration, we have used Apache NiFi instance installed on an Ubuntu VM. We're accessing the NiFi web interface from a Windows VM. The Ubuntu VM needs to have a managed identity assigned to it and network security group (NSG) rules configured.
+In this demonstration, we use an Apache NiFi instance installed on an Ubuntu VM. We access the NiFi web interface from a Windows VM. The Ubuntu VM needs to have a managed identity assigned to it and network security group (NSG) rules configured.

 To use Managed Identity authentication with the PutAzureDataLakeStorage processor in NiFi, make sure the Ubuntu VM on which NiFi is installed has a managed identity assigned to it; if not, assign one to the VM.

 :::image type="content" source="./media/use-apache-nifi-with-datastream-api/step-6-nifi-ui-kafka-consumption.png" alt-text="Screenshot showing how to create a flow in Apache NiFi - Step 1." border="true" lightbox="./media/use-apache-nifi-with-datastream-api/step-6-nifi-ui-kafka-consumption.png":::

-Once you have assigned a managed identity to the Azure VM, you need to make sure that the VM can connect to the IMDS (Instance Metadata Service) endpoint. The IMDS endpoint is available at the IP address shown in this example. You need to update your network security group rules to allow outbound traffic from the Ubuntu VM to this IP address.
+Once you assign a managed identity to the Azure VM, make sure that the VM can connect to the IMDS (Instance Metadata Service) endpoint. The IMDS endpoint is available at the IP address shown in this example. Update your network security group rules to allow outbound traffic from the Ubuntu VM to this IP address.

 :::image type="content" source="./media/use-apache-nifi-with-datastream-api/step-6-2-nifi-ui-kafka-consumption.png" alt-text="Screenshot showing how to create a flow in Apache NiFi - Step 2." border="true" lightbox="./media/use-apache-nifi-with-datastream-api/step-6-2-nifi-ui-kafka-consumption.png":::
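The IMDS endpoint the paragraph refers to is the link-local address `169.254.169.254`, and every managed-identity token request to it must carry the `Metadata: true` header. A minimal sketch of composing such a request (plain Java; nothing is sent on the network here, and using `https://storage.azure.com/` as the token audience for ADLS Gen2 is an assumption of this sketch):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ImdsRequest {
    static final String IMDS_IP = "169.254.169.254";

    // Compose the URL and headers of an IMDS managed-identity token request.
    // This is what clients such as NiFi's Azure processors send under the hood.
    static Map<String, String> build(String resource) {
        Map<String, String> req = new LinkedHashMap<>();
        req.put("url", "http://" + IMDS_IP + "/metadata/identity/oauth2/token"
                + "?api-version=2018-02-01&resource=" + resource);
        req.put("Metadata", "true"); // required header; IMDS rejects requests without it
        return req;
    }

    public static void main(String[] args) {
        Map<String, String> req = build("https://storage.azure.com/");
        System.out.println(req.get("url"));
    }
}
```

This is why the NSG rule matters: if outbound traffic to `169.254.169.254` is blocked, the token request never reaches IMDS and Managed Identity authentication fails.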
@@ -340,4 +340,4 @@ Once you have assigned a managed identity to the Azure VM, you need to make sure

 * [Azure Data Lake Storage](https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-azure-nar/1.12.0/org.apache.nifi.processors.azure.storage.PutAzureDataLakeStorage/index.html)
 * [Download IntelliJ IDEA for development](https://www.jetbrains.com/idea/download/#section=windows)
-* Apache, Apache Kafka, Kafka, Apache Flink, Flink,Apache NiFi, NiFi and associated open source project names are [trademarks](../trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).
+* Apache, Apache Kafka, Kafka, Apache Flink, Flink, Apache NiFi, NiFi, and associated open source project names are [trademarks](../trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).
articles/hdinsight-aks/flink/use-flink-to-sink-kafka-message-into-hbase.md (7 additions, 7 deletions)
@@ -1,9 +1,9 @@
 ---
 title: Write messages to Apache HBase® with Apache Flink® DataStream API
-description: Learn how to write messages to Apache HBase with Apache Flink DataStream API
+description: Learn how to write messages to Apache HBase with Apache Flink DataStream API.
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 03/23/2024
+ms.date: 03/25/2024
 ---

 # Write messages to Apache HBase® with Apache Flink® DataStream API
@@ -14,16 +14,16 @@ In this article, learn how to write messages to HBase with Apache Flink DataStre

 ## Overview

-Apache Flink offers HBase connector as a sink, with this connector with Flink you can store the output of a real-time processing application in HBase. Learn how to process streaming data on HDInsight Kafka as a source, perform transformations, then sink into HDInsight HBase table.
+Apache Flink offers the HBase connector as a sink. With this connector, you can store the output of a real-time processing application in HBase. Learn how to process streaming data on HDInsight Kafka as a source, perform transformations, then sink into an HDInsight HBase table.

-In a real world scenario, this example is a stream analytics layer to realize value from Internet of Things (IOT) analytics, which use live sensor data. The Flink Stream can read data from Kafka topic and write it to HBase table. If there is a real time streaming IOT application, the information can be gathered, transformed and optimized.
+In a real-world scenario, this example is a stream analytics layer to realize value from Internet of Things (IoT) analytics that use live sensor data. The Flink stream can read data from a Kafka topic and write it to an HBase table. In a real-time streaming IoT application, the information can be gathered, transformed, and optimized.

 ## Prerequisites

 * [Apache Flink cluster on HDInsight on AKS](../flink/flink-create-cluster-portal.md)
 * [Apache Kafka cluster on HDInsight](../flink/process-and-consume-data.md)
 * [Apache HBase 2.4.11 cluster on HDInsight](../../hdinsight/hbase/apache-hbase-tutorial-get-started-linux.md#create-apache-hbase-cluster)
 * Make sure the HDInsight on AKS cluster can connect to the HDInsight cluster, in the same virtual network.
 * Maven project on IntelliJ IDEA for development, on an Azure VM in the same VNet
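The overview's Kafka-to-HBase flow boils down to turning each consumed message into a row mutation: a row key plus `family:qualifier -> value` cells. A plain-Java sketch of that mapping (no HBase dependency; the CSV record shape, the `sensor` column family, and the row-key scheme are assumptions for illustration, not the article's actual schema):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowMapperSketch {
    // Map a CSV sensor reading "deviceId,metric,value" to HBase-style cells.
    // In the real job this result would become an
    // org.apache.hadoop.hbase.client.Put against the sink table.
    static Map<String, String> toCells(String message) {
        String[] f = message.split(",");
        if (f.length != 3) {
            throw new IllegalArgumentException("bad record: " + message);
        }
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("rowKey", f[0] + ":" + f[1]);  // deviceId:metric as the row key
        cells.put("sensor:" + f[1], f[2]);        // family:qualifier -> value
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(toCells("dev42,temp,21.5"));
    }
}
```

Choosing `deviceId:metric` as the row key keeps all readings for one device-metric pair adjacent in HBase; a real schema would usually also fold a timestamp into the key to avoid overwrites.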
@@ -349,13 +349,13 @@ public class KafkaSinkToHbase {

 ### Submit job on Secure Shell

-We use [Flink CLI](./flink-web-ssh-on-portal-to-flink-sql.md) from Azure portal to submit jobs
+We use the [Flink CLI](./flink-web-ssh-on-portal-to-flink-sql.md) from the Azure portal to submit jobs.

 :::image type="content" source="./media/use-flink-to-sink-kafka-message-into-hbase/submit-job-on-web-ssh.png" alt-text="Screenshot showing how to submit the job on web SSH." lightbox="./media/use-flink-to-sink-kafka-message-into-hbase/submit-job-on-web-ssh.png":::

 ### Monitor job on Flink UI

-We can monitor the jobs on Flink Web UI
+We can monitor the jobs on the Flink Web UI.

 :::image type="content" source="./media/use-flink-to-sink-kafka-message-into-hbase/check-job-on-flink-ui.png" alt-text="Screenshot showing how to check the job on the Flink UI." lightbox="./media/use-flink-to-sink-kafka-message-into-hbase/check-job-on-flink-ui.png":::