Commit f770e9a

Merge pull request #269058 from sreekzz/patch-9
Update use-hive-catalog.md
2 parents: aa8c1e0 + 4d1aeef

File tree: 10 files changed (+48 −41 lines)


articles/hdinsight-aks/flink/assign-kafka-topic-event-message-to-azure-data-lake-storage-gen2.md

Lines changed: 8 additions & 8 deletions
@@ -1,9 +1,9 @@
 ---
 title: Write event messages into Azure Data Lake Storage Gen2 with Apache Flink® DataStream API
-description: Learn how to write event messages into Azure Data Lake Storage Gen2 with Apache Flink® DataStream API
+description: Learn how to write event messages into Azure Data Lake Storage Gen2 with Apache Flink® DataStream API.
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 10/27/2023
+ms.date: 03/14/2024
 ---
 
 # Write event messages into Azure Data Lake Storage Gen2 with Apache Flink® DataStream API
@@ -22,11 +22,11 @@ Apache Flink uses file systems to consume and persistently store data, both for
 
 ## Apache Flink FileSystem connector
 
-This filesystem connector provides the same guarantees for both BATCH and STREAMING and is designed to provide exactly once semantics for STREAMING execution. For more information, see [Flink DataStream Filesystem](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/filesystem)
+This filesystem connector provides the same guarantees for both BATCH and STREAMING and is designed to provide exactly once semantics for STREAMING execution. For more information, see [Flink DataStream Filesystem](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/filesystem).
 
 ## Apache Kafka Connector
 
-Flink provides an Apache Kafka connector for reading data from and writing data to Kafka topics with exactly once guarantees. For more information, see [Apache Kafka Connector](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/kafka)
+Flink provides an Apache Kafka connector for reading data from and writing data to Kafka topics with exactly once guarantees. For more information, see [Apache Kafka Connector](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/kafka).
 
 ## Build the project for Apache Flink
 
@@ -36,7 +36,7 @@ Flink provides an Apache Kafka connector for reading data from and writing data
 <properties>
 <maven.compiler.source>1.8</maven.compiler.source>
 <maven.compiler.target>1.8</maven.compiler.target>
-<flink.version>1.16.0</flink.version>
+<flink.version>1.17.0</flink.version>
 <java.version>1.8</java.version>
 <scala.binary.version>2.12</scala.binary.version>
 <kafka.version>3.2.0</kafka.version>
@@ -163,17 +163,17 @@ public class KafkaSinkToGen2 {
 
 **Submit the job on Flink Dashboard UI**
 
-We are using Maven to package a jar onto local and submitting to Flink, and using Kafka to sink into ADLS Gen2
+We are using Maven to package a jar onto local and submitting to Flink, and using Kafka to sink into ADLS Gen2.
 
 :::image type="content" source="./media/assign-kafka-topic-event-message-to-azure-data-lake-storage-gen2/submit-the-job-flink-ui.png" alt-text="Screenshot showing jar submission to Flink dashboard.":::
-:::image type="content" source="./media/assign-kafka-topic-event-message-to-azure-data-lake-storage-gen2/submit-the-job-flink-ui-2.png" alt-text="Screenshot showing job running on Flink dashboard.":::
+:::Image type="content" source="./media/assign-kafka-topic-event-message-to-azure-data-lake-storage-gen2/submit-the-job-flink-ui-2.png" alt-text="Screenshot showing job running on Flink dashboard.":::
 
 **Validate streaming data on ADLS Gen2**
 
 We are seeing the `click_events` streaming into ADLS Gen2
 
 :::image type="content" source="./media/assign-kafka-topic-event-message-to-azure-data-lake-storage-gen2/validate-stream-azure-data-lake-storage-gen2-1.png" alt-text="Screenshot showing ADLS Gen2 output.":::
-:::image type="content" source="./media/assign-kafka-topic-event-message-to-azure-data-lake-storage-gen2/validate-stream-azure-data-lake-storage-gen2-2.png" alt-text="Screenshot showing Flink click event output.":::
+:::Image type="content" source="./media/assign-kafka-topic-event-message-to-azure-data-lake-storage-gen2/validate-stream-azure-data-lake-storage-gen2-2.png" alt-text="Screenshot showing Flink click event output.":::
 
 You can specify a rolling policy that rolls the in-progress part file on any of the following three conditions:

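The hunk ends just as the article introduces the rolling policy. For orientation, here is a minimal DataStream sketch of such a sink; it is not from the article — the `abfs://` path, class name, and thresholds are placeholders — but the builder calls follow the Flink 1.17 `FileSink` API that this commit's `<flink.version>` bump targets.

```java
package contoso.example; // hypothetical package, not from the article

import java.time.Duration;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.configuration.MemorySize;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

public class Gen2RollingSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10_000); // in-progress part files are finalized on checkpoints

        // Stand-in for the article's Kafka source.
        DataStream<String> events = env.fromElements("click,user1", "click,user2");

        FileSink<String> gen2Sink = FileSink
                .forRowFormat(
                        new Path("abfs://<container>@<account>.dfs.core.windows.net/data/click_events"),
                        new SimpleStringEncoder<String>("UTF-8"))
                .withRollingPolicy(DefaultRollingPolicy.builder()
                        .withRolloverInterval(Duration.ofMinutes(2))   // condition 1: maximum age of a part file
                        .withInactivityInterval(Duration.ofMinutes(3)) // condition 2: no new records for this long
                        .withMaxPartSize(MemorySize.ofMebiBytes(128))  // condition 3: size threshold reached
                        .build())
                .build();

        events.sinkTo(gen2Sink);
        env.execute("FileSink rolling policy sketch");
    }
}
```

With a row format, part files only move from in-progress to finished on a checkpoint, which is why the sketch enables checkpointing explicitly.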
articles/hdinsight-aks/flink/flink-catalog-delta-hive.md

Lines changed: 11 additions & 9 deletions
@@ -3,7 +3,7 @@ title: Table API and SQL - Use Delta Catalog type with Hive with Apache Flink®
 description: Learn about how to create Delta Catalog with Apache Flink® on Azure HDInsight on AKS
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 08/29/2023
+ms.date: 03/14/2024
 ---
 
 # Create Delta Catalog with Apache Flink® on Azure HDInsight on AKS
@@ -23,11 +23,12 @@ In this article, we learn how Apache Flink SQL/TableAPI is used to implement a D
 Once you launch the Secure Shell (SSH), let us start downloading the dependencies required to the SSH node, to illustrate the Delta table managed in Hive catalog.
 
 ```
-wget https://repo1.maven.org/maven2/io/delta/delta-standalone_2.12/3.0.0rc1/delta-standalone_2.12-3.0.0rc1.jar -P $FLINK_HOME/lib
-wget https://repo1.maven.org/maven2/io/delta/delta-flink/3.0.0rc1/delta-flink-3.0.0rc1.jar -P $FLINK_HOME/lib
-wget https://repo1.maven.org/maven2/com/chuusai/shapeless_2.12/2.3.4/shapeless_2.12-2.3.4.jar -P $FLINK_HOME/lib
-wget https://repo1.maven.org/maven2/org/apache/flink/flink-parquet/1.16.0/flink-parquet-1.16.0.jar -P $FLINK_HOME/lib
-wget https://repo1.maven.org/maven2/org/apache/parquet/parquet-hadoop-bundle/1.12.2/parquet-hadoop-bundle-1.12.2.jar -P $FLINK_HOME/lib
+wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.0.0/delta-storage-3.0.0.jar -P $FLINK_HOME/lib
+wget https://repo1.maven.org/maven2/io/delta/delta-standalone_2.12/3.0.0/delta-standalone_2.12-3.0.0.jar -P $FLINK_HOME/lib
+wget https://repo1.maven.org/maven2/io/delta/delta-flink/3.0.0/delta-flink-3.0.0.jar -P $FLINK_HOME/lib
+wget https://repo1.maven.org/maven2/com/chuusai/shapeless_2.12/2.3.4/shapeless_2.12-2.3.4.jar -P $FLINK_HOME/lib
+wget https://repo1.maven.org/maven2/org/apache/flink/flink-parquet/{flink.version}/flink-parquet-{flink.version}.jar -P $FLINK_HOME/lib
+wget https://repo1.maven.org/maven2/org/apache/parquet/parquet-hadoop-bundle/1.12.2/parquet-hadoop-bundle-1.12.2.jar -P $FLINK_HOME/lib
 ```
 
 ### Start the Apache Flink SQL Client
@@ -51,11 +52,12 @@ Using the delta catalog
 #### Add dependencies to server classpath
 
 ```sql
-ADD JAR '/opt/flink-webssh/lib/delta-flink-3.0.0rc1.jar';
-ADD JAR '/opt/flink-webssh/lib/delta-standalone_2.12-3.0.0rc1.jar';
+ADD JAR '/opt/flink-webssh/lib/delta-standalone_2.12-3.0.0.jar';
+ADD JAR '/opt/flink-webssh/lib/delta-storage-3.0.0.jar';
+ADD JAR '/opt/flink-webssh/lib/delta-flink-3.0.0.jar';
 ADD JAR '/opt/flink-webssh/lib/shapeless_2.12-2.3.4.jar';
 ADD JAR '/opt/flink-webssh/lib/parquet-hadoop-bundle-1.12.2.jar';
-ADD JAR '/opt/flink-webssh/lib/flink-parquet-1.16.0.jar';
+ADD JAR '/opt/flink-webssh/lib/flink-parquet-1.17.0.jar';
 ```
 #### Create Table

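The diff cuts off at the "Create Table" heading. As a rough sketch of the step that follows, the Table API code below registers a Delta catalog and creates a table through it. The option names (`'type' = 'delta-catalog'`, `'catalog-type' = 'hive'`, `'connector' = 'delta'`, `'table-path'`) come from the Delta Lake Flink connector documentation, and the table name and `abfs://` path are placeholders — verify the options against the delta-flink 3.0.0 jars pulled in above.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DeltaCatalogSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Register a Delta catalog backed by the Hive Metastore
        // (option names per the delta-flink connector docs; verify for your version).
        tableEnv.executeSql(
                "CREATE CATALOG delta_catalog WITH ("
                        + " 'type' = 'delta-catalog',"
                        + " 'catalog-type' = 'hive'"
                        + ")");
        tableEnv.executeSql("USE CATALOG delta_catalog");

        // A Delta table whose data lives on ADLS Gen2 (placeholder path and schema).
        tableEnv.executeSql(
                "CREATE TABLE IF NOT EXISTS orders ("
                        + " order_id BIGINT,"
                        + " item STRING,"
                        + " amount DOUBLE"
                        + ") WITH ("
                        + " 'connector' = 'delta',"
                        + " 'table-path' = 'abfs://<container>@<account>.dfs.core.windows.net/delta/orders'"
                        + ")");
    }
}
```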
articles/hdinsight-aks/flink/join-stream-kafka-table-filesystem.md

Lines changed: 23 additions & 18 deletions
@@ -1,22 +1,22 @@
 ---
 title: Enrich the events from Apache Kafka® with the attributes from FileSystem with Apache Flink®
-description: Learn how to join stream from Kafka with table from fileSystem using Apache Flink® DataStream API
+description: Learn how to join stream from Kafka with table from fileSystem using Apache Flink® DataStream API.
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 08/29/2023
+ms.date: 03/14/2024
 ---
 
 # Enrich the events from Apache Kafka® with attributes from ADLS Gen2 with Apache Flink®
 
 [!INCLUDE [feature-in-preview](../includes/feature-in-preview.md)]
 
-In this article, you can learn how you can enrich the real time events by joining a stream from Kafka with table on ADLS Gen2 using Flink Streaming. We use Flink Streaming API to join events from HDInsight Kafka with attributes from ADLS Gen2, further we use attributes-joined events to sink into another Kafka topic.
+In this article, you can learn how you can enrich the real time events by joining a stream from Kafka with table on ADLS Gen2 using Flink Streaming. We use Flink Streaming API to join events from HDInsight Kafka with attributes from ADLS Gen2. Further we use attributes-joined events to sink into another Kafka topic.
 
 ## Prerequisites
 
 * [Flink cluster on HDInsight on AKS](../flink/flink-create-cluster-portal.md)
 * [Kafka cluster on HDInsight](../../hdinsight/kafka/apache-kafka-get-started.md)
-* You're required to ensure the network settings are taken care as described on [Using Kafka on HDInsight](../flink/process-and-consume-data.md); that's to make sure HDInsight on AKS and HDInsight clusters are in the same VNet
+* Ensure the network settings are taken care as described on [Using Kafka on HDInsight](../flink/process-and-consume-data.md) to make sure HDInsight on AKS and HDInsight clusters are in the same VNet
 * For this demonstration, we're using a Window VM as maven project develop environment in the same VNet as HDInsight on AKS
 
 ## Kafka topic preparation
@@ -45,7 +45,7 @@ We're creating a topic called `user_events`.
 
 ## Prepare file on ADLS Gen2
 
-We are creating a file called `item attributes` in our storage
+We're creating a file called `item attributes` in our storage
 
 - The purpose is to read a batch of `item attributes` from a file on ADLS Gen2. Each item has the following fields:
 ```
@@ -59,7 +59,7 @@ We are creating a file called `item attributes` in our storage
 
 ## Develop the Apache Flink job
 
-In this step we perform the following activities
+In this step, we perform the following activities
 - Enrich the `user_events` topic from Kafka by joining with `item attributes` from a file on ADLS Gen2.
 - We push the outcome of this step, as an enriched user activity of events into a Kafka topic.
 
@@ -81,7 +81,7 @@ In this step we perform the following activities
 <properties>
 <maven.compiler.source>1.8</maven.compiler.source>
 <maven.compiler.target>1.8</maven.compiler.target>
-<flink.version>1.16.0</flink.version>
+<flink.version>1.17.0</flink.version>
 <java.version>1.8</java.version>
 <scala.binary.version>2.12</scala.binary.version>
 <kafka.version>3.2.0</kafka.version> //replace with 2.4.1 if you are using HDInsight Kafka 2.4.1
@@ -195,14 +195,19 @@ public class KafkaJoinGen2Demo {
 DataStream<String> kafkaData = env.fromSource(kafkaSource, WatermarkStrategy.noWatermarks(), "Kafka Source");
 
 // Parse Kafka source data
-DataStream<Tuple4<String, String, String, String>> userEvents = kafkaData.map(new MapFunction<String, Tuple4<String, String, String, String>>() {
-    @Override
-    public Tuple4<String, String, String, String> map(String value) throws Exception {
-        // Parse the line into a Tuple4
-        String[] parts = value.split(",");
-        return new Tuple4<>(parts[0], parts[1], parts[2], parts[3]);
-    }
-});
+DataStream<Tuple4<String, String, String, String>> userEvents = kafkaData.map(new MapFunction<String, Tuple4<String, String, String, String>>() {
+    @Override
+    public Tuple4<String, String, String, String> map(String value) throws Exception {
+        // Parse the line into a Tuple4
+        String[] parts = value.split(",");
+        if (parts.length < 4) {
+            // Log and skip malformed record
+            System.out.println("Malformed record: " + value);
+            return null;
+        }
+        return new Tuple4<>(parts[0], parts[1], parts[2], parts[3]);
+    }
+});
 
 // 4. Enrich the user activity events by joining the items' attributes from a file
 DataStream<Tuple7<String,String,String,String,String,String,String>> enrichedData = userEvents.map(new MyJoinFunction());
@@ -254,7 +259,7 @@ public class KafkaJoinGen2Demo {
 }
 ```
 
-## Package jar and submit to Apache Flink
+## Package jar, and submit to Apache Flink
 
 We're submitting the packaged jar to Flink:
 
@@ -265,13 +270,13 @@ We're submitting the packaged jar to Flink:
 
 ### Produce real-time `user_events` topic on Kafka
 
-We are able to produce real-time user behavior event `user_events` in Kafka.
+We're able to produce real-time user behavior event `user_events` in Kafka.
 
 :::image type="content" source="./media/join-stream-kafka-table-filesystem/step-5-kafka-3-2.png" alt-text="Screenshot showing a real-time user behavior event on Kafka 3.2." border="true" lightbox="./media/join-stream-kafka-table-filesystem/step-5-kafka-3-2.png":::
 
 ### Consume the `itemAttributes` joining with `user_events` on Kafka
 
-We are now using `itemAttributes` on filesystem join user activity events `user_events`.
+We're now using `itemAttributes` on filesystem join user activity events `user_events`.
 
 :::image type="content" source="./media/join-stream-kafka-table-filesystem/step-6-kafka-3-2.png" alt-text="Screenshot showing Consume the item attributes-joined user activity events on Kafka 3.2." border="true" lightbox="./media/join-stream-kafka-table-filesystem/step-6-kafka-3-2.png":::

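One caveat on the parsing change in this diff: returning `null` from a `MapFunction` leaves a null element in the stream for every malformed record, so downstream operators (here `MyJoinFunction`) must guard against them. The more idiomatic Flink fix is a `FlatMapFunction`, which can simply emit nothing for a bad line. A sketch of that alternative, assuming the same `kafkaData` stream as in the article's `KafkaJoinGen2Demo`:

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple4;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.util.Collector;

// Parse Kafka source data, dropping malformed lines (after logging them)
// instead of forwarding null into the join stage.
DataStream<Tuple4<String, String, String, String>> userEvents =
        kafkaData.flatMap(new FlatMapFunction<String, Tuple4<String, String, String, String>>() {
            @Override
            public void flatMap(String value, Collector<Tuple4<String, String, String, String>> out) {
                String[] parts = value.split(",");
                if (parts.length < 4) {
                    System.out.println("Malformed record: " + value); // a real job would use a logger
                    return; // emit nothing; the record is skipped
                }
                out.collect(new Tuple4<>(parts[0], parts[1], parts[2], parts[3]));
            }
        });
```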
[Binary image files changed (not rendered): +34.6 KB, +72.4 KB, −12.8 KB, −187 KB]

articles/hdinsight-aks/flink/use-hive-catalog.md

Lines changed: 6 additions & 6 deletions
@@ -3,7 +3,7 @@ title: Use Hive Catalog, Hive Read & Write demo on Apache Flink®
 description: Learn how to use Hive Catalog, Hive Read & Write demo on Apache Flink® on HDInsight on AKS
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 10/27/2023
+ms.date: 03/18/2023
 ---
 
 # How to use Hive Catalog with Apache Flink® on HDInsight on AKS
@@ -143,19 +143,19 @@ mysql> desc orders;
 > Download the correct version jar according to our HDInsight kafka version and MySQL version.
 
 ```
-wget https://repo1.maven.org/maven2/org/apache/flink/flink-connector-jdbc/1.16.0/flink-connector-jdbc-1.16.0.jar
+wget https://repo1.maven.org/maven2/org/apache/flink/flink-connector-jdbc/3.1.0-1.17/flink-connector-jdbc-3.1.0-1.17.jar
 wget https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.33/mysql-connector-j-8.0.33.jar
 wget https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients/3.2.0/kafka-clients-3.2.0.jar
-wget https://repo1.maven.org/maven2/org/apache/flink/flink-connector-kafka/1.16.0/flink-connector-kafka-1.16.0.jar
+wget https://repo1.maven.org/maven2/org/apache/flink/flink-connector-kafka/1.17.0/flink-connector-kafka-1.17.0.jar
 ```
 
 **Moving the planner jar**
 
 Move the jar flink-table-planner_2.12-1.16.0-0.0.18.jar located in webssh pod's /opt to /lib and move out the jar flink-table-planner-loader-1.16.0-0.0.18.jar from /lib. Refer to [issue](https://issues.apache.org/jira/browse/FLINK-25128) for more details. Perform the following steps to move the planner jar.
 
 ```
-mv /opt/flink-webssh/opt/flink-table-planner_2.12-1.16.0-0.0.18.jar /opt/flink-webssh/lib/
-mv /opt/flink-webssh/lib/flink-table-planner-loader-1.16.0-0.0.18.jar /opt/flink-webssh/opt/
+mv /opt/flink-webssh/lib/flink-table-planner-loader-1.17.0-1.1.1.3.jar /opt/flink-webssh/opt/
+mv /opt/flink-webssh/opt/flink-table-planner_2.12-1.17.0-1.1.1.3.jar /opt/flink-webssh/lib/
 ```
 
 > [!NOTE]
@@ -165,7 +165,7 @@ mv /opt/flink-webssh/lib/flink-table-planner-loader-1.16.0-0.0.18.jar /opt/flink
 ### Use bin/sql-client.sh to connect to Flink SQL
 
 ```
-bin/sql-client.sh -j kafka-clients-3.2.0.jar -j flink-connector-kafka-1.16.0.jar -j flink-connector-jdbc-1.16.0.jar -j mysql-connector-j-8.0.33.jar
+bin/sql-client.sh -j flink-connector-jdbc-3.1.0-1.17.jar -j mysql-connector-j-8.0.33.jar -j kafka-clients-3.2.0.jar -j flink-connector-kafka-1.17.0.jar
 ```
 
 ### Create Hive catalog and connect to the hive catalog on Flink SQL

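The diff ends at the "Create Hive catalog" heading. For orientation, the standard Flink SQL statements for that step, wrapped in the Table API, look roughly like the sketch below. The catalog name and `hive-conf-dir` are placeholders rather than values from the article, and the demo table uses Flink's built-in `datagen` connector.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class HiveCatalogSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Register a Hive catalog; 'hive-conf-dir' must point at the directory
        // that contains hive-site.xml (placeholder path here).
        tableEnv.executeSql(
                "CREATE CATALOG myhive WITH ("
                        + " 'type' = 'hive',"
                        + " 'default-database' = 'default',"
                        + " 'hive-conf-dir' = '/opt/hive-conf'"
                        + ")");
        tableEnv.executeSql("USE CATALOG myhive");

        // Tables created from here on are persisted in the Hive Metastore,
        // so they stay visible across Flink SQL sessions.
        tableEnv.executeSql(
                "CREATE TABLE IF NOT EXISTS orders_demo ("
                        + " order_id STRING, amount DOUBLE"
                        + ") WITH ('connector' = 'datagen')");
    }
}
```

The same `CREATE CATALOG` / `USE CATALOG` statements can be run verbatim in the `bin/sql-client.sh` session started above.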