* [Apache Flink cluster on HDInsight on AKS](../flink/flink-create-cluster-portal.md)
* [Apache Kafka cluster on HDInsight](../../hdinsight/kafka/apache-kafka-get-started.md)
* Ensure that the network settings are taken care of as described in [Using Apache Kafka on HDInsight](../flink/process-and-consume-data.md). Make sure that the HDInsight on AKS and HDInsight clusters are in the same virtual network.
* Use MSI to access ADLS Gen2
* IntelliJ for development on an Azure VM in HDInsight on AKS Virtual Network
> Make sure to set `classloader.resolve-order` to `parent-first` and `hadoop.classpath.enable` to `true`.
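As a reference sketch, these two settings from the note would appear together in the Flink configuration like this (property names are taken from the note above):

```yaml
# Resolve classes through the parent (Flink) classloader first,
# instead of Flink's default child-first order.
classloader.resolve-order: parent-first

# Put the Hadoop classpath on the Flink classpath so the ABFS
# file system implementation can be found.
hadoop.classpath.enable: true
```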
1. Select **Job Log aggregation** to push job logs to the storage account.
:::image type="content" source="./media/assign-kafka-topic-event-message-to-azure-data-lake-storage-gen2/enable-job-log.png" alt-text="Screenshot showing how to enable job log." lightbox="./media/assign-kafka-topic-event-message-to-azure-data-lake-storage-gen2/enable-job-log.png":::
1. You can see the job running.
HDInsight on AKS provides a set of default Apache Flink configurations for most properties, and a few based on common application profiles. However, if you're required to tweak Flink configuration properties to improve performance for certain applications with state usage, parallelism, or memory settings, you can change the Flink job configuration using the **Flink Jobs** section in the HDInsight on AKS cluster.
Here the checkpoint interval is changed at *Cluster level*.
1. Update the changes by clicking **OK** and then **Save**.

Once saved, the new configurations are updated in a few minutes (~5 minutes).

The following configurations can be updated using the Configuration Management settings.

**Process memory size**

The default process memory size for both the job manager and the task manager is the memory configured by the user during cluster creation.

This size can be configured by using the following configuration property. To change the task manager process memory, use this configuration:
`taskmanager.memory.process.size : <value>`
Example:
`taskmanager.memory.process.size : 2000mb`
For the job manager:
`jobmanager.memory.process.size : <value>`
> [!NOTE]
> The maximum configurable process memory is equal to the memory configured for `jobmanager/taskmanager`.
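As a consolidated sketch, the two memory properties described above might appear together in the cluster configuration as follows (the values are illustrative, not recommendations):

```yaml
# Total process memory for each task manager (illustrative value).
taskmanager.memory.process.size: 2000mb

# Total process memory for the job manager (illustrative value).
jobmanager.memory.process.size: 2000mb
```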
## Checkpoint Interval
The checkpoint interval determines how often Flink triggers a checkpoint. It's defined in milliseconds and can be set using the following configuration property:
`execution.checkpoint.interval: <value>`
The default setting is 60,000 milliseconds (1 minute); this value can be changed as desired.
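For example, to checkpoint every 30 seconds instead of the default 1 minute (an illustrative value, not a recommendation):

```yaml
# Trigger a checkpoint every 30,000 ms (30 seconds).
execution.checkpoint.interval: 30000
```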
## State Backend
The state backend determines how Flink manages and persists the state of your application. It impacts how checkpoints are stored. You can configure the state backend using the following property:
`state.backend: <value>`
By default, Apache Flink clusters in HDInsight on AKS use RocksDB.
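Setting the property explicitly would look like this (shown for illustration; RocksDB is already the default here):

```yaml
# Use the embedded RocksDB state backend (the default on HDInsight on AKS).
state.backend: rocksdb
```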
## Checkpoint Storage Path
We allow persistent checkpoints by default by storing them in `abfs` storage as configured by the user. Because the checkpoints are persisted, even if the job fails, it can easily be restarted from the latest checkpoint.
`state.checkpoints.dir: <path>`
Replace `<path>` with the desired path where the checkpoints are stored.
By default, checkpoints are stored in the storage account (ABFS) configured by the user. This value can be changed to any desired path as long as the Flink pods can access it.
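A sketch of a checkpoint path override; the container and storage account placeholders are hypothetical and must be replaced with your own:

```yaml
# Hypothetical ABFS path; replace <container> and <account> with your own values.
state.checkpoints.dir: abfs://<container>@<account>.dfs.core.windows.net/flink/checkpoints
```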
## Maximum Concurrent Checkpoints
Replace `<value>` with the desired maximum number. By default, we retain a maximum of five checkpoints.
We allow persistent savepoints by default by storing the savepoints in `abfs` storage (as configured by the user). If the user wants to stop and later start the job with a particular savepoint, they can configure this location.
`state.savepoints.dir: <path>`
Replace `<path>` with the desired path where the savepoints are stored.
By default, savepoints are stored in the storage account configured by the user (we support ABFS). This value can be changed to any desired path as long as the Flink pods can access it.
## Job manager high availability
In HDInsight on AKS, Flink uses Kubernetes as its backend. Even if the Job Manager fails due to any known or unknown issue, the pod is restarted within a few seconds, and the job is recovered from the **latest checkpoint**.
### FAQ
**Why does the job fail in between?**

Even if jobs fail abruptly, as long as checkpoints are taken continuously, the job is restarted by default from the latest checkpoint.
**How do I change the job strategy in between?**
There are use cases where the job needs to be modified while in production due to a job-level bug. In that case, the user can stop the job, which automatically takes a savepoint and saves it in the savepoint location.
1. Click **Savepoint** and wait for the savepoint to complete.
:::image type="content" source="./media/flink-configuration-management/save-point.png" alt-text="Screenshot showing save point options." lightbox="./media/flink-configuration-management/save-point.png":::
1. After the savepoint completes, click **Start**; the **Start Job** tab appears. Select the savepoint name from the dropdown, edit any configurations if necessary, and click **OK**.
:::image type="content" source="./media/flink-configuration-management/start-job.png" alt-text="Screenshot showing how to start job." lightbox="./media/flink-configuration-management/start-job.png":::
Since the savepoint is provided with the job, Flink knows where to start processing the data.