diff --git a/content/learning-paths/servers-and-cloud-computing/kafka-azure/_index.md b/content/learning-paths/servers-and-cloud-computing/kafka-azure/_index.md
new file mode 100644
index 0000000000..f9b2845871
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/kafka-azure/_index.md
@@ -0,0 +1,60 @@
+---
+title: Deploy Kafka on Microsoft Azure Cobalt 100 processors
+
+draft: true
+cascade:
+  draft: true
+
+minutes_to_complete: 30
+
+who_is_this_for: This Learning Path is designed for software developers looking to migrate their Kafka workloads from x86_64 to Arm-based platforms, specifically the Microsoft Azure Cobalt 100 processor.
+
+learning_objectives:
+  - Provision an Azure Arm64 virtual machine using the Azure console, with Ubuntu Pro 24.04 LTS as the base image.
+  - Deploy Kafka on the Ubuntu virtual machine.
+  - Perform Kafka baseline testing and benchmarking on both x86_64 and Arm64 virtual machines.
+
+prerequisites:
+  - A [Microsoft Azure](https://azure.microsoft.com/) account with access to Cobalt 100-based instances (Dpsv6).
+  - Basic understanding of the Linux command line.
+  - Familiarity with the [Apache Kafka architecture](https://kafka.apache.org/) and deployment practices on Arm64 platforms.
+ +author: Jason Andrews + +### Tags +skilllevels: Advanced +subjects: Storage +cloud_service_providers: Microsoft Azure + +armips: + - Neoverse + +tools_software_languages: + - Kafka + - kafka-producer-perf-test.sh + - kafka-consumer-perf-test.sh + +operatingsystems: + - Linux + +further_reading: + - resource: + title: Kafka Manual + link: https://kafka.apache.org/documentation/ + type: documentation + - resource: + title: Kafka Performance Tool + link: https://codemia.io/knowledge-hub/path/use_kafka-producer-perf-testsh_how_to_set_producer_config_at_kafka_210-0820 + type: documentation + - resource: + title: Kafka on Azure + link: https://learn.microsoft.com/en-us/samples/azure/azure-quickstart-templates/kafka-ubuntu-multidisks/ + type: documentation + + +### FIXED, DO NOT MODIFY +# ================================================================================ +weight: 1 # _index.md always has weight of 1 to order correctly +layout: "learningpathall" # All files under learning paths have this same wrapper +learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content. +--- diff --git a/content/learning-paths/servers-and-cloud-computing/kafka-azure/_next-steps.md b/content/learning-paths/servers-and-cloud-computing/kafka-azure/_next-steps.md new file mode 100644 index 0000000000..c3db0de5a2 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/kafka-azure/_next-steps.md @@ -0,0 +1,8 @@ +--- +# ================================================================================ +# FIXED, DO NOT MODIFY THIS FILE +# ================================================================================ +weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation. +title: "Next Steps" # Always the same, html page title. +layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing. 
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/kafka-azure/background.md b/content/learning-paths/servers-and-cloud-computing/kafka-azure/background.md
new file mode 100644
index 0000000000..48990a4d0a
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/kafka-azure/background.md
@@ -0,0 +1,20 @@
+---
+title: "Overview"
+
+weight: 2
+
+layout: "learningpathall"
+---
+
+## Cobalt 100 Arm-based processor
+
+Cobalt 100 is Microsoft's first-generation, in-house Arm-based processor. Designed entirely by Microsoft and based on Arm’s Neoverse N2 architecture, this 64-bit CPU delivers improved performance and energy efficiency across a broad spectrum of cloud-native, scale-out Linux workloads, including web and application servers, data analytics, open-source databases, and caching systems. Running at 3.4 GHz, the Cobalt 100 processor allocates a dedicated physical core to each vCPU, ensuring consistent and predictable performance.
+
+To learn more about Cobalt 100, refer to the blog [Announcing the preview of new Azure virtual machine based on the Azure Cobalt 100 processor](https://techcommunity.microsoft.com/blog/azurecompute/announcing-the-preview-of-new-azure-vms-based-on-the-azure-cobalt-100-processor/4146353).
+
+## Apache Kafka
+Apache Kafka is a high-performance, open-source distributed event streaming platform designed for building real-time data pipelines and streaming applications.
+
+It allows you to publish, subscribe to, store, and process streams of records in a fault-tolerant and scalable manner. Kafka stores data in topics, which are partitioned and replicated across a cluster to ensure durability and high availability.
+
+Kafka is widely used for messaging, log aggregation, event sourcing, real-time analytics, and integrating large-scale data systems.
Learn more from the [Apache Kafka official website](https://kafka.apache.org/) and its [official documentation](https://kafka.apache.org/documentation).
diff --git a/content/learning-paths/servers-and-cloud-computing/kafka-azure/baseline.md b/content/learning-paths/servers-and-cloud-computing/kafka-azure/baseline.md
new file mode 100644
index 0000000000..46453417d3
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/kafka-azure/baseline.md
@@ -0,0 +1,104 @@
+---
+title: Baseline Testing
+weight: 5
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Run a baseline test with Kafka
+
+After installing Kafka on your Arm64 virtual machine, you can perform a simple baseline test to validate that Kafka runs correctly and produces the expected output.
+
+Kafka 4.1.0 uses **KRaft**, which consolidates cluster metadata management into Kafka itself, eliminating the need for a separate ZooKeeper instance.
+
+You need four terminals to complete this test: the first starts the Kafka server, the second creates a topic, and the final two send and receive messages, respectively.
+
+### Initial Setup: Configure & Format KRaft
+**KRaft** is Kafka's metadata protocol that integrates the responsibilities of ZooKeeper directly into Kafka, simplifying deployment and improving scalability by making the brokers self-managing.
+
+First, configure your `server.properties` file for KRaft and format the storage directory. These steps are done only once.
+
+**1. Edit the configuration file:** Open your `server.properties` file.
+
+```console
+nano /opt/kafka/config/server.properties
+```
+
+**2. Add or modify KRaft properties:** Ensure the following lines are present and correctly configured for a single-node setup.
+
+This configuration file sets up a single Kafka server to act as both a **controller** (managing cluster metadata) and a broker (handling data), running in **KRaft** mode.
It defines the node's unique ID and specifies the local host as the sole participant in the **controller** quorum.
+
+```properties
+process.roles=controller,broker
+node.id=1
+controller.quorum.voters=1@localhost:9093
+listeners=PLAINTEXT://:9092,CONTROLLER://:9093
+advertised.listeners=PLAINTEXT://localhost:9092
+log.dirs=/tmp/kraft-combined-logs
+```
+**3. Format the storage directory:** Use the `kafka-storage.sh` tool to format the metadata directory.
+
+```console
+cd /opt/kafka
+bin/kafka-storage.sh format -t $(bin/kafka-storage.sh random-uuid) -c config/server.properties
+```
+You should see output similar to:
+
+```output
+Formatting metadata directory /tmp/kraft-combined-logs with metadata.version 4.1-IV1.
+```
+
+Now you can perform the baseline test.
+
+### Terminal 1 – Start Kafka Broker
+This command starts the Kafka broker (the main server that sends and receives messages) in KRaft mode. Keep this terminal open.
+
+```console
+cd /opt/kafka
+bin/kafka-server-start.sh config/server.properties
+```
+### Terminal 2 – Create a Topic
+This command creates a new Kafka topic named `test-topic-kafka` (a channel where messages are stored and shared) with one partition and one replica.
+
+```console
+cd /opt/kafka
+bin/kafka-topics.sh --create --topic test-topic-kafka --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
+```
+You should see output similar to:
+
+```output
+Created topic test-topic-kafka.
+```
+
+- **Verify topic**
+
+```console
+bin/kafka-topics.sh --list --bootstrap-server localhost:9092
+```
+You should see output similar to:
+
+```output
+__consumer_offsets
+test-topic-kafka
+```
+
+### Terminal 3 – Console Producer (Write Message)
+This command starts the **Kafka producer**, which lets you type and send messages into the `test-topic-kafka` topic. For example, when you type `hello from azure arm vm`, this message will be delivered to any Kafka consumer subscribed to that topic.
+
+```console
+cd /opt/kafka
+bin/kafka-console-producer.sh --topic test-topic-kafka --bootstrap-server localhost:9092
+```
+You should see an empty prompt where you can start typing. Type `hello from azure arm vm` and press **Enter**.
+
+### Terminal 4 – Console Consumer (Read Message)
+This command starts the **Kafka consumer**, which listens to the `test-topic-kafka` topic and displays all messages from the beginning.
+
+```console
+cd /opt/kafka
+bin/kafka-console-consumer.sh --topic test-topic-kafka --from-beginning --bootstrap-server localhost:9092
+```
+
+You should see your message `hello from azure arm vm` displayed in this terminal, confirming that the producer's message was successfully received.
+
+Now you can proceed to benchmarking Kafka's performance on the Azure Cobalt 100 Arm virtual machine.
diff --git a/content/learning-paths/servers-and-cloud-computing/kafka-azure/benchmarking.md b/content/learning-paths/servers-and-cloud-computing/kafka-azure/benchmarking.md
new file mode 100644
index 0000000000..051663dc9a
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/kafka-azure/benchmarking.md
@@ -0,0 +1,118 @@
+---
+title: Benchmarking with Official Kafka Tools
+weight: 6
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Benchmark Kafka on Azure Cobalt 100 Arm-based instances and x86_64 instances
+
+Kafka's official performance tools (**kafka-producer-perf-test.sh** and **kafka-consumer-perf-test.sh**) let you generate test workloads, measure message throughput, and record end-to-end latency.
+
+## Steps for Kafka Benchmarking
+
+Before starting the benchmark, ensure that the **Kafka broker** is still running in its own terminal.
+
+Now, open two new terminals: one for the **producer benchmark** and another for the **consumer benchmark**.
+
+### Terminal A - Producer Benchmark
+
+The producer benchmark measures how fast Kafka can send messages, reporting throughput and latency percentiles.
+ +```console +cd /opt/kafka +bin/kafka-producer-perf-test.sh \ + --topic test-topic-kafka \ + --num-records 1000000 \ + --record-size 100 \ + --throughput -1 \ + --producer-props bootstrap.servers=localhost:9092 +``` +You should see output similar to: + +```output +1000000 records sent, 252589.0 records/sec (24.09 MB/sec), 850.85 ms avg latency, 1219.00 ms max latency, 851 ms 50th, 1184 ms 95th, 1210 ms 99th, 1218 ms 99.9th. +``` +### Terminal B - Consumer benchmark + +The consumer benchmark measures how fast Kafka can read messages from the topic, reporting throughput and total messages consumed. + +```console +cd /opt/kafka +bin/kafka-consumer-perf-test.sh \ + --topic test-topic-kafka \ + --bootstrap-server localhost:9092 \ + --messages 1000000 \ + --timeout 30000 +``` +You should see output similar to: + +```output +start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec +2025-09-03 06:07:13:616, 2025-09-03 06:07:17:545, 95.3674, 24.2727, 1000001, 254517.9435, 3354, 575, 165.8564, 1739132.1739 +``` + +## Benchmark Results Table Explained: + +- **Messages Processed** – Total number of messages handled during the test. +- **Records/sec** – Rate of messages sent or consumed per second. +- **MB/sec** – Data throughput in megabytes per second. +- **Avg Latency (ms)** – Average delay in sending messages (producer only). +- **Max Latency (ms)** – Longest observed delay in sending messages (producer only). +- **50th (ms)** – Median latency (half the messages were faster, half slower). +- **95th (ms)** – Latency below which 95% of messages were delivered. +- **99th (ms)** – Latency below which 99% of messages were delivered. +- **99.9th (ms)** – Latency below which 99.9% of messages were delivered. + +## Benchmark summary on Arm64: +Here is a summary of benchmark results collected on an Arm64 **D4ps_v6 Ubuntu Pro 24.04 LTS virtual machine**. 
+### Consumer Performance Test +| Metric | Value | Unit | +|-----------------------------|-------------|---------------| +| Total Time Taken | 3.875 | Seconds | +| Data Consumed | 95.3674 | MB | +| Throughput (Data) | 24.6110 | MB/sec | +| Messages Consumed | 1,000,001 | Messages | +| Throughput (Messages) | 258,064.77 | Messages/sec | +| Rebalance Time | 3348 | Milliseconds | +| Fetch Time | 527 | Milliseconds | +| Fetch Throughput (Data) | 180.9629 | MB/sec | +| Fetch Throughput (Messages)| 1,897,535.10| Messages/sec | + +### Producer Performance Test +| Metric | Records Sent | Records/sec | Throughput | Average Latency | Maximum Latency | 50th Percentile Latency | 95th Percentile Latency | 99th Percentile Latency | 99.9th Percentile Latency | +|--------|--------------|-------------|------------|-----------------|-----------------|-------------------------|-------------------------|-------------------------|---------------------------| +| Value | 1,000,000 | 257,532.8 | 24.56 | 816.19 | 1237.00 | 799 | 1168 | 1220 | 1231 | +| Unit | Records | Records/sec | MB/sec | ms | ms | ms | ms | ms | ms | + +## Benchmark summary on x86_64: +Here is a summary of the benchmark results collected on x86_64 **D4s_v6 Ubuntu Pro 24.04 LTS virtual machine**. 
+### Consumer Performance Test
+| Metric | Value | Unit |
+|--------------------|-------------|---------------|
+| Total Time Taken | 3.811 | Seconds |
+| Data Consumed | 95.3674 | MB |
+| Throughput (Data) | 25.0243 | MB/sec |
+| Messages Consumed | 1,000,001 | Messages |
+| Throughput (Messages) | 262,398.58 | Messages/sec |
+| Rebalance Time | 3271 | Milliseconds |
+| Fetch Time | 540 | Milliseconds |
+| Fetch Throughput (Data) | 176.6064 | MB/sec |
+| Fetch Throughput (Messages) | 1,851,853.70 | Messages/sec |
+
+### Producer Performance Test
+| Metric | Records Sent | Records/sec | Throughput | Average Latency | Maximum Latency | 50th Percentile Latency | 95th Percentile Latency | 99th Percentile Latency | 99.9th Percentile Latency |
+|--------|--------------|-------------|------------|-----------------|-----------------|-------------------------|-------------------------|-------------------------|---------------------------|
+| Value | 1,000,000 | 242,013.6 | 23.08 | 840.69 | 1351.00 | 832 | 1283 | 1330 | 1350 |
+| Unit | Records | Records/sec | MB/sec | ms | ms | ms | ms | ms | ms |
+
+## Benchmark comparison insights
+When comparing the results on Arm64 and x86_64 virtual machines:
+
+- The Kafka **producer** on Arm64 sustained **24.56 MB/sec** (~257K records/sec) versus **23.08 MB/sec** (~242K records/sec) on x86_64, with lower average latency (~816 ms vs ~841 ms) and lower 95th, 99th, and 99.9th percentile latencies.
+- The Kafka **consumer** throughput was comparable on both platforms: **24.61 MB/sec** (~258K messages/sec) on Arm64 versus **25.02 MB/sec** (~262K messages/sec) on x86_64, with Arm64 achieving higher fetch throughput (~1.90M vs ~1.85M messages/sec).
+- These results show that Kafka performs competitively on the **Azure Cobalt 100 Arm64 virtual machine**, validating its suitability for real-time streaming workloads.
+
+You have now benchmarked Kafka on an Azure Cobalt 100 Arm64 virtual machine and compared results with x86_64.
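As a quick sanity check on the figures above, you can reproduce the derived throughput numbers yourself. The commands below are an illustrative sketch using the sample results reported in the tables (records/sec and the 100-byte record size used by the producer test); they are not part of the Kafka tooling.

```shell
# Producer throughput in MB/sec = records/sec * record size (100 bytes) / 2^20
awk 'BEGIN { printf "Arm64 producer: %.2f MB/sec\n", 257532.8 * 100 / 1048576 }'
awk 'BEGIN { printf "x86_64 producer: %.2f MB/sec\n", 242013.6 * 100 / 1048576 }'

# Relative producer throughput difference (Arm64 vs x86_64) for this run
awk 'BEGIN { printf "Arm64 advantage: %.1f%%\n", (24.56 - 23.08) / 23.08 * 100 }'
```

Running these reproduces the 24.56 MB/sec and 23.08 MB/sec figures from the producer tables, and shows the Arm64 VM delivering roughly 6% higher producer throughput in this particular run.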
diff --git a/content/learning-paths/servers-and-cloud-computing/kafka-azure/create-instance.md b/content/learning-paths/servers-and-cloud-computing/kafka-azure/create-instance.md
new file mode 100644
index 0000000000..9571395aa2
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/kafka-azure/create-instance.md
@@ -0,0 +1,50 @@
+---
+title: Create an Arm-based cloud virtual machine using the Microsoft Cobalt 100 CPU
+weight: 3
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Introduction
+
+There are several ways to create an Arm-based Cobalt 100 virtual machine: the Microsoft Azure console, the Azure CLI tool, or your choice of IaC (Infrastructure as Code). This guide uses the Azure console to create a virtual machine with the Arm-based Cobalt 100 processor.
+
+This Learning Path focuses on the general-purpose virtual machines of the D series. Please read the guide on the [Dpsv6 size series](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/general-purpose/dpsv6-series) offered by Microsoft Azure.
+
+If you have never used the Microsoft Cloud Platform before, please review the Microsoft [guide to Create a Linux virtual machine in the Azure portal](https://learn.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-portal?tabs=ubuntu).
+
+#### Create an Arm-based Azure Virtual Machine
+
+Creating a virtual machine based on Azure Cobalt 100 is no different from creating any other virtual machine in Azure. To create an Azure virtual machine, launch the Azure portal and navigate to "Virtual Machines".
+1. Select "Create", and click on "Virtual Machine" from the drop-down list.
+2. Inside the "Basic" tab, fill in the Instance details such as "Virtual machine name" and "Region".
+3. Choose the image for your virtual machine (for example, Ubuntu Pro 24.04 LTS) and select “Arm64” as the VM architecture.
+4. In the “Size” field, click on “See all sizes” and select the D-Series v6 family of virtual machines.
Select “D4ps_v6” from the list.
+
+![Azure portal VM creation — Azure Cobalt 100 Arm64 virtual machine (D4ps_v6) alt-text#center](images/instance.png "Figure 1: Select the D-Series v6 family of virtual machines")
+
+5. Select "SSH public key" as the Authentication type. Azure will automatically generate an SSH key pair for you and allow you to store it for future use. It is a fast, simple, and secure way to connect to your virtual machine.
+6. Fill in the Administrator username for your VM.
+7. Select "Generate new key pair", and select "RSA SSH Format" as the SSH Key Type. RSA offers stronger security with key lengths of 3072 bits or longer. Give a Key pair name to your SSH key.
+8. In the "Inbound port rules", select HTTP (80) and SSH (22) as the inbound ports.
+
+![Azure portal VM creation — Azure Cobalt 100 Arm64 virtual machine (D4ps_v6) alt-text#center](images/instance1.png "Figure 2: Allow inbound port rules")
+
+9. Click on the "Review + Create" tab and review the configuration for your virtual machine. It should look like the following:
+
+![Azure portal VM creation — Azure Cobalt 100 Arm64 virtual machine (D4ps_v6) alt-text#center](images/ubuntu-pro.png "Figure 3: Review and Create an Azure Cobalt 100 Arm64 VM")
+
+10. Finally, when you are confident about your selection, click on the "Create" button, and click on the "Download Private key and Create Resources" button.
+
+![Azure portal VM creation — Azure Cobalt 100 Arm64 virtual machine (D4ps_v6) alt-text#center](images/instance4.png "Figure 4: Download Private key and Create Resources")
+
+11. Your virtual machine should be ready and running within a few minutes. You can SSH into the virtual machine using the private key, along with the Public IP details.
+
+![Azure portal VM creation — Azure Cobalt 100 Arm64 virtual machine (D4ps_v6) alt-text#center](images/final-vm.png "Figure 5: VM deployment confirmation in Azure portal")
+
+{{% notice Note %}}
+
+To learn more about Arm-based virtual machines in Azure, refer to “Getting Started with Microsoft Azure” in [Get started with Arm-based cloud instances](/learning-paths/servers-and-cloud-computing/csp/azure).
+
+{{% /notice %}}
diff --git a/content/learning-paths/servers-and-cloud-computing/kafka-azure/deploy.md b/content/learning-paths/servers-and-cloud-computing/kafka-azure/deploy.md
new file mode 100644
index 0000000000..ac9a3ad15c
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/kafka-azure/deploy.md
@@ -0,0 +1,50 @@
+---
+title: Install Kafka
+weight: 4
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Install Kafka on Azure Cobalt 100
+
+This section walks you through installing the latest version of Apache Kafka on an Ubuntu Pro 24.04 Arm virtual machine. You’ll download Kafka, extract it into `/opt`, configure permissions, and verify the installation by checking the installed version.
+
+Follow these instructions to install Kafka on an Ubuntu Pro 24.04 virtual machine.
+
+### Install Java
+
+Kafka requires Java to run. Install it by executing the following commands:
+```console
+sudo apt update
+sudo apt install -y default-jdk
+```
+### Download and Install Kafka
+
+This sequence of commands downloads Kafka version 4.1.0 to the `/opt` directory, extracts the tarball, renames the folder to `kafka` for simplicity, and sets ownership so the current user can access and manage the Kafka installation. It prepares the system for running Kafka without permission issues.
+
+```console
+cd /opt
+sudo curl -O https://archive.apache.org/dist/kafka/4.1.0/kafka_2.13-4.1.0.tgz
+sudo tar -xvzf kafka_2.13-4.1.0.tgz
+sudo mv kafka_2.13-4.1.0 kafka
+sudo chown -R $USER:$USER kafka
+```
+{{% notice Note %}}
+The Kafka [3.5.0 release announcement](https://kafka.apache.org/blog#apache_kafka_350_release_announcement) includes a significant number of new features and fixes, including improvements to Kafka Connect and MirrorMaker 2. These aren't Arm-specific, but they benefit all architectures, including Linux/Arm64.
+The [Arm Ecosystem Dashboard](https://developer.arm.com/ecosystem-dashboard/) recommends Apache Kafka version 3.5.0 as the minimum version on Arm platforms.
+{{% /notice %}}
+
+### Check the installed Kafka version
+
+These commands navigate to the Kafka installation directory and check the installed Kafka version, confirming that Kafka has been successfully installed and is ready for use.
+```console
+cd /opt/kafka
+bin/kafka-topics.sh --version
+```
+
+You should see output similar to:
+```output
+4.1.0
+```
+Kafka installation is complete. You can now proceed with the baseline testing.
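The baseline and benchmarking sections start the broker manually in a terminal. If you later want Kafka to start automatically and survive reboots, one option is to run it as a systemd service. The unit below is a minimal sketch, assuming the `/opt/kafka` layout used above; the unit path, the `User` value (`azureuser`), and the restart policy are illustrative and should be adapted to your VM.

```ini
# /etc/systemd/system/kafka.service (illustrative path and unit name)
[Unit]
Description=Apache Kafka broker (KRaft mode)
After=network.target

[Service]
Type=simple
# Replace azureuser with your VM administrator username
User=azureuser
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Only enable the service (`sudo systemctl daemon-reload && sudo systemctl enable --now kafka`) after you have configured `server.properties` and formatted the KRaft storage directory, which the baseline testing section covers.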
diff --git a/content/learning-paths/servers-and-cloud-computing/kafka-azure/images/final-vm.png b/content/learning-paths/servers-and-cloud-computing/kafka-azure/images/final-vm.png new file mode 100644 index 0000000000..5207abfb41 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/kafka-azure/images/final-vm.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/kafka-azure/images/instance.png b/content/learning-paths/servers-and-cloud-computing/kafka-azure/images/instance.png new file mode 100644 index 0000000000..285cd764a5 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/kafka-azure/images/instance.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/kafka-azure/images/instance1.png b/content/learning-paths/servers-and-cloud-computing/kafka-azure/images/instance1.png new file mode 100644 index 0000000000..b9d22c352d Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/kafka-azure/images/instance1.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/kafka-azure/images/instance4.png b/content/learning-paths/servers-and-cloud-computing/kafka-azure/images/instance4.png new file mode 100644 index 0000000000..2a0ff1e3b0 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/kafka-azure/images/instance4.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/kafka-azure/images/ubuntu-pro.png b/content/learning-paths/servers-and-cloud-computing/kafka-azure/images/ubuntu-pro.png new file mode 100644 index 0000000000..d54bd75ca6 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/kafka-azure/images/ubuntu-pro.png differ