ArmDeveloperEcosystem · jasonrandrews · Apr 23, 2025 · Apr 21, 2025
diff --git a/content/learning-paths/cross-platform/adler32/_index.md b/content/learning-paths/cross-platform/adler32/_index.md
@@ -0,0 +1,60 @@
+---
+title: Write NEON intrinsics using GitHub Copilot to improve Adler32 performance
+
+draft: true
+cascade: 
+    draft: true
+
+minutes_to_complete: 45
+
+who_is_this_for: This is an introductory topic for C/C++ developers who are interested in using GitHub Copilot to improve performance using NEON intrinsics.
+
+learning_objectives: 
+    - Use GitHub Copilot to write NEON intrinsics to improve performance of the Adler32 checksum algorithm.
+
+prerequisites:
+    - An Arm computer running Linux with the GNU compiler (gcc) installed.
+    - VS Code with GitHub Copilot installed. 
+
+author: Jason Andrews
+
+### Tags
+skilllevels: Introductory
+subjects: Performance and Architecture
+armips:
+    - Neoverse
+    - Cortex-A
+tools_software_languages:
+    - GCC
+    - Runbook
+
+operatingsystems:
+    - Linux
+shared_path: true
+shared_between:
+    - servers-and-cloud-computing
+    - laptops-and-desktops
+    - mobile-graphics-and-gaming
+
+
+further_reading:
+    - resource:
+        title: Arm C Language Extensions
+        link: https://arm-software.github.io/acle/
+        type: Documentation
+    - resource:
+        title: Adler-32 Checksum Algorithm
+        link: https://en.wikipedia.org/wiki/Adler-32
+        type: Article
+    - resource:
+        title: NEON Programmer's Guide
+        link: https://developer.arm.com/documentation/den0018/a
+        type: Documentation
+
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1                       # _index.md always has weight of 1 to order correctly
+layout: "learningpathall"       # All files under learning paths have this same wrapper
+learning_path_main_page: "yes"  # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---
diff --git a/content/learning-paths/cross-platform/adler32/_next-steps.md b/content/learning-paths/cross-platform/adler32/_next-steps.md
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+#       FIXED, DO NOT MODIFY THIS FILE
+# ================================================================================
+weight: 21                  # Set to always be larger than the content in this path to be at the end of the navigation.
+title: "Next Steps"         # Always the same, html page title.
+layout: "learningpathall"   # All files under learning paths have this same wrapper for Hugo processing.
+---
diff --git a/content/learning-paths/cross-platform/adler32/about-2.md b/content/learning-paths/cross-platform/adler32/about-2.md
@@ -0,0 +1,52 @@
+---
+title: About NEON and Adler32
+weight: 2
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Introduction
+
+Optimizing performance is crucial for applications that process large amounts of data. This Learning Path focuses on implementing and optimizing the Adler32 checksum algorithm using Arm Advanced SIMD (Single Instruction, Multiple Data) capabilities, also known as NEON. You'll learn how to use GitHub Copilot to simplify the development process while achieving significant performance improvements.
+
+## Simplifying Arm NEON Development with GitHub Copilot
+
+Developers recognize that Arm NEON SIMD instructions can significantly boost performance for computationally intensive applications, particularly in areas like image processing, audio/video codecs, and machine learning. However, writing NEON intrinsics directly requires specialized knowledge of the instruction set, careful consideration of data alignment, and complex vector operations that can be error-prone and time-consuming. Many developers avoid implementing these optimizations due to the steep learning curve and development overhead.
+
+The good news is that AI developer tools such as GitHub Copilot make working with NEON intrinsics much more accessible. By providing intelligent code suggestions, automated vectorization hints, and contextual examples tailored to your specific use case, GitHub Copilot can help bridge the knowledge gap and accelerate the development of NEON-optimized code. This allows developers to harness the full performance potential of Arm processors without the traditional complexity and time-consuming effort.
+
+This Learning Path demonstrates writing NEON intrinsics with GitHub Copilot by creating a complete project from scratch and comparing the C implementation with the NEON implementation.
+
+While you may not create complete projects from scratch, and you shouldn't blindly trust the generated code, it's helpful to see what's possible using an example so you can apply the principles to your own projects.
+
+## Accelerating Adler32 Checksum with Arm NEON Instructions
+
+This project demonstrates how to significantly improve the performance of Adler32 checksum calculations using Arm NEON instructions.
+
+### What is Arm NEON?
+
+Arm NEON is an advanced SIMD architecture extension for Arm processors. It provides a set of instructions that can process multiple data elements in parallel using specialized vector registers. NEON technology enables developers to accelerate computationally intensive algorithms by performing the same operation on multiple data points simultaneously, rather than processing them one at a time. This parallelism is particularly valuable for multimedia processing, scientific calculations, and cryptographic operations where the same operation needs to be applied to large datasets.
+
+### What is Adler32?
+
+Adler32 is a checksum algorithm that was invented by Mark Adler in 1995. It's used in the zlib compression library and is faster than CRC32 but provides less reliable error detection.
+
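+The algorithm works by calculating two 16-bit sums, each reduced modulo 65521 (the largest prime below 65536):
+
+- s1: A running sum of all bytes, starting from 1
+- s2: A running sum of the s1 values after each byte, starting from 0
+- The final checksum is (s2 << 16) | s1.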
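+
+As a worked illustration, the two sums can be written as a recurrence and traced by hand. This is a sketch that follows the standard Adler-32 definition used by zlib, where both sums are reduced modulo 65521, s1 starts at 1, and s2 starts at 0. The symbols $d_i$ (the $i$-th input byte) and $n$ (the input length) are notation introduced here for illustration and do not appear elsewhere in this Learning Path.
+
+```latex
+\begin{aligned}
+s_{1,0} &= 1, \qquad s_{2,0} = 0 \\
+s_{1,i} &= (s_{1,i-1} + d_i) \bmod 65521 \\
+s_{2,i} &= (s_{2,i-1} + s_{1,i}) \bmod 65521 \\
+\text{checksum} &= (s_{2,n} \ll 16) \mathbin{|} s_{1,n}
+\end{aligned}
+
+% Trace for the 3-byte input "abc" (ASCII values 97, 98, 99)
+\begin{aligned}
+s_1 &: 1 \to 98 \to 196 \to 295 = \texttt{0x0127} \\
+s_2 &: 0 \to 98 \to 294 \to 589 = \texttt{0x024D} \\
+\text{checksum} &= \texttt{0x024D0127}
+\end{aligned}
+```
+
+For short inputs such as this one the sums never reach 65521, so the modulo has no visible effect. It matters for larger inputs, where the running sums must be reduced before they overflow.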
+
+## Project Overview
+
+This project explains how you can use GitHub Copilot to create everything listed below:
+
+- Standard C implementation of Adler32
+- Test program to confirm Adler32 works correctly for inputs of various sizes
+- Makefile to build and run the program
+- Performance measurement code to record how long the algorithm takes
+- NEON version of Adler32 to increase performance
+- Tables showing performance comparison between the standard C version and the NEON version
+
+Continue to the next section to start creating the project.
diff --git a/content/learning-paths/cross-platform/adler32/build-6.md b/content/learning-paths/cross-platform/adler32/build-6.md
@@ -0,0 +1,62 @@
+---
+title: Build and run the test program
+weight: 6
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## How can I build and run the test program?
+
+The files required to test the Adler32 algorithm are now complete:
+- Adler32 C function
+- Test program to call the Adler32 function to test for correctness and measure performance
+- Makefile to build and run
+
+Copy the prompt below into your GitHub Copilot Agent session:
+
+```console
+Use the Makefile to build the project and run to make sure the checksum results are correct for all data sizes.
+```
+
+The output is:
+
+```output
+
+GitHub Copilot: I'll help you build and run the Adler-32 checksum project to verify the results. Let me do this step-by-step.
+
+First, let's build the project using the provided Makefile: Great! The build completed successfully without any errors or warnings. The Makefile compiled both source files (`adler32-simple.c` and adler32-test.c) and linked them into the executable adler32-test.
+
+Now let's run the tests to verify that the Adler-32 checksum implementation works correctly for all data sizes: I ran the Adler-32 checksum project, and here are the results:
+
+The build completed successfully without any errors or warnings. The Makefile correctly compiled both source files (`adler32-simple.c` and adler32-test.c) and linked them into the executable adler32-test.
+
+When running the tests, I got the following results:
+
+### Verification Tests
+All 6 verification tests passed successfully:
+1. Empty string: PASS
+2. "a": PASS
+3. "abc": PASS
+4. "message digest": PASS
+5. "abcdefghijklmnopqrstuvwxyz": PASS
+6. "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789": PASS
+
+### Incremental API Test
+The incremental API test also passed, showing that both direct and incremental calculation methods produce the same checksum (0x3d094d1a).
+
+### Performance Tests
+The implementation was benchmarked with different data sizes:
+- 1 KB: 195.31 MB/s
+- 10 KB: 128.50 MB/s
+- 100 KB: 261.11 MB/s
+- 1 MB: 297.80 MB/s
+- 10 MB: 298.17 MB/s
+
+The results confirm that your Adler-32 checksum implementation is correct for all data sizes and performs efficiently. The optimizations for the Neoverse N1 architecture specified in the Makefile appear to be working properly, with throughput reaching nearly 300 MB/s for larger data sizes.
+
+```
+
+GitHub Copilot reports that the Adler32 checksum calculations are correct and provides initial performance results. These numbers serve as a baseline; they mean little on their own until there is a NEON version to compare against.
+
+Continue to the next section to implement Adler32 using NEON and compare the performance.
diff --git a/content/learning-paths/cross-platform/adler32/makefile-5.md b/content/learning-paths/cross-platform/adler32/makefile-5.md
@@ -0,0 +1,114 @@
+---
+title: Create a Makefile 
+weight: 5
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## How can I create a Makefile to build and run the test program?
+
+To create a Makefile, copy and paste the prompt below into GitHub Copilot. The prompt specifies that the Makefile should use `gcc` as the compiler and target the Neoverse N1 processor.
+
+```console
+Read the .c files in my project and 
+create a Makefile to build and run the adler32-test program using gcc. 
+Select the best compiler flags for Neoverse N1 to optimize for performance.
+```
+
+The output is:
+
+```output
+
+GitHub Copilot: I'll create a Makefile for your Adler-32 project that's optimized for the Neoverse N1 architecture. First, let's look at the project structure and understand the code files better. I see that the file search didn't return results. Let's try again with a different approach. I can see the project contains three files: adler32-simple.c, adler32-simple.h, and adler32-test.c. Let's examine the code in adler32-simple.c to understand the implementation before creating the Makefile. I've examined the code files in your project:
+
+1. adler32-simple.c - Contains the implementation of the Adler-32 checksum algorithm with functions for initialization, update, and finalization
+2. adler32-simple.h - Presumably contains the function declarations
+3. adler32-test.c - Contains test code for verifying and benchmarking the Adler-32 implementation
+
+Now I'll create a Makefile optimized for performance on the Neoverse N1 architecture. The Neoverse N1 is an ARM architecture that benefits from specific compiler optimizations.
+
+# Makefile for Adler-32 checksum implementation
+# Optimized for Neoverse N1 architecture
+
+# Compiler and flags
+CC = gcc
+# Neoverse N1 specific optimization flags:
+# -march=armv8.2-a+crypto: Target ARMv8.2-A architecture with crypto extensions (Neoverse N1)
+# -mcpu=neoverse-n1: Specifically optimize for Neoverse N1
+# -O3: Highest standard optimization level
+# -flto: Link-time optimization
+# -fomit-frame-pointer: Don't keep the frame pointer in a register for functions that don't need one
+# -ftree-vectorize: Vectorize loops when possible
+# -ffast-math: Enable fast math optimizations
+# Additional useful flags:
+# -Wall -Wextra: Enable warnings
+# -g: Include debugging symbols
+CFLAGS = -march=armv8.2-a+crypto -mcpu=neoverse-n1 -O3 -flto -fomit-frame-pointer -ftree-vectorize -ffast-math -Wall -Wextra -g
+
+# Source files
+SRCS = adler32-simple.c adler32-test.c
+OBJS = $(SRCS:.c=.o)
+TARGET = adler32-test
+
+# Default target
+all: $(TARGET)
+
+# Link the executable
+$(TARGET): $(OBJS)
+	$(CC) $(CFLAGS) -o $@ $^
+
+# Compile source files
+%.o: %.c
+	$(CC) $(CFLAGS) -c $< -o $@
+
+# Run the tests
+run: $(TARGET)
+	./$(TARGET)
+
+# Clean generated files
+clean:
+	rm -f $(OBJS) $(TARGET)
+
+# Rebuild the project
+rebuild: clean all
+
+.PHONY: all run clean rebuild
+
+
+Made changes.
+
+I've created a Makefile for your Adler-32 project, optimized specifically for the Neoverse N1 architecture. Let me highlight some key features of this Makefile:
+
+### Optimization Flags for Neoverse N1
+
+The Makefile includes several compiler flags tailored for maximum performance on the Neoverse N1:
+
+- `-march=armv8.2-a+crypto`: Targets ARMv8.2-A architecture with crypto extensions
+- `-mcpu=neoverse-n1`: Specifically optimizes for the Neoverse N1 processor
+- `-O3`: Enables the highest standard optimization level
+- `-flto`: Enables link-time optimization for whole-program optimization
+- `-fomit-frame-pointer`: Frees up a register by not keeping frame pointers when not needed
+- `-ftree-vectorize`: Automatically vectorizes loops to leverage SIMD instructions
+- `-ffast-math`: Enables optimizations that might affect floating-point precision but improve performance
+
+### Building and Testing
+
+To use the Makefile, you can run the following commands:
+
+1. To build the project:
+   make
+
+2. To run the tests after building:
+   make run
+
+3. To clean the build files:
+   make clean
+
+4. To rebuild everything from scratch:
+   make rebuild
+
+The Makefile is designed to be simple but effective, automatically handling dependencies and providing different targets for common development tasks.
+```
+
+You are now ready to use the Makefile to build and run the test application.
diff --git a/content/learning-paths/cross-platform/adler32/more-11.md b/content/learning-paths/cross-platform/adler32/more-11.md
@@ -0,0 +1,18 @@
+---
+title: Other ideas for GitHub Copilot
+weight: 11
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## What else can I do with GitHub Copilot on this project?
+
+You can investigate more topics using GitHub Copilot.
+
+- Direct GitHub Copilot to try different compiler flags and use Agent mode to iterate through the options to find the best solution. 
+- Add support for the Clang compiler to the Makefile and compare the results to GCC. Depending on the application code, changing the compiler can result in improved performance.
+- Use GitHub Copilot to generate different data sizes and random data patterns to further investigate correct functionality and performance.
+- Try different algorithm implementations that rely on compiler autovectorization instead of NEON intrinsics, or that break the Adler32 checksum down into smaller blocks of data. It may be possible to get even better performance without hand-written NEON intrinsics by letting the compiler vectorize C code that has a better structure.
+
+While AI tools do not produce high-performance code for every programming problem, they can be a big help in getting you started in a new area of programming such as performance optimization using NEON intrinsics.