Commit 5526d04

Merge pull request #1874 from madeline-underwood/neon
neon copilot_JA to sign off
2 parents 7103c0e + 7b8fc79 commit 5526d04

11 files changed: +77 -64 lines changed

content/learning-paths/cross-platform/adler32/_index.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -10,11 +10,11 @@ minutes_to_complete: 45
1010
who_is_this_for: This is an introductory topic for C/C++ developers who are interested in using GitHub Copilot to improve performance using NEON intrinsics.
1111

1212
learning_objectives:
13-
- Use GitHub Copilot to write NEON intrinsics to improve performance of the Adler32 checksum algorithm.
13+
- Use GitHub Copilot to write NEON intrinsics that accelerate the Adler32 checksum algorithm.
1414

1515
prerequisites:
1616
- An Arm computer running Linux with the GNU compiler (gcc) installed.
17-
- VS Code with GitHub Copilot installed.
17+
- Visual Studio Code with the GitHub Copilot extension installed.
1818

1919
author: Jason Andrews
2020

content/learning-paths/cross-platform/adler32/about-2.md

Lines changed: 21 additions & 21 deletions
@@ -6,47 +6,47 @@ weight: 2
66
layout: learningpathall
77
---
88

9-
## Introduction
9+
## Overview
1010

11-
In computing, optimizing performance is crucial for applications that process large amounts of data. This Learning Path focuses on implementing and optimizing the Adler32 checksum algorithm using Arm advanced SIMD (Single Instruction, Multiple Data) capabilities. You'll learn how to leverage GitHub Copilot to simplify the development process while achieving significant performance improvements.
11+
In computing, optimizing performance is crucial for applications that process large amounts of data. This Learning Path guides you through implementing and optimizing the Adler32 checksum algorithm using Arm advanced SIMD (Single Instruction, Multiple Data) instructions. You'll learn how to leverage GitHub Copilot to simplify the development process while achieving significant performance improvements.
1212

1313
## Simplifying Arm NEON Development with GitHub Copilot
1414

1515
Developers recognize that Arm NEON SIMD instructions can significantly boost performance for computationally intensive applications, particularly in areas like image processing, audio/video codecs, and machine learning. However, writing NEON intrinsics directly requires specialized knowledge of the instruction set, careful consideration of data alignment, and complex vector operations that can be error-prone and time-consuming. Many developers avoid implementing these optimizations due to the steep learning curve and development overhead.
1616

17-
The good news is that AI developer tools such as GitHub Copilot make working with NEON intrinsics much more accessible. By providing intelligent code suggestions, automated vectorization hints, and contextual examples tailored to your specific use case, GitHub Copilot can help bridge the knowledge gap and accelerate the development of NEON-optimized code. This allows developers to harness the full performance potential of Arm processors without the traditional complexity and time-consuming effort.
17+
The good news is that AI developer tools such as GitHub Copilot make working with NEON intrinsics much more accessible. By providing intelligent code suggestions, automated vectorization hints, and contextual examples tailored to your specific use case, GitHub Copilot can help bridge the knowledge gap and accelerate the development of NEON-optimized code. This allows developers to harness the full performance potential of Arm processors - without the usual complexity and overhead.
1818

19-
Writing NEON intrinsics with GitHub Copilot can be demonstrated by creating a complete project from scratch, and comparing the C implementation with the NEON implementation.
19+
You can demonstrate writing NEON intrinsics with GitHub Copilot by creating a full project from scratch and comparing the C implementation to a NEON-optimized version.
2020

21-
While you may not create complete projects from scratch, and you shouldn't blindly trust the generated code, it's helpful to see what's possible using an example so you can apply the principles to your own projects.
21+
While you may not create complete projects from scratch - and you shouldn't blindly trust the generated code - it's helpful to see what's possible using an example so you can apply the principles to your own projects.
22+
23+
## Accelerating Adler32 with Arm NEON
2224

23-
## Accelerating Adler32 Checksum with Arm NEON Instructions
24-
25-
This project demonstrates how to significantly improve the performance of Adler32 checksum calculations using Arm NEON instructions.
25+
This project demonstrates how to accelerate Adler32 checksum calculations using Arm NEON instructions.
2626

2727
### What is Arm NEON?
2828

2929
Arm NEON is an advanced SIMD architecture extension for Arm processors. It provides a set of instructions that can process multiple data elements in parallel using specialized vector registers. NEON technology enables developers to accelerate computationally intensive algorithms by performing the same operation on multiple data points simultaneously, rather than processing them one at a time. This parallelism is particularly valuable for multimedia processing, scientific calculations, and cryptographic operations where the same operation needs to be applied to large datasets.
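
To make the idea of processing multiple data elements in parallel concrete, here is a minimal sketch that sums the bytes of a buffer 16 at a time. The function name `sum_bytes_sketch` is hypothetical and is not part of the project files. The sketch assumes an AArch64 target for the `vaddlvq_u8` intrinsic and falls back to a plain scalar loop elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper for illustration: returns the sum of all bytes. */
uint32_t sum_bytes_sketch(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);

    uint32_t sum = 0;
    size_t i = 0;

#if defined(__aarch64__) && defined(__ARM_NEON)
    /* Load 16 bytes into one vector register and add all lanes at once.
       vaddlvq_u8 widens to 16 bits, so 16 * 255 cannot overflow. */
    for (; i + 16 <= len; i += 16) {
        uint8x16_t v = vld1q_u8(data + i);
        sum += vaddlvq_u8(v);
    }
#endif

    /* Scalar loop: handles the tail, or everything on non-Arm targets. */
    for (; i < len; i++) {
        sum += data[i];
    }
    return sum;
}
```

The vector loop replaces 16 scalar additions with one load and one horizontal add. Whether this is actually faster than what the compiler generates on its own depends on the target and flags, so measure before relying on it.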
3030

31-
## What is Adler32?
31+
## What Is the Adler32 Algorithm?
3232

33-
Adler32 is a checksum algorithm that was invented by Mark Adler in 1995. It's used in the zlib compression library and is faster than CRC32 but provides less reliable error detection.
33+
Mark Adler developed the Adler32 checksum algorithm in 1995. It's used in the zlib compression library and is faster than CRC32 but provides less reliable error detection.
3434

3535
The algorithm works by calculating two 16-bit sums:
3636

37-
- s1: A simple sum of all bytes
38-
- s2: A sum of all s1 values after each byte
39-
- The final checksum is (s2 << 16) | s1.
37+
- s1: A simple sum of all bytes.
38+
- s2: A sum of all s1 values after each byte.
39+
- The final checksum is `(s2 << 16) | s1`.
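
The steps above can be sketched in a few lines of C. This is an illustrative version, not the code GitHub Copilot generates in this Learning Path, and the name `adler32_sketch` is hypothetical. It also spells out two details the list leaves implicit: s1 starts at 1, and both sums are reduced modulo 65521 (the largest prime below 2^16), which is what keeps them within 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_SKETCH_MOD 65521u /* largest prime below 2^16 */

/* Hypothetical reference sketch of Adler32, one byte at a time. */
uint32_t adler32_sketch(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);

    uint32_t s1 = 1; /* running sum of bytes, seeded with 1 */
    uint32_t s2 = 0; /* running sum of the s1 values */

    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + data[i]) % ADLER_SKETCH_MOD;
        s2 = (s2 + s1) % ADLER_SKETCH_MOD;
    }

    /* s2 goes in the high 16 bits, s1 in the low 16 bits. */
    return (s2 << 16) | s1;
}
```

For example, calling `adler32_sketch` on the 9 ASCII bytes of `"Wikipedia"` yields `0x11E60398`, a widely quoted test vector for the algorithm.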
4040

41-
## Project Overview
41+
## What You'll Build
4242

43-
This project explains how you can use GitHub Copilot to create everything listed below:
43+
This project walks you through building the following components using GitHub Copilot:
4444

45-
- Standard C implementation of Adler32
46-
- Test program to confirm Adler32 works correctly for inputs of various sizes
47-
- Makefile to build and run the program
48-
- Performance measurement code to record how long the algorithm takes
49-
- NEON version of Adler32 to increase performance
50-
- Tables showing performance comparison between the standard C version and the NEON version
45+
- A standard C implementation of Adler32.
46+
- A test program to validate outputs for various input sizes.
47+
- A Makefile to build and run the program.
48+
- Performance measurement code to record how long the algorithm takes.
49+
- A NEON-optimized version of Adler32.
50+
- A performance comparison table for both implementations.
5151

5252
Continue to the next section to start creating the project.

content/learning-paths/cross-platform/adler32/build-6.md

Lines changed: 8 additions & 8 deletions
@@ -6,14 +6,14 @@ weight: 6
66
layout: learningpathall
77
---
88

9-
## How can I test the build and run?
9+
## How Can I Build and Run the Test Program?
1010

11-
The required files are now complete to test the Adler32 algorithm.
12-
- Adler32 C function
13-
- Test program to call the Adler32 function to test for correctness and measure performance
14-
- Makefile to build and run
11+
You now have all the required files to test the Adler32 algorithm:
12+
- A C implementation of the Adler32 function.
13+
- A test program to verify correctness and measure performance.
14+
- A Makefile to build and run the project.
1515

16-
Copy the information below to your GitHub Copilot Agent session:
16+
Paste the following prompt into your GitHub Copilot Agent session:
1717

1818
```console
1919
Use the Makefile to build the project and run to make sure the checksum results are correct for all data sizes.
@@ -57,6 +57,6 @@ The results confirm that your Adler-32 checksum implementation is correct for al
5757
5858
```
5959

60-
The results from GitHub Copilot explain that the Adler32 checksum calculations are correct and give some initial performance results. The results don't mean much yet as there is nothing to compare with.
60+
The results from GitHub Copilot confirm that the Adler32 checksum calculations are correct and provide initial performance benchmarks. These results offer a solid baseline, but a meaningful comparison requires an optimized implementation.
6161

62-
Continue to the next section to implement Adler32 using NEON and compare the performance.
62+
In the next section, you’ll implement Adler32 using NEON intrinsics and compare its performance against this baseline.

content/learning-paths/cross-platform/adler32/makefile-5.md

Lines changed: 2 additions & 2 deletions
@@ -6,9 +6,9 @@ weight: 5
66
layout: learningpathall
77
---
88

9-
## How can I create a Makefile to build and run the test program?
9+
## How Can I Create a Makefile to Build and Run the Test Program?
1010

11-
To create a Makefile, copy and paste the information below to GitHub Copilot. The prompt explains that the Makefile should use `gcc` as the compiler and target the Neoverse N1 processor.
11+
Paste the following prompt into GitHub Copilot. It tells Copilot to generate a Makefile that uses `gcc` and targets the Neoverse N1 processor for optimized performance.
1212

1313
```console
1414
Read the .c files in my project and

content/learning-paths/cross-platform/adler32/more-11.md

Lines changed: 6 additions & 6 deletions
@@ -8,11 +8,11 @@ layout: learningpathall
88

99
## What else can I do with GitHub Copilot on this project?
1010

11-
You can investigate more topics using GitHub Copilot.
11+
GitHub Copilot can help you explore additional performance and optimization ideas:
1212

13-
- Direct GitHub Copilot to try different compiler flags and use Agent mode to iterate through the options to find the best solution.
14-
- Add support for the Clang compiler to the Makefile and compare the results to GCC. Depending on the application code, changing the compiler can result in improved performance.
15-
- Use GitHub Copilot to generate different data sizes and random data patterns to further investigate correct functionality and performance.
16-
- Try different algorithm implementations that use compiler autovectorization instead of NEON intrinsics or break down the Adler32 checksum into smaller blocks of data. It may be possible to get even better performance without NEON using the compiler and a better structure for the C code.
13+
- Test different compiler flags using Agent mode to automate iteration and identify the best combinations.
14+
- Add Clang support to your Makefile and compare performance against GCC — performance can differ significantly depending on your code structure.
15+
- Generate a wider range of data sizes and random patterns to stress-test functionality and measure performance under varied conditions.
16+
- Explore alternative algorithm structures that rely on compiler autovectorization instead of NEON intrinsics — you might discover better performance simply by restructuring the C code.
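
As one possible restructuring of the C code, the sketch below defers the modulo operation so that the inner loop contains only additions, which gives the compiler a simpler loop to optimize. The function names are hypothetical. The block size of 5552 bytes is the value zlib uses as the largest count that cannot overflow 32-bit accumulators between reductions. Whether the compiler actually autovectorizes this loop is an assumption you should check on your own toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_BLK_MOD  65521u /* largest prime below 2^16 */
#define ADLER_BLK_NMAX 5552u  /* max bytes before 32-bit sums can overflow */

/* Hypothetical byte-at-a-time reference, used only for comparison. */
uint32_t adler32_reference(const uint8_t *data, size_t len)
{
    uint32_t s1 = 1, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + data[i]) % ADLER_BLK_MOD;
        s2 = (s2 + s1) % ADLER_BLK_MOD;
    }
    return (s2 << 16) | s1;
}

/* Hypothetical blocked variant: one modulo per block instead of per byte. */
uint32_t adler32_blocked(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);

    uint32_t s1 = 1, s2 = 0;

    while (len > 0) {
        size_t n = len < ADLER_BLK_NMAX ? len : ADLER_BLK_NMAX;
        len -= n;

        /* Additions only: no division in the hot loop. */
        for (size_t i = 0; i < n; i++) {
            s1 += data[i];
            s2 += s1;
        }
        data += n;

        /* Reduce once per block to keep both sums in range. */
        s1 %= ADLER_BLK_MOD;
        s2 %= ADLER_BLK_MOD;
    }
    return (s2 << 16) | s1;
}
```

Both functions should return identical checksums for any input, so the reference version doubles as a correctness check while you experiment with compiler flags.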
1717

18-
While AI tools do not create performance code for every programming problem, they can be a big help to get you started in a new area of programming such as performance optimization using NEON intrinsics.
18+
AI tools won’t always generate high-performance code out of the box, but they can rapidly accelerate your experimentation and learning — especially in new areas of programming like NEON-based performance optimization.

content/learning-paths/cross-platform/adler32/neon-7.md

Lines changed: 4 additions & 4 deletions
@@ -6,11 +6,11 @@ weight: 7
66
layout: learningpathall
77
---
88

9-
## How can I create a NEON version of Adler32 to improve performance?
9+
## How Can I Create a NEON Version of Adler32 To Improve Performance?
1010

1111
Writing NEON intrinsics is an advanced programming topic which requires time and effort to learn and apply. GitHub Copilot is a good way to get started with NEON, both to write new code and to debug issues during development.
1212

13-
You can instruct Github Copilot to create a NEON version of Adler32 using the prompt below.
13+
You can instruct Github Copilot to create a NEON-optimized version of Adler32 using the prompt below:
1414

1515
```console
1616
Add a second implementation of adler32 that processes data in blocks and uses Arm NEON intrinsics to improve performance.
@@ -553,6 +553,6 @@ Now that I've created the NEON implementation and updated the test code and Make
553553
554554
```
555555

556-
GitHub Copilot generated a new implementation of Adler32 using NEON intrinsics. It also updated the test program and the Makefile to make it easy to compare the two versions.
556+
GitHub Copilot generated a NEON-optimized implementation of Adler32, along with updates to the test program and Makefile. These changes make it easy to compare the performance of both versions.
557557

558-
Continue to the next section to compile and run the NEON version and investigate performance.
558+
Continue to the next section to compile and run the NEON version, and to compare its performance against the original implementation.

content/learning-paths/cross-platform/adler32/neon-debug-9.md

Lines changed: 5 additions & 5 deletions
@@ -6,9 +6,9 @@ weight: 9
66
layout: learningpathall
77
---
88

9-
## How can I debug the NEON version Adler32 and compare performance?
9+
## How Can I Debug the NEON Version Adler32 and Compare Performance?
1010

11-
During the last step, GitHub Copilot revealed that the functional results with NEON do not match the C version.
11+
In the previous step, GitHub Copilot revealed that the NEON implementation did not produce the same checksum results as the standard C version.
1212

1313
Use the prompt below to instruct GitHub Copilot to try to debug and fix the issue.
1414

@@ -17,7 +17,7 @@ Can you modify adler32-neon.c to produce the correct checksum results for the ad
1717
The NEON version should produce the same checksum as adler32-simple.c but run faster.
1818
```
1919

20-
If the LLM you are using doesn't resolve the NEON functional issues, you can try with another LLM such as Gemini 2.5 Pro.
20+
If the LLM you’re using cannot resolve the NEON functional issues, consider trying another model, such as Gemini 2.5 Pro.
2121

2222
The output is:
2323

@@ -94,6 +94,6 @@ I then checked for errors in the modified file and rebuilt the project using `ma
9494
You can now run the test program adler32-test to verify that the NEON implementation produces the same results as the simple implementation.
9595
```
9696

97-
GitHub Copilot was able to modify the NEON version to get the correct functional results.
97+
GitHub Copilot successfully modified the NEON version to produce correct checksum results matching the standard C implementation.
9898

99-
Continue to the next section to summarize the project, including the performance improvements, and generate a full project description.
99+
Continue to the next section to summarize the project, highlight the performance improvements, and generate a full project description.

content/learning-paths/cross-platform/adler32/neon-run-8.md

Lines changed: 4 additions & 4 deletions
@@ -6,9 +6,9 @@ weight: 8
66
layout: learningpathall
77
---
88

9-
## How can I run the NEON version of Adler32 and compare performance?
9+
## How Can I Run the NEON Version of Adler32 and Compare Performance?
1010

11-
The first step to investigate if NEON can provide improved performance is to make sure the NEON version builds, runs, and matches the results of the C version.
11+
Before comparing performance, you need to ensure that the NEON implementation builds, runs, and produces the same results as the standard C version.
1212

1313
To build the NEON version and compare it to the C version use the prompt below in your GitHub Copilot Agent session:
1414

@@ -123,6 +123,6 @@ The NEON implementation offers significant performance improvements (up to 5x fa
123123
If you'd like, I can analyze the code to identify and fix the issue in the NEON implementation so you can maintain both correctness and the performance benefits.
124124
```
125125

126-
GitHub Copilot has come to the conclusion that although the NEON version looks like it offers better performance, it does not produce the correct results for the checksums. Obviously, this is not a good result which needs to be fixed.
126+
GitHub Copilot determined that while the NEON version delivers better performance, it currently produces incorrect checksums. This must be addressed before the NEON implementation can be considered correct or production-ready.
127127

128-
Continue to the next section to fix the functional issues and continue with the performance comparison.
128+
In the next section, you'll fix the functional issues in the NEON implementation and re-run the performance comparison.

content/learning-paths/cross-platform/adler32/simple-c-3.md

Lines changed: 3 additions & 3 deletions
@@ -1,12 +1,12 @@
11
---
2-
title: Create a C version of Adler32
2+
title: Create a C Version of Adler32
33
weight: 3
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

9-
## How can I generate a C function for Adler32?
9+
## How Can I Generate a C Function for Adler32?
1010

1111
You can use GitHub Copilot to generate a simple C version of the Adler32 checksum algorithm.
1212

@@ -191,6 +191,6 @@ The Adler-32 algorithm works by calculating two 16-bit sums (A and B) and combin
191191
As requested, I did not include a main function, so this code can be used as a library for other C files.
192192
```
193193

194-
You now have the files `adler32-simple.c` and `adler32-simple.h` in your directory which compute the Adler32 checksum. These will serve as the reference implementations of the algorithm with baseline performance.
194+
You now have the files `adler32-simple.c` and `adler32-simple.h` in your directory. These serve as the reference implementation of the Adler32 algorithm with baseline performance.
195195

196196
Continue to the next section to create the test application.

content/learning-paths/cross-platform/adler32/summary-10.md

Lines changed: 15 additions & 4 deletions
@@ -6,11 +6,13 @@ weight: 10
66
layout: learningpathall
77
---
88

9-
## How can I summarize the project results?
9+
## How Can I Summarize the Project Results?
1010

1111
You can use GitHub Copilot to generate a project summary in a README file.
1212

13-
Copy the prompt below to your GitHub Copilot Agent chat and review the created README file.
13+
Use the prompt below to collaborate with GitHub Copilot Agent to generate your README.
14+
15+
Review and refine the results to align them with your project's goals.
1416

1517
```console
1618
Review the files in my project.
@@ -20,7 +22,7 @@ Add a note that the performance results recorded on the Neoverse N1 processor.
2022
Use a table to compare the original version and the NEON version and show the performance improvement factor.
2123
```
2224

23-
Below is the created README.md file. The formatting doesn't match the Learning Path template exactly, but you can copy the the README file to a new repository in GitHub for improved results.
25+
Below is the created README.md file. The formatting doesn't match the Learning Path template exactly, but you can copy the README file to a new repository in GitHub for improved results.
2426

2527
## Adler-32 Checksum Implementation Comparison
2628

@@ -93,6 +95,15 @@ The table summarizes the speedup obtained by the NEON version.
9395

9496
Using Agent mode in GitHub Copilot is a significant benefit when you are actively building and running software. Agent mode can create files and modify them to make needed improvements.
9597

96-
The entire project was done without modifying any of the generated files. While you may not need to do this on a real project, the concept of writing NEON intrinsics to improve performance was demonstrated. You can also use GitHub Copilot to fix issues in NEON code that are difficult to debug for developers who are not experts.
98+
### Tips for Using GitHub Copilot Effectively
99+
100+
This project was completed using GitHub Copilot Agent without modifying the generated files. While that might not be practical in every case, the demonstration shows how NEON intrinsics can significantly boost performance.
101+
102+
GitHub Copilot is especially useful for:
103+
* Generating vectorized versions of scalar code.
104+
* Writing and adapting NEON intrinsics.
105+
* Identifying and fixing bugs in complex low-level code, even for developers who aren’t SIMD experts.
97106

98107
Make sure to try different LLMs with Copilot as the results will vary greatly depending on the model.
108+
109+
