Commit f9661b7

Merge pull request #1474 from odincodeshen/main
WoA arm performance libraries to review
2 parents a12a875 + 95b73ae commit f9661b7

21 files changed

+449
-0
lines changed
Lines changed: 86 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,86 @@
1+
---
2+
title: Create your first Windows application on Microsoft Visual Studio
3+
weight: 2
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
## Install Microsoft Visual Studio
10+
11+
Visual Studio 2022, Microsoft's Integrated Development Environment (IDE), empowers developers to build high-performance applications across a wide range of platforms. With its robust support for Arm architecture, Visual Studio 2022 opens up exciting possibilities for creating native Arm applications and optimizing existing code for Arm-based devices.
12+
13+
You can check out this Microsoft [page](https://learn.microsoft.com/en-us/visualstudio/install/visual-studio-on-arm-devices?view=vs-2022) to learn more about Visual Studio on Arm-powered devices.
14+
15+
Visual Studio 2022 offers different editions tailored to various development needs:
16+
- Community: A free, fully-featured edition ideal for students, open-source contributors, and individual developers.
17+
- Professional: Offers professional developer tools, services, and subscription benefits for small teams.
18+
- Enterprise: Provides the most comprehensive set of tools and services for large teams and enterprise-level development.
19+
20+
To select the edition best suited to your requirements, compare the features of each on the Visual Studio website: https://visualstudio.microsoft.com/vs/compare/
21+
22+
{{% notice Note %}}
23+
This Learning Path uses Visual Studio 2022 Community edition as its example. You can also use the Professional or Enterprise edition.
24+
{{% /notice %}}
25+
26+
Visit the Visual Studio [Downloads](https://visualstudio.microsoft.com/downloads/) page and download the installer executable.
27+
28+
Installation typically takes a few minutes, depending on your network speed. Double-click the downloaded installer and use the default configuration to complete the installation.
29+
30+
Learn how to install C/C++ and LLVM support in this [install guide](https://learn.arm.com/install-guides/vs-woa/).
31+
32+
## Create a Sample Project
33+
34+
Now, you are ready to create a sample Windows application.
35+
36+
To keep the example clear and concise, we will use the simplest console app here.
37+
38+
On the start window, click "Create a new project."
39+
![img1](./figures/vs_new_proj1.png)
40+
41+
In the "Create a new project" window, select "Console App", give the project a name, and then click "Next".
42+
![img2](./figures/vs_new_proj2.png)
43+
44+
45+
After the project is created, you will see a line of "Hello, world" code in Program.cs.
46+
```csharp
47+
Console.WriteLine("Hello, World!");
48+
```
49+
50+
Microsoft Visual Studio automatically adds the build environment variable for the current hardware's CPU architecture. However, we can still familiarize ourselves with the relevant settings.
51+
52+
## Arm64 Configuration Setting
53+
54+
Go to "Debug" -> "Configuration Manager".
55+
![img4](./figures/vs_console_config1.png)
56+
57+
58+
In the "Project contexts" platform dropdown, click `<New...>`. In the "New platform" dialog box, select `ARM64`.
59+
![img5](./figures/vs_console_config2.png)
60+
61+
62+
{{% notice Note %}}
63+
Refer to this [learning path](https://learn.arm.com/learning-paths/laptops-and-desktops/win_wpf/how-to-2/) for more detail about configuring the Visual Studio platform setting.
64+
{{% /notice %}}
65+
66+
67+
Click "Build" -> "Build Solution". Your first Windows application should compile successfully.
68+
69+
70+
## Run Your First Windows Application
71+
72+
Use the green arrow to run the program you just compiled, and you'll see the statement from your code correctly executed in the console.
73+
74+
![img6](./figures/vs_console_exe.png)
75+
76+
You can also use the tools provided by Visual Studio to check the compiled executable.
77+
78+
[dumpbin](https://learn.microsoft.com/en-us/cpp/build/reference/dumpbin-reference?view=msvc-170) is a command-line tool included with Microsoft Visual Studio. It's used to analyze binary files like executable files (.exe), object files (.obj), and dynamic-link libraries (.dll).
79+
In your Windows search, look for "Arm64 Native Tools Command Prompt for VS 2022" and open this program.
80+
81+
```cmd
82+
dumpbin /headers <your exe path>\ConsoleApp1.exe
83+
```
84+
85+
In the output, the file header reports the machine type as AA64, which identifies an Arm64 executable.
86+
![img7](./figures/vs_checkmachine.jpeg)
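
The check that `dumpbin /headers` performs on the machine type can also be sketched in a few lines of portable C++. This is a minimal illustration, not part of the Learning Path's project: the helper names `readLE` and `peMachine` are hypothetical. It relies only on the PE file layout, in which offset `0x3C` of the DOS header holds the offset of the `PE\0\0` signature, and the 16-bit Machine field follows that signature. The value `0xAA64` is the machine type the PE format assigns to Arm64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Machine type value that the PE format defines for Arm64 images.
constexpr std::uint16_t kImageFileMachineArm64 = 0xAA64;

// Read a little-endian unsigned value of `size` bytes (at most 4) at `offset`.
static std::uint32_t readLE(const std::vector<std::uint8_t>& bytes,
                            std::size_t offset, std::size_t size)
{
    assert(size <= 4 && offset + size <= bytes.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < size; ++i)
    {
        value |= static_cast<std::uint32_t>(bytes[offset + i]) << (8 * i);
    }
    return value;
}

// Hypothetical helper: returns the COFF Machine field of a PE image held in
// memory, or 0 (the "unknown" machine type) if the bytes are not a PE image.
std::uint16_t peMachine(const std::vector<std::uint8_t>& image)
{
    // A PE file starts with a DOS header whose first two bytes are "MZ".
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z')
    {
        return 0;
    }

    // Offset 0x3C stores the file offset of the "PE\0\0" signature.
    const std::size_t peOffset = readLE(image, 0x3C, 4);
    if (peOffset > image.size() || image.size() - peOffset < 6)
    {
        return 0;
    }
    if (image[peOffset] != 'P' || image[peOffset + 1] != 'E' ||
        image[peOffset + 2] != 0 || image[peOffset + 3] != 0)
    {
        return 0;
    }

    // The 16-bit Machine field immediately follows the 4-byte signature.
    return static_cast<std::uint16_t>(readLE(image, peOffset + 4, 2));
}
```

To use the sketch on a real executable, read the file into a `std::vector<std::uint8_t>` (for example with `std::ifstream` in binary mode) and compare the result of `peMachine()` with `kImageFileMachineArm64`.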
Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
1+
---
2+
title: Build a simple math application and profile its performance
3+
weight: 3
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
## Clone Example from GitHub
10+
11+
This section uses a Windows application that renders a rotating 3D shape to compare different ways of performing the rotation calculations.
12+
13+
First, clone this Windows application repository from GitHub:
14+
15+
```cmd
16+
git clone https://github.com/odincodeshen/SpinTheCubeInGDI.git
17+
```
18+
19+
{{% notice Note %}}
20+
This repository is a fork of the [original project](https://github.com/marcpems/SpinTheCubeInGDI), with some modifications that make the following explanations easier to follow.
21+
{{% /notice %}}
22+
23+
## Quick Introduction
24+
25+
Open `SpinTheCubeInGDI.sln` in Visual Studio to load the project.
26+
This source code implements a Windows application that renders a spinning 3D cube.
27+
28+
The four key components are:
29+
- Shape Generation: Generates the vertices for a sphere using a golden ratio-based algorithm.
30+
- Rotation Calculation:
31+
  The application uses a rotation matrix to rotate the 3D shape around the X, Y, and Z axes. The rotation angle is incremented over time, creating the animation. The code provides two options for this calculation:
32+
- Multithreading: The application utilizes multithreading to improve performance by distributing the rotation calculations across multiple threads.
33+
  - Arm Performance Libraries: Used for optimized calculations (explained in the next section).
34+
- Drawing: The application draws the transformed vertices of the shapes on the screen, using Windows API.
35+
- Performance Measurement: The code measures and displays the number of transforms per second.
36+
37+
38+
## Calculation Option#1 -- Multithreading
39+
40+
In this Learning Path, the focus is on how the different calculation options affect performance.
41+
The multithreading implementation in the project involves two functions:
42+
- CalcThreadProc():
43+
44+
  This function is the entry point for each calculation thread. Each calculation thread waits on its own semaphore in `semaphoreList`.
45+
46+
  When a thread receives a signal, it calls `applyRotation()` to transform its assigned vertices. The updated vertices are stored in the `drawSphereVertecies` vector.
47+
48+
```c++
49+
DWORD WINAPI CalcThreadProc(LPVOID data)
50+
{
51+
// need to know where to start and where to end
52+
int threadNum = LOWORD(data);
53+
int threadCount = HIWORD(data);
54+
int pointStride = spherePoints / threadCount;
55+
56+
while (!closeThreads)
57+
{
58+
// wait on a semaphore
59+
WaitForSingleObject(semaphoreList[threadNum], INFINITE);
60+
61+
EnterCriticalSection(&cubeDraw[threadNum]);
62+
// run the calculations for the set of points - need to be global
63+
applyRotation(UseCube ? cubeVertices : sphereVertices, rotationInX, threadNum * pointStride, pointStride);
64+
LeaveCriticalSection(&cubeDraw[threadNum]);
65+
66+
// set a semaphore to say we are done
67+
ReleaseSemaphore(doneList[threadNum], 1, NULL);
68+
}
69+
70+
return 0;
71+
}
72+
```
73+
74+
- applyRotation():
75+
This function applies the rotation matrix to a subset of the shape's vertices.
76+
77+
```c++
78+
void applyRotation(std::vector<double>& shape, const std::vector<double>& rotMatrix, int startPoint, int stride)
79+
{
80+
double refx, refy, refz;
81+
82+
// Start looking at the reference verticies
83+
auto point = shape.begin();
84+
point += startPoint * 3;
85+
86+
// Start the output transformed verticies
87+
auto outpoint = drawSphereVertecies.begin();
88+
outpoint += startPoint * 3;
89+
90+
int counter = 0;
91+
while (point != shape.end() && counter < stride)
92+
{
93+
counter++;
94+
95+
// take the next three values for a 3d point
96+
refx = *point; point++;
97+
refy = *point; point++;
98+
refz = *point; point++;
99+
100+
*outpoint = scale * rotMatrix[0] * refx +
101+
scale * rotMatrix[3] * refy +
102+
scale * rotMatrix[6] * refz; outpoint++;
103+
104+
*outpoint = scale * rotMatrix[1] * refx +
105+
scale * rotMatrix[4] * refy +
106+
scale * rotMatrix[7] * refz; outpoint++;
107+
108+
*outpoint = scale * rotMatrix[2] * refx +
109+
scale * rotMatrix[5] * refy +
110+
scale * rotMatrix[8] * refz; outpoint++;
111+
}
112+
}
113+
```
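
The way `CalcThreadProc()` derives its share of the work can be shown in isolation. The following portable sketch mirrors the `LOWORD`/`HIWORD` unpacking and the `pointStride` arithmetic from the listing above. It assumes the thread index and thread count are packed into one 32-bit value in the same layout that the Windows `MAKELONG` macro produces; the excerpt does not show how the project actually creates that value, and the names `packThreadInfo`, `lowWord`, `highWord`, `PointRange`, and `rangeForThread` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Range of points handled by one calculation thread.
struct PointRange
{
    int start; // index of the first point
    int count; // number of points in the range
};

// Pack the thread index (low 16 bits) and the thread count (high 16 bits).
// Assumption: this is the layout the thread creator uses for its parameter.
std::uint32_t packThreadInfo(std::uint16_t threadNum, std::uint16_t threadCount)
{
    return static_cast<std::uint32_t>(threadNum) |
           (static_cast<std::uint32_t>(threadCount) << 16);
}

// Portable equivalents of the LOWORD and HIWORD macros.
std::uint16_t lowWord(std::uint32_t value)
{
    return static_cast<std::uint16_t>(value & 0xFFFF);
}

std::uint16_t highWord(std::uint32_t value)
{
    return static_cast<std::uint16_t>((value >> 16) & 0xFFFF);
}

// Same arithmetic as the listing: each thread gets `total / count` points,
// starting at `threadNum * stride`.
PointRange rangeForThread(std::uint32_t packed, int totalPoints)
{
    const int threadNum = lowWord(packed);
    const int threadCount = highWord(packed);
    assert(threadCount > 0 && threadNum < threadCount);

    const int stride = totalPoints / threadCount;
    return PointRange{ threadNum * stride, stride };
}
```

With this integer division, a point count that is not a multiple of the thread count leaves the trailing points outside every range in the sketch. Whether the repository accounts for that elsewhere is not visible in the excerpt shown here.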
114+
115+
116+
## Build and Test
117+
118+
After gaining a general understanding of the project, you can compile it.
119+
Build the project, and once successful, run `SpinTheCubeInGDI.exe`.
120+
121+
You'll see a simulated 3D sphere continuously rotating. The number in the upper-left corner represents the number of frames per second (FPS). A higher number indicates better performance, and vice versa.
122+
123+
![gif1](./figures/multithreading.gif)
124+
125+
On my test machine, the performance fluctuates between 3 and 6 FPS and does not settle at a stable value.
126+
127+
{{% notice Note %}}
128+
Performance may vary depending on the hardware and the system load at the time of testing.
129+
{{% /notice %}}
130+
131+
132+
You can also use the [profiling tools](https://learn.microsoft.com/en-us/visualstudio/profiling/profiling-feature-tour?view=vs-2022) to observe the dynamic CPU and memory usage while the program is running.
133+
![img8](./figures/mt_cpumem_usage1.png)
134+
135+
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
1+
---
2+
title: Using Arm Performance Libraries to accelerate your WoA application
3+
weight: 4
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
## Introducing Arm Performance Libraries
10+
In the previous section, you measured the performance of the first calculation option.
11+
Now, you will try Arm Performance Libraries and compare the performance.
12+
13+
[Arm Performance Libraries](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries) provides optimized standard core math libraries for numerical applications on 64-bit Arm-based processors. The libraries are built with OpenMP across many BLAS, LAPACK, FFT, and sparse routines in order to maximize your performance in multi-processor environments.
14+
15+
Follow this [install guide](https://learn.arm.com/install-guides/armpl/) to install Arm Performance Libraries on Windows 11.
16+
You can also refer to this [document](https://developer.arm.com/documentation/109361/latest/) about Arm Performance Libraries on Windows.
17+
18+
After a successful installation, you'll find five directories in the installation folder. The `include` and `lib` directories contain the header files and library files, respectively. Take note of these two directories, as you'll need them for the Visual Studio setup later.
19+
20+
21+
![img9](./figures/apl_directory.png)
22+
23+
## Include Arm Performance Libraries into Visual Studio
24+
25+
To use the optimizations provided by Arm Performance Libraries, you need to manually add its paths to Visual Studio.
26+
27+
You need to configure two places in your Visual Studio projects:
28+
- #### External Include Directories:
29+
30+
1. In the Solution Explorer, right-click on your project and select "Properties".
31+
2. In the left pane of the Property Pages, expand "Configuration Properties". Select "VC++ Directories"
32+
    3. In the right pane, find the "External Include Directories" setting.
33+
4. Click on the dropdown menu. Select "<Edit...>"
34+
    5. In the dialog that opens, click the "New Line" icon to add the Arm Performance Libraries `include` path.
35+
![img10](./figures/ext_include.png)
36+
37+
- #### Additional Library Directories:
38+
39+
1. In the Solution Explorer, right-click on your project and select "Properties".
40+
2. In the left pane of the Property Pages, expand "Configuration Properties". Select "Linker"
41+
3. In the right pane, find the "Additional Library Directories" setting.
42+
4. Click on the dropdown menu. Select "<Edit...>"
43+
    5. In the dialog that opens, click the "New Line" icon to add the Arm Performance Libraries `lib` path.
44+
![img10](./figures/linker_lib.png)
45+
46+
47+
{{% notice Note %}}
48+
49+
Visual Studio allows users to set the above two paths for each individual configuration. To apply the settings to all configurations in your project, select "All Configurations" in the "Configuration" dropdown menu.
50+
{{% /notice %}}
51+
52+
53+
54+
## Calculation Option#2 -- Arm Performance Libraries
55+
56+
You are now ready to use Arm Performance Libraries in your project.
57+
Open the source code file `SpinTheCubeInGDI.cpp` and search for the `_USE_ARMPL_DEFINES` definition.
58+
Uncommenting this definition enables the Arm Performance Libraries feature.
59+
60+
When the variable `useAPL` is true, the application calls `applyRotationBLAS()` instead of the multithreading code to apply the rotation matrix to the 3D vertices.
61+
62+
```c++
63+
void RotateCube(int numCores)
64+
{
65+
rotationAngle += 0.00001;
66+
if (rotationAngle > 2 * M_PI)
67+
{
68+
rotationAngle -= 2 * M_PI;
69+
}
70+
71+
// rotate around Z and Y
72+
rotationInX[0] = cos(rotationAngle) * cos(rotationAngle);
73+
rotationInX[1] = -sin(rotationAngle);
74+
rotationInX[2] = cos(rotationAngle) * sin(rotationAngle);
75+
rotationInX[3] = sin(rotationAngle) * cos(rotationAngle);
76+
rotationInX[4] = cos(rotationAngle);
77+
rotationInX[5] = sin(rotationAngle) * sin(rotationAngle);
78+
rotationInX[6] = -sin(rotationAngle);
79+
rotationInX[7] = 0;
80+
rotationInX[8] = cos(rotationAngle);
81+
82+
if (useAPL)
83+
{
84+
applyRotationBLAS(UseCube ? cubeVertices : sphereVertices, rotationInX);
85+
}
86+
else
87+
{
88+
for (int x = 0; x < numCores; x++)
89+
{
90+
ReleaseSemaphore(semaphoreList[x], 1, NULL);
91+
}
92+
WaitForMultipleObjects(numCores, doneList.data(), TRUE, INFINITE);
93+
}
94+
95+
Calculations++;
96+
}
97+
```
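
The nine entries that `RotateCube()` writes into `rotationInX` can be checked against the "rotate around Z and Y" comment. The sketch below builds the standard rotation matrices about the Z and Y axes for the same angle, multiplies them, and compares the product with the entries from the listing. All names here (`Mat3`, `multiply`, `rotationZ`, `rotationY`, `rotationFromListing`, `nearlyEqual`) are hypothetical helpers for illustration and do not appear in the project.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Row-major 3x3 matrix: element (r, c) is stored at index r * 3 + c.
using Mat3 = std::array<double, 9>;

// Plain 3x3 matrix product a * b.
Mat3 multiply(const Mat3& a, const Mat3& b)
{
    Mat3 out{}; // value-initialized to zeros
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            for (int k = 0; k < 3; ++k)
                out[r * 3 + c] += a[r * 3 + k] * b[k * 3 + c];
    return out;
}

// Standard rotation about the Z axis.
Mat3 rotationZ(double t)
{
    const double c = std::cos(t), s = std::sin(t);
    return Mat3{ { c, -s, 0.0,
                   s,  c, 0.0,
                   0.0, 0.0, 1.0 } };
}

// Standard rotation about the Y axis.
Mat3 rotationY(double t)
{
    const double c = std::cos(t), s = std::sin(t);
    return Mat3{ { c, 0.0, s,
                   0.0, 1.0, 0.0,
                   -s, 0.0, c } };
}

// The nine entries exactly as RotateCube() assigns them.
Mat3 rotationFromListing(double t)
{
    const double c = std::cos(t), s = std::sin(t);
    return Mat3{ { c * c, -s, c * s,
                   s * c,  c, s * s,
                   -s,   0.0, c } };
}

// Element-wise comparison with a small tolerance for rounding.
bool nearlyEqual(const Mat3& a, const Mat3& b, double tol = 1e-12)
{
    for (int i = 0; i < 9; ++i)
        if (std::fabs(a[i] - b[i]) > tol)
            return false;
    return true;
}
```

For any angle `t`, `multiply(rotationZ(t), rotationY(t))` matches `rotationFromListing(t)`, so the matrix in the listing is the product of a Z rotation and a Y rotation by the same angle.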
98+
99+
`applyRotationBLAS()` uses a BLAS matrix multiplication instead of the multithreading code to perform the calculation.
100+
101+
Basic Linear Algebra Subprograms (BLAS) are a set of well-defined basic linear algebra operations provided by Arm Performance Libraries. See [cblas_dgemm](https://developer.arm.com/documentation/101004/2410/BLAS-Basic-Linear-Algebra-Subprograms/CBLAS-functions/cblas-dgemm?lang=en) to learn more about the function.
102+
103+
```c++
104+
void applyRotationBLAS(std::vector<double>& shape, const std::vector<double>& rotMatrix)
105+
{
106+
EnterCriticalSection(&cubeDraw[0]);
107+
#if defined(_M_ARM64) && defined(_USE_ARMPL_DEFINES)
108+
// Call the BLAS matrix mult for doubles.
109+
// Multiplies each of the 3d points in shape
110+
// list with rotation matrix, and applies scale
111+
cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, (int)shape.size() / 3, 3, 3, scale, shape.data(), 3, rotMatrix.data(), 3, 0.0, drawSphereVertecies.data(), 3);
112+
#endif
113+
LeaveCriticalSection(&cubeDraw[0]);
114+
}
115+
```
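
To make the `cblas_dgemm` call easier to read, here is a plain C++ reference of the same arithmetic for this particular shape: an N x 3 row-major list of points multiplied by a 3 x 3 row-major matrix, scaled by `alpha`, with `beta` equal to zero so the previous output is overwritten. It is a sketch for understanding only. It does not call Arm Performance Libraries, and the name `transformPoints` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference computation of out = alpha * points * rot, where
//   points is N x 3, row-major (x, y, z per point),
//   rot    is 3 x 3, row-major.
// This is what a row-major, no-transpose dgemm with M = N, N = 3, K = 3,
// leading dimensions of 3, and beta = 0 produces.
std::vector<double> transformPoints(const std::vector<double>& points,
                                    const std::vector<double>& rot,
                                    double alpha)
{
    assert(points.size() % 3 == 0 && rot.size() == 9);

    std::vector<double> out(points.size(), 0.0);
    const std::size_t pointCount = points.size() / 3;

    for (std::size_t i = 0; i < pointCount; ++i)
    {
        for (std::size_t c = 0; c < 3; ++c)
        {
            double sum = 0.0;
            for (std::size_t k = 0; k < 3; ++k)
            {
                // Row i of points times column c of rot.
                sum += points[i * 3 + k] * rot[k * 3 + c];
            }
            out[i * 3 + c] = alpha * sum;
        }
    }
    return out;
}
```

For column 0 the inner sum is `rot[0] * x + rot[3] * y + rot[6] * z`, which is the same expression `applyRotation()` uses for the first output coordinate, so both options compute the same transform. The difference lies in who performs the arithmetic: hand-written loops spread over threads, or a single optimized library call.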
116+
117+
## Build and Test
118+
119+
Rebuild the code and run `SpinTheCubeInGDI.exe` again. You'll see that the frame rate has increased.
120+
On my machine, the performance stays steady between 11 and 12 FPS.
121+
122+
![gif2](./figures/apl_enable.gif)
123+
124+
If you re-run the profiling tools, you can see that the CPU usage has decreased significantly, while memory usage is unchanged.
125+
![img11](./figures/apl_on_cpu_mem_usage.png)
126+
127+
## Conclusion:
128+
129+
This example demonstrates that Arm Performance Libraries on Windows can improve performance for specific workloads.
130+
131+
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
---
2+
title: Optimize Windows application using Arm Performance Libraries
3+
4+
minutes_to_complete: 60
5+
6+
who_is_this_for: This is an introductory topic for software developers who want to learn how to optimize Windows-on-Arm (WoA) application performance on Arm64 devices.
7+
8+
learning_objectives:
9+
- Develop Windows on Arm (WoA) applications using Microsoft Visual Studio.
10+
- Utilize Arm Performance Libraries for performance optimization.
11+
12+
prerequisites:
13+
    - A Windows on Arm computer such as [Windows Dev Kit 2023](https://learn.microsoft.com/en-us/windows/arm/dev-kit) or Lenovo ThinkPad X13s running Windows 11.
14+
15+
author_primary: Odin Shen
16+
17+
### Tags
18+
skilllevels: Introductory
19+
subjects: Migration to Arm
20+
armips:
21+
- Cortex-A
22+
tools_software_languages:
23+
- Visual Studio
24+
- C#
25+
- .NET
26+
operatingsystems:
27+
- Windows
28+
29+
30+
### FIXED, DO NOT MODIFY
31+
# ================================================================================
32+
weight: 1 # _index.md always has weight of 1 to order correctly
33+
layout: "learningpathall" # All files under learning paths have this same wrapper
34+
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
35+
---
