|
| 1 | +--- |
| 2 | +title: Using Arm Performance Libraries to accelerate your WoA application |
| 3 | +weight: 4 |
| 4 | + |
| 5 | +### FIXED, DO NOT MODIFY |
| 6 | +layout: learningpathall |
| 7 | +--- |
| 8 | + |
| 9 | +## Introduce Arm Performance Libraries |
| 10 | +In the previous section, you measured the performance of the first calculation option. |
| 11 | +Now you will try Arm Performance Libraries and compare the performance. |
| 12 | + |
| 13 | +[Arm Performance Libraries](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries) provides optimized standard core math libraries for numerical applications on 64-bit Arm-based processors. The libraries are built with OpenMP across many BLAS, LAPACK, FFT, and sparse routines in order to maximize your performance in multi-processor environments. |
| 14 | + |
| 15 | +Follow this [learning path](https://learn.arm.com/install-guides/armpl/) to install Arm Performance Libraries on Windows 11. |
| 16 | +You can also refer to this [document](https://developer.arm.com/documentation/109361/latest/) about Arm Performance Libraries on Windows. |
| 17 | + |
| 18 | +After a successful installation, you'll find five directories in the installation folder. The `include` and `lib` directories contain the header files and library files, respectively. Take note of these two directories, because you'll need them for the Visual Studio setup later. |
| 19 | + |
| 20 | + |
| 21 | +  |
| 22 | + |
| 23 | +## Include Arm Performance Libraries into Visual Studio |
| 24 | + |
| 25 | +To use the performance optimizations in Arm Performance Libraries, you need to manually add its paths to Visual Studio. |
| 26 | + |
| 27 | +Configure two settings in your Visual Studio project: |
| 28 | + - #### External Include Directories: |
| 29 | + |
| 30 | + 1. In the Solution Explorer, right-click on your project and select "Properties". |
| 31 | + 2. In the left pane of the Property Pages, expand "Configuration Properties" and select "VC++ Directories". |
| 32 | + 3. In the right pane, find the "External Include Directories" setting. |
| 33 | + 4. Click the dropdown menu and select "<Edit...>". |
| 34 | + 5. In the dialog that opens, click the "New Line" icon and add the Arm Performance Libraries `include` path. |
| 35 | +  |
| 36 | + |
| 37 | + - #### Additional Library Directories: |
| 38 | + |
| 39 | + 1. In the Solution Explorer, right-click on your project and select "Properties". |
| 40 | + 2. In the left pane of the Property Pages, expand "Configuration Properties" and select "Linker". |
| 41 | + 3. In the right pane, find the "Additional Library Directories" setting. |
| 42 | + 4. Click the dropdown menu and select "<Edit...>". |
| 43 | + 5. In the dialog that opens, click the "New Line" icon and add the Arm Performance Libraries `lib` path. |
| 44 | +  |
| 45 | + |
| 46 | + |
| 47 | +{{% notice Note %}} |
| 48 | + |
| 49 | +Visual Studio allows users to set the above two paths for each individual configuration. To apply the settings to all configurations in your project, select "All Configurations" in the "Configuration" dropdown menu. |
| 50 | +{{% /notice %}} |
| 51 | + |
| 52 | + |
| 53 | + |
| 54 | +## Calculation Option#2 -- Arm Performance Libraries |
| 55 | + |
| 56 | +You are now ready to use Arm Performance Libraries in your project. |
| 57 | +Open the source code file `SpinTheCubeInGDI.cpp` and search for the `_USE_ARMPL_DEFINES` definition. |
| 58 | +Uncomment the definition to enable the Arm Performance Libraries feature. |
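
The snippet below is a minimal sketch of how a compile-time switch of this kind typically works. It is not copied from `SpinTheCubeInGDI.cpp`: the helper `armplEnabled()` is a hypothetical name used only for illustration. `_M_ARM64` is the macro that MSVC predefines for Arm64 targets, and `armpl.h` is the Arm Performance Libraries header.

```cpp
// Uncomment the next line to opt in to Arm Performance Libraries.
// #define _USE_ARMPL_DEFINES

#if defined(_M_ARM64) && defined(_USE_ARMPL_DEFINES)
#include <armpl.h>  // only pulled in when the feature is enabled on Arm64
#endif

// Hypothetical helper (not in the original source): reports whether the
// Arm Performance Libraries code path was compiled in.
constexpr bool armplEnabled()
{
#if defined(_M_ARM64) && defined(_USE_ARMPL_DEFINES)
    return true;
#else
    return false;
#endif
}
```

With the `#define` left commented out, as shown, `armplEnabled()` evaluates to `false` and the header is never included, so the project still builds on machines without the libraries installed.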
| 59 | + |
| 60 | +When the variable `useAPL` is `true`, the application calls `applyRotationBLAS()` instead of the multithreaded code to apply the rotation matrix to the 3D vertices. |
| 61 | + |
| 62 | +```c++ |
| 63 | +void RotateCube(int numCores) |
| 64 | +{ |
| 65 | +    rotationAngle += 0.00001; |
| 66 | +    if (rotationAngle > 2 * M_PI) |
| 67 | +    { |
| 68 | +        rotationAngle -= 2 * M_PI; |
| 69 | +    } |
| 70 | + |
| 71 | +    // rotate around Z and Y |
| 72 | +    rotationInX[0] = cos(rotationAngle) * cos(rotationAngle); |
| 73 | +    rotationInX[1] = -sin(rotationAngle); |
| 74 | +    rotationInX[2] = cos(rotationAngle) * sin(rotationAngle); |
| 75 | +    rotationInX[3] = sin(rotationAngle) * cos(rotationAngle); |
| 76 | +    rotationInX[4] = cos(rotationAngle); |
| 77 | +    rotationInX[5] = sin(rotationAngle) * sin(rotationAngle); |
| 78 | +    rotationInX[6] = -sin(rotationAngle); |
| 79 | +    rotationInX[7] = 0; |
| 80 | +    rotationInX[8] = cos(rotationAngle); |
| 81 | + |
| 82 | +    if (useAPL) |
| 83 | +    { |
| 84 | +        applyRotationBLAS(UseCube ? cubeVertices : sphereVertices, rotationInX); |
| 85 | +    } |
| 86 | +    else |
| 87 | +    { |
| 88 | +        for (int x = 0; x < numCores; x++) |
| 89 | +        { |
| 90 | +            ReleaseSemaphore(semaphoreList[x], 1, NULL); |
| 91 | +        } |
| 92 | +        WaitForMultipleObjects(numCores, doneList.data(), TRUE, INFINITE); |
| 93 | +    } |
| 94 | + |
| 95 | +    Calculations++; |
| 96 | +} |
| 97 | +``` |
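
The nine entries assigned to `rotationInX` in `RotateCube()` match the row-major product of the standard rotation matrices about Z and Y for the same angle. The sketch below checks that numerically. All names in it (`Mat3`, `multiply`, `rotationZ`, `rotationY`, `combinedRotation`, `nearlyEqual`) are hypothetical helpers written for this illustration and are not part of the application.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<double, 9>;  // 3x3 matrix stored row by row

// Row-major 3x3 product: returns a * b.
Mat3 multiply(const Mat3& a, const Mat3& b)
{
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r[i * 3 + j] += a[i * 3 + k] * b[k * 3 + j];
    return r;
}

// Standard rotation about the Z axis by angle t (radians).
Mat3 rotationZ(double t)
{
    const double c = std::cos(t), s = std::sin(t);
    return Mat3{c, -s, 0.0,
                s,  c, 0.0,
                0.0, 0.0, 1.0};
}

// Standard rotation about the Y axis by angle t (radians).
Mat3 rotationY(double t)
{
    const double c = std::cos(t), s = std::sin(t);
    return Mat3{c, 0.0, s,
                0.0, 1.0, 0.0,
                -s, 0.0, c};
}

// The same nine entries that RotateCube() writes into rotationInX.
Mat3 combinedRotation(double t)
{
    const double c = std::cos(t), s = std::sin(t);
    return Mat3{c * c, -s, c * s,
                s * c,  c, s * s,
                -s,   0.0, c};
}

// Element-wise comparison with an absolute tolerance.
bool nearlyEqual(const Mat3& a, const Mat3& b, double tol = 1e-12)
{
    for (int i = 0; i < 9; ++i)
        if (std::fabs(a[i] - b[i]) > tol)
            return false;
    return true;
}
```

For any angle `t`, `nearlyEqual(multiply(rotationZ(t), rotationY(t)), combinedRotation(t))` holds, which is why a single 3x3 matrix is enough to apply both rotations to every vertex.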
| 98 | + |
| 99 | +`applyRotationBLAS()` uses a single BLAS matrix multiplication, rather than the multithreaded code, to perform the calculation. |
| 100 | + |
| 101 | +Basic Linear Algebra Subprograms (BLAS) are a set of well-defined basic linear algebra operations, and Arm Performance Libraries provides an implementation of them. See [cblas_dgemm](https://developer.arm.com/documentation/101004/2410/BLAS-Basic-Linear-Algebra-Subprograms/CBLAS-functions/cblas-dgemm?lang=en) to learn more about the function. |
| 102 | + |
| 103 | +```c++ |
| 104 | +void applyRotationBLAS(std::vector<double>& shape, const std::vector<double>& rotMatrix) |
| 105 | +{ |
| 106 | +    EnterCriticalSection(&cubeDraw[0]); |
| 107 | +#if defined(_M_ARM64) && defined(_USE_ARMPL_DEFINES) |
| 108 | +    // Call the BLAS matrix mult for doubles. |
| 109 | +    // Multiplies each of the 3d points in shape |
| 110 | +    // list with rotation matrix, and applies scale |
| 111 | +    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, (int)shape.size() / 3, 3, 3, scale, shape.data(), 3, rotMatrix.data(), 3, 0.0, drawSphereVertecies.data(), 3); |
| 112 | +#endif |
| 113 | +    LeaveCriticalSection(&cubeDraw[0]); |
| 114 | +} |
| 115 | +``` |
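
To make the arguments of the `cblas_dgemm` call above easier to read, here is a plain C++ sketch of the same arithmetic, assuming the vertices are stored as an N x 3 row-major array. The function `rotatePointsReference()` is a hypothetical reference version written for illustration. It is not part of the application or of Arm Performance Libraries, and it has none of the library's optimizations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version of out = alpha * points * rot, where
//   points is N x 3, row-major (M = N, K = 3, lda = 3 in the dgemm call),
//   rot    is 3 x 3, row-major (N = 3, ldb = 3),
//   out    is N x 3, row-major (ldc = 3).
// beta is 0.0 in the dgemm call, so the output is simply overwritten.
std::vector<double> rotatePointsReference(const std::vector<double>& points,
                                          const std::vector<double>& rot,
                                          double alpha)
{
    assert(points.size() % 3 == 0 && rot.size() == 9);
    const std::size_t n = points.size() / 3;  // number of 3D points
    std::vector<double> out(points.size(), 0.0);

    for (std::size_t i = 0; i < n; ++i)
    {
        for (std::size_t j = 0; j < 3; ++j)
        {
            double sum = 0.0;
            for (std::size_t k = 0; k < 3; ++k)
            {
                sum += points[i * 3 + k] * rot[k * 3 + j];
            }
            out[i * 3 + j] = alpha * sum;  // alpha plays the role of scale
        }
    }
    return out;
}
```

Each output row is one input point multiplied by the rotation matrix and scaled by `alpha`, which is the work the multithreaded path otherwise splits across cores.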
| 116 | + |
| 117 | +## Build and Test |
| 118 | + |
| 119 | +Rebuild the code and run `SpinTheCubeInGDI.exe` again. You'll see that the frame rate has increased. |
| 120 | +On my machine, the frame rate stays steadily between 11 and 12. |
| 121 | + |
| 122 | + |
| 123 | + |
| 124 | +Re-run the profiling tools and you can see that CPU usage has decreased significantly, while memory usage is unchanged. |
| 125 | +  |
| 126 | + |
| 127 | +## Conclusion |
| 128 | + |
| 129 | +This example demonstrates that Arm Performance Libraries on Windows can improve performance for specific workloads. |
| 130 | + |
| 131 | + |