diff --git a/.wordlist.txt b/.wordlist.txt index 7010bf19d2..c5ade02d65 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -3266,4 +3266,74 @@ kubearchinspect mlops multithreading preloaded -requantize \ No newline at end of file +requantize +ADK +AbstRaction +Alaaeddine +Albin +AliCloud +Astra +Bernhardsson +BitBake +Chakroun +ColorAttachment +CommandEncoder +CustomDataSources +DawnWebGPU +ETL +ETW +GameActivity +Georgios +ISAs +Jetpack +KV +Koki +LearnWebGPU +Mermigkis +Mitsunami +Naga +NativeActivity +OpenGLES +Perfetto +Qwen +RSE +RTP +RenderPassEncoder +RenderPipeline +SECurity +SPIR +SType +StackOverflow +SurfaceView +TextureView +Thelio +Tianyu +TinyOBJLoader +Tmux +WGSL +WPR +WebGL +WebGPU +WebGPU’s +Xperf +andc +andnot +dylib +eliemichel +epi +intrinsic's +pApp +rtp +samdauwe +shaderCodeDesc +transpilation +usecase +varunchariArm +vbic +vbicq +webgpu +webgpufundamentals +wgpuQueueSubmit +wgpuQueueWriteBuffer +wgpuQueueWriteTexture +wpa \ No newline at end of file diff --git a/assets/contributors.csv b/assets/contributors.csv index d599b57494..d50bc49455 100644 --- a/assets/contributors.csv +++ b/assets/contributors.csv @@ -45,3 +45,5 @@ Nader Zouaoui,Day Devs,nader-zouaoui,nader-zouaoui,@zouaoui_nader,https://daydev Alaaeddine Chakroun,Day Devs,Alaaeddine-Chakroun,alaaeddine-chakroun,,https://daydevs.com/ Koki Mitsunami,Arm,,,, Chen Zhang,Zilliz,,,, +Tianyu Li,Arm,,,, +Georgios Mermigkis,VectorCamp,gMerm,georgios-mermigkis,,https://vectorcamp.gr/ diff --git a/content/install-guides/acfl.md b/content/install-guides/acfl.md index e7e6bd2655..383cde57a9 100644 --- a/content/install-guides/acfl.md +++ b/content/install-guides/acfl.md @@ -102,11 +102,11 @@ Fetch the ACfL installers: #### Ubuntu Linux: ```bash { target="ubuntu:latest" } -wget https://developer.arm.com/-/cdn-downloads/permalink/Arm-Compiler-for-Linux/Version_24.10/arm-compiler-for-linux_24.10_Ubuntu-22.04_aarch64.tar +wget 
https://developer.arm.com/-/cdn-downloads/permalink/Arm-Compiler-for-Linux/Version_24.10.1/arm-compiler-for-linux_24.10.1_Ubuntu-22.04_aarch64.tar ``` #### Red Hat Linux: ```bash { target="fedora:latest" } -wget https://developer.arm.com/-/cdn-downloads/permalink/Arm-Compiler-for-Linux/Version_24.10/arm-compiler-for-linux_24.10_RHEL-9_aarch64.tar +wget https://developer.arm.com/-/cdn-downloads/permalink/Arm-Compiler-for-Linux/Version_24.10.1/arm-compiler-for-linux_24.10.1_RHEL-9_aarch64.tar ``` ### Install @@ -119,18 +119,18 @@ Each command sequence includes accepting the license agreement to automate the i ```bash { target="ubuntu:latest", env="DEBIAN_FRONTEND=noninteractive" } sudo -E apt-get -y install environment-modules python3 libc6-dev -tar -xvf arm-compiler-for-linux_24.10_Ubuntu-22.04_aarch64.tar -cd ./arm-compiler-for-linux_24.10_Ubuntu-22.04 -sudo ./arm-compiler-for-linux_24.10_Ubuntu-22.04.sh --accept +tar -xvf arm-compiler-for-linux_24.10.1_Ubuntu-22.04_aarch64.tar +cd ./arm-compiler-for-linux_24.10.1_Ubuntu-22.04 +sudo ./arm-compiler-for-linux_24.10.1_Ubuntu-22.04.sh --accept ``` #### Red Hat Linux: ```bash { target="fedora:latest" } sudo yum -y install environment-modules python3 glibc-devel -tar -xvf arm-compiler-for-linux_24.10_RHEL-9_aarch64.tar -cd arm-compiler-for-linux_24.10_RHEL-9 -sudo ./arm-compiler-for-linux_24.10_RHEL-9.sh --accept +tar -xvf arm-compiler-for-linux_24.10.1_RHEL-9_aarch64.tar +cd arm-compiler-for-linux_24.10.1_RHEL-9 +sudo ./arm-compiler-for-linux_24.10.1_RHEL-9.sh --accept ``` {{% notice Warning %}} @@ -173,7 +173,7 @@ module avail To configure Arm Compiler for Linux: ```bash { env_source="~/.bashrc" } -module load acfl/24.10 +module load acfl/24.10.1 ``` To configure GCC: @@ -237,7 +237,7 @@ ACfL is now [ready to use](#armclang). To get started with the Arm C/C++ Compiler and compile a simple application follow the steps below. 
Check that the correct compiler version is being used: -```bash { env_source="~/.bashrc", pre_cmd="module load acfl/24.10" } +```bash { env_source="~/.bashrc", pre_cmd="module load acfl/24.10.1" } armclang --version ``` @@ -255,13 +255,13 @@ int main() Build the application with: -```console { env_source="~/.bashrc", pre_cmd="module load acfl/24.10" } +```console { env_source="~/.bashrc", pre_cmd="module load acfl/24.10.1" } armclang hello.c -o hello ``` Run the application with: -```bash { env_source="~/.bashrc", pre_cmd="module load acfl/24.10" } +```bash { env_source="~/.bashrc", pre_cmd="module load acfl/24.10.1" } ./hello ``` @@ -275,7 +275,7 @@ Hello, C World! To get started with the Arm Fortran Compiler and compile a simple application follow the steps below. Check that the correct compiler version is being used: -```bash { env_source="~/.bashrc", pre_cmd="module load acfl/24.10" } +```bash { env_source="~/.bashrc", pre_cmd="module load acfl/24.10.1" } armflang --version ``` @@ -289,12 +289,12 @@ end program hello ``` Build the application with: -```bash { env_source="~/.bashrc", pre_cmd="module load acfl/24.10" } +```bash { env_source="~/.bashrc", pre_cmd="module load acfl/24.10.1" } armflang hello.f90 -o hello ``` Run the application with: -```bash { env_source="~/.bashrc", pre_cmd="module load acfl/24.10" } +```bash { env_source="~/.bashrc", pre_cmd="module load acfl/24.10.1" } ./hello ``` diff --git a/content/install-guides/ambaviz.md b/content/install-guides/ambaviz.md index 242f192567..be77d85c07 100644 --- a/content/install-guides/ambaviz.md +++ b/content/install-guides/ambaviz.md @@ -37,9 +37,9 @@ A detailed overview of functionality is described in [Introduction to AMBA Viz]( AMBA Viz is a component of [Arm Hardware Success Kits](https://www.arm.com/products/development-tools/success-kits). -It is available to download at the [Arm Product Download Hub](https://developer.arm.com/downloads/view/HWSKT-KS-0002). 
+It is available to download at the [Arm Product Download Hub](https://developer.arm.com/downloads/). -You can download AMBA Viz as an individual standalone component, or you can download the complete Success Kits. +You can download AMBA Viz as an individual component, or you can download complete Success Kits. For more information on the Download Hub, refer to the [Arm Product Download Hub install guide](/install-guides/pdh/). diff --git a/content/install-guides/armclang.md b/content/install-guides/armclang.md index d0667b94ff..8cc621c832 100644 --- a/content/install-guides/armclang.md +++ b/content/install-guides/armclang.md @@ -44,16 +44,13 @@ Arm Compiler for Embedded FuSa must also be [downloaded separately](#download). Individual compiler packages for all supported host platforms can be downloaded from the [Arm Product Download Hub](#pdh) or the [Arm Tools Artifactory](#artifactory). -Individual compiler packages for all supported host platforms can be downloaded from the [Arm Product Download Hub](https://developer.arm.com/downloads) or the [Arm Tools Artifactory](https://www.keil.arm.com/artifacts/). - ### Product Download Hub {#pdh} -All compiler packages can be downloaded from the [Arm Product Download Hub](https://developer.arm.com/downloads) (requires login): +All compiler packages can be downloaded from the [Arm Product Download Hub](https://developer.arm.com/downloads) (requires login). -- [Arm Compiler for Embedded](https://developer.arm.com/downloads/view/ACOMPE) -- [Arm Compiler for Embedded FuSa](https://developer.arm.com/downloads/view/ACOMP616) +Download links to all available versions are given in the [Arm Compiler downloads index](https://developer.arm.com/documentation/ka005198). -These can either be used standalone or [integrated](#armds) into your Arm Development Studio installation. +All compiler versions can be used standalone or [integrated](#armds) into your Arm Development Studio installation. 
See also: [What should I do if I want to download a legacy release of Arm Compiler?](https://developer.arm.com/documentation/ka005184) @@ -121,7 +118,9 @@ export AC6_TOOLCHAIN_6_22_0=/home/$USER/ArmCompilerforEmbedded6.22/bin ## Set up the product license -Arm Compiler for Embedded and Arm Compiler for Embedded FuSa are license managed. License setup instructions are available in the [Arm Licensing install guide](/install-guides/license/). +`Arm Compiler for Embedded` and `Arm Compiler for Embedded FuSa` are license managed. + +License setup instructions are available in the [Arm Licensing install guide](/install-guides/license/). ## Verify installation diff --git a/content/install-guides/license/_index.md b/content/install-guides/license/_index.md index 3aeaf7ebd4..630411f673 100644 --- a/content/install-guides/license/_index.md +++ b/content/install-guides/license/_index.md @@ -16,11 +16,9 @@ multitool_install_part: false # Set to true if a sub-page of a multi-page arti layout: installtoolsall # DO NOT MODIFY. Always true for tool install articles --- -Most Arm commercial tools are license managed. Arm is migrating to user-based licensing (UBL) which greatly simplifies license configuration. It is available for [Arm Success Kits](/install-guides/successkits/) as well as [Arm Development Studio](/install-guides/armds). +Most Arm commercial tools are license managed. Arm is migrating to user-based licensing (UBL) which greatly simplifies license configuration. -Success kits are available as `Hardware Success Kits` (`HSK`) or `Software Success Kits` (`SSK`). See the table below for tooling provided. SSK is a subset of HSK. - -With UBL, you have unlimited access to all components in the success kit you have enabled. The license is cached locally for up to 7 days, enabling remote or traveling users to access tools without connecting to their license server. 
+A user-based license is cached locally for up to 7 days, enabling remote or traveling users to access tools without connecting to their license server. Starting any UBL enabled tool when the server is available will renew the license for 7 more days. This renewal attempt is performed once per 24 hours. @@ -52,10 +50,4 @@ Legacy product versions do not support UBL licensing and use FlexLM [floating li ## User-based Licensing Video Tutorials -In addition to the set up and install instructions below, a collection of video tutorials are available on the Arm Developer website: - -* [Accessing the Arm License Portal](https://developer.arm.com/Additional%20Resources/Video%20Tutorials/User-based%20Licensing%20-%20Accessing%20the%20Arm%20License%20Portal) -* [Cloud-based Licenses and Activation Codes](https://developer.arm.com/Additional%20Resources/Video%20Tutorials/User-based%20Licensing%20-%20Cloud-based%20Licenses%20and%20Activation%20Codes) -* [Local License Server Setup](https://developer.arm.com/Additional%20Resources/Video%20Tutorials/User-based%20Licensing%20-%20Local%20License%20Server%20Setup) -* [End-user Setup](https://developer.arm.com/Additional%20Resources/Video%20Tutorials/User-based%20Licensing%20-%20End%20User%20Setup) -* [Removal of Licenses and Decommissioning Server](https://developer.arm.com/Additional%20Resources/Video%20Tutorials/User-based%20Licensing%20-%20License%20Removal%20and%20Decommissioning%20Server) +In addition to the set up and install instructions below, a collection of video tutorials are available on [Arm Developer](https://developer.arm.com//Tools%20and%20Software/User-based%20Licensing). diff --git a/content/install-guides/license/ubl_license_enduser.md b/content/install-guides/license/ubl_license_enduser.md index 3c9e7f6a72..45baccb6fb 100644 --- a/content/install-guides/license/ubl_license_enduser.md +++ b/content/install-guides/license/ubl_license_enduser.md @@ -15,15 +15,9 @@ layout: installtoolsall # DO NOT MODIFY. 
Always true for tool install ar A [Local License Server (LLS)](/install-guides/license/ubl_license_admin/) must first be set up by your license administration team. -{{% notice Notice%}} -A Software Success Kit is a subset of a Hardware Success Kit. - -You should confirm which type of license is appropriate for your needs. -{{% /notice %}} - ## Activate license on end user machine -The UBL license can be activated on the end user machine in different ways. Select the most appropriate for your needs. +The user-based license can be activated on the end user machine in different ways. Select the most appropriate for your needs. * [Activate via environment variable](#envvar) * [Activate within tools IDE](#ide) @@ -31,7 +25,7 @@ The UBL license can be activated on the end user machine in different ways. Sele ## Activate via environment variable {#envvar} -Create `ARMLM_ONDEMAND_ACTIVATION` environment variable referencing the Success Kit product code and your internal UBL license server. Contact your internal license administrators for information on your internal server. +Create `ARMLM_ONDEMAND_ACTIVATION` environment variable referencing the product code and your internal license server. Contact your internal license administrators for information on your internal server. ### HSK ```console @@ -42,7 +36,7 @@ export ARMLM_ONDEMAND_ACTIVATION=HWSKT-STD0@https://internal.ubl.server export ARMLM_ONDEMAND_ACTIVATION=SWSKT-STD0@https://internal.ubl.server ``` -A license will be automatically checked out whenever a UBL enabled tool is run, for example: +A license will be automatically checked out whenever a user-based licensing enabled tool is run, for example: ```command armclang --version ``` @@ -58,17 +52,12 @@ Select `Activate with` > `License Server`, and enter the appropriate license ser ## Activate manually {#manual} -Open a command prompt, and navigate to the bin directory of any UBL enabled product. 
+Open a command prompt, and navigate to the bin directory of any user-based licensing enabled product. -Activate an appropriate success kit license with `armlm`: -### HSK +Activate your user-based license with `armlm`: ```console armlm activate --server https://internal.ubl.server --product HWSKT-STD0 ``` -### SSK -``` -armlm activate --server https://internal.ubl.server --product SWSKT-STD0 -``` ## Confirm license check-out {#confirm} @@ -84,7 +73,7 @@ You should see an output similar to: Hardware Success Kit Product code: HWSKT-STD0 Order Id: xxxxxxxx - License valid until: 2023-12-31 + License valid until: 2025-12-31 Local cache expires in: 6 days and 23 hours License server: https://internal.ubl.server ``` @@ -93,7 +82,7 @@ Hardware Success Kit Your license is cached on your local machine, and is valid for 7 days. -There will be an automatic attempt to refresh this timer on the first usage of a UBL enabled tool in a day. If that fails (for example, if tools are run whilst not connected to your network) the tools can still be used provided there is still time on the locally cached license. +There will be an automatic attempt to refresh this license once per day. If that fails (for example, if tools are run whilst not connected to your network) the tools can still be used provided there is still time on the locally cached license. To manually refresh the license, you can deactivate and reactivate your license (when connected to your network). For example: ```command diff --git a/content/install-guides/socrates.md b/content/install-guides/socrates.md index 710dc19974..6ddf70f2f3 100644 --- a/content/install-guides/socrates.md +++ b/content/install-guides/socrates.md @@ -35,9 +35,9 @@ layout: installtoolsall # DO NOT MODIFY. Always true for tool install ar Socrates is a component of [Arm Hardware Success Kits](https://www.arm.com/products/development-tools/success-kits). 
-It is available to download via the [Arm Product Download Hub](https://developer.arm.com/downloads/view/HWSKT-KS-0002). +It is available to download via the [Arm Product Download Hub](https://developer.arm.com/downloads/). -You can download Socrates as an individual standalone component, or you can download the complete success kits. +You can download Socrates as an individual component, or you can download the complete success kits. For more information on the Download Hub, refer to the [Arm Product Download Hub install guide](/install-guides/pdh/). diff --git a/content/install-guides/successkits.md b/content/install-guides/successkits.md index e2d6976f6d..fd3312e328 100644 --- a/content/install-guides/successkits.md +++ b/content/install-guides/successkits.md @@ -28,16 +28,12 @@ multi_install: false # Set to true if first page of multi-page articl multitool_install_part: false # Set to true if a sub-page of a multi-page article, else false layout: installtoolsall # DO NOT MODIFY. Always true for tool install articles --- -Arm Development tools are packaged as [Arm Success Kits](https://www.arm.com/products/development-tools/success-kits). These come in two forms: +Arm Development tools are packaged as [Arm Success Kits](https://www.arm.com/products/development-tools/success-kits). -- Software Success Kits (SSK) -- Hardware Success Kits (HSK) - -SSKs contain all of the software development tools provided by Arm. - -HSKs include SSK as well as additional tools for IP configuration and evaluation. - -Arm Success Kits are a component of [Arm Flexible Access](https://www.arm.com/en/products/flexible-access). 
+Arm Success Kits are a component of: +* [Arm Total Access](https://www.arm.com/products/licensing/arm-total-access) +* [Arm Flexible Access](https://www.arm.com/en/products/flexible-access) +* [Arm Academic Access](https://www.arm.com/resources/research/enablement/academic-access) ## Downloading Success Kit components diff --git a/content/install-guides/sysbox.md b/content/install-guides/sysbox.md index a135f2121b..ae2075f29f 100644 --- a/content/install-guides/sysbox.md +++ b/content/install-guides/sysbox.md @@ -63,13 +63,13 @@ Download the Sysbox official package from [Sysbox Releases](https://github.com/n You can download the Debian package for Arm from the command line: ```bash -wget https://downloads.nestybox.com/sysbox/releases/v0.6.4/sysbox-ce_0.6.4-0.linux_arm64.deb +wget https://downloads.nestybox.com/sysbox/releases/v0.6.5/sysbox-ce_0.6.5-0.linux_arm64.deb ``` Install the package using the `apt` command: ```bash -sudo apt-get install ./sysbox-ce_0.6.4-0.linux_arm64.deb -y +sudo apt-get install ./sysbox-ce_0.6.5-0.linux_arm64.deb -y ``` If you are not using a Debian-based Linux distribution, you can use instructions to build Sysbox from the source code. Refer to [Sysbox Developer's Guide: Building & Installing](https://github.com/nestybox/sysbox/blob/master/docs/developers-guide/build.md) for further information. 
diff --git a/content/install-guides/windows-perf-wpa-plugin.md b/content/install-guides/windows-perf-wpa-plugin.md index f23f5cc042..4e0fa1f432 100644 --- a/content/install-guides/windows-perf-wpa-plugin.md +++ b/content/install-guides/windows-perf-wpa-plugin.md @@ -1,10 +1,11 @@ --- -### Title the install tools article with the name of the tool to be installed -### Include vendor name where appropriate -title: Windows Performance Analyzer (WPA) Plugin -minutes_to_complete: 15 +title: Windows Performance Analyzer (WPA) plugin draft: true +cascade: + draft: true + +minutes_to_complete: 15 official_docs: https://github.com/arm-developer-tools/windowsperf-wpa-plugin @@ -31,37 +32,40 @@ layout: installtoolsall # DO NOT MODIFY. Always true for tool install articles ## What is the Windows Performance Analyzer plugin? -The Windows Performance Analyzer plugin connects Windows Perf to the Windows Performance Analyzer (WPA). - -[WindowsPerf](https://github.com/arm-developer-tools/windowsperf) is a lightweight performance profiling tool inspired by Linux Perf and designed for Windows on Arm. +The Windows Performance Analyzer (WPA) plugin connects [WindowsPerf](/learning-paths/laptops-and-desktops/windowsperf/) to the Windows Performance Analyzer. Windows Perf is a lightweight performance profiling tool inspired by Linux Perf and designed for Windows on Arm. -Windows Performance Analyzer (WPA) is a tool that creates graphs and data tables of Event Tracing for Windows (ETW) events that are recorded by Windows Performance Recorder (WPR), Xperf, or an assessment that is run in the Assessment Platform. WPA opens event trace log (ETL) files for analysis. +Windows Performance Analyzer is a useful tool that supports developers with diagnostics and performance tuning. 
It generates data tables and graphs of Event Tracing for Windows (ETW) events, which are recorded in one of three ways: +- Windows Performance Recorder (WPR) +- Xperf +- or through an assessment that's run in the Assessment Platform. + +WPA can open event trace log (ETL) files, which you can use for analysis. -The WPA plugin is built using the [Microsoft Performance Toolkit SDK](https://github.com/microsoft/microsoft-performance-toolkit-sdk), a collection of tools to create and extend performance analysis applications. The plugin parses json output from WidowsPerf so that it can be visualized in WPA. +The WPA plugin is built using the [Microsoft Performance Toolkit SDK](https://github.com/microsoft/microsoft-performance-toolkit-sdk), a collection of tools to create and extend performance analysis applications. The plugin parses JSON output from Windows Perf so that it can be visualized in WPA. ## What are some of the features of the WPA plugin? -The WindowsPerf GUI extension is composed of several key features, each designed to streamline the user experience: +The WindowsPerf GUI extension includes features, which are designed to streamline the user experience: -### What is the timeline view? +### Timeline view -The timeline view visualizes the `wperf stat` timeline data plotted by event group. +The timeline view visualizes the `wperf stat` timeline data plotted by event group: ![Timeline By Core Table](/install-guides/_images/wpa-timeline-by-core.png) -### What is the telemetry view? +### Telemetry view -The telemetry view displays telemetry events grouped by unit. +The telemetry view displays telemetry events grouped by unit: ![Telemetry Table](/install-guides/_images/wpa-telemetry-table.png) ## How do I install the WPA plugin? -Before using the WPA plugin, make sure you have installed WPA. 
+Before installing the plugin, you need to make sure you have installed WPA: -### Windows Performance Analyzer +### Install WPA -WPA is included in the Windows Assessment and Deployment Kit (Windows ADK) that can be downloaded from [Microsoft](https://go.microsoft.com/fwlink/?linkid=2243390). +WPA is included in the Windows Assessment and Deployment Kit (Windows ADK), which you can download from [Microsoft](https://go.microsoft.com/fwlink/?linkid=2243390). {{% notice Note %}} The WPA plugin requires WPA version `11.0.7.2` or higher. @@ -71,19 +75,19 @@ Run the downloaded `adksetup.exe` program. Specify the default installation location and accept the license agreement. -Make sure that "Windows Performance Toolkit" is checked under "Select the features you want to install". +Make sure that **Windows Performance Toolkit** is checked under **Select the features you want to install**. ![WPA Installation](/install-guides/_images/wpa-installation.png) -Finally, click Install. +Finally, click **Install**. -### Windows Performance Analyzer plugin +### Install the WPA plugin -The plugin is a single `.dll` file. +Now you're ready to install the plugin, which is a single `.dll` file. -Download a `.zip` file from the [GitHub releases page](https://github.com/arm-developer-tools/windowsperf-wpa-plugin/releases). +Download the `.zip` file from the [Windows Perf WPA plugin GitHub releases page](https://github.com/arm-developer-tools/windowsperf-wpa-plugin/releases) on GitHub. -To download the latest version from the command prompt: +Alternatively, you can download the latest version using command prompt: ```console mkdir wpa-plugin @@ -91,26 +95,26 @@ cd wpa-plugin curl -L -O https://github.com/arm-developer-tools/windowsperf-wpa-plugin/releases/download/1.0.2/wpa-plugin-1.0.2.zip ``` -Extract the `.dll` file from the downloaded `.zip` file. +Now extract the `.dll` file from the downloaded `.zip` file. 
```console tar -xmf wpa-plugin-1.0.2.zip ``` -You now have the file `WPAPlugin.dll` in your `wpa-plugin` directory. +The file `WPAPlugin.dll` is now in your `wpa-plugin` directory. There are three ways you can install the `WPAPlugin.dll` file: -###### 1. Copy the plugin dll to the CustomDataSources directory next to the WPA executable. +#### 1. Copy the .dll file to the CustomDataSources directory next to the WPA executable. The default location is: `C:\\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\CustomDataSources` -###### 2. Set an environment variable +#### 2. Set an environment variable. -Set the `WPA_ADDITIONAL_SEARCH_DIRECTORIES` environment variable to the location of the DLL file. +Set the `WPA_ADDITIONAL_SEARCH_DIRECTORIES` environment variable to the location of the `.dll` file. -###### 3. Start WPA from the command line and pass the plugin directory location using a flag. +#### 3. Start WPA from the command line and pass the plugin directory location using a flag. Use the `-addsearchdir` flag for `wpa`: @@ -120,13 +124,13 @@ wpa -addsearchdir "%USERPROFILE%\plugins" ## How can I verify the WPA plugin is installed? -To verify the plugin is loaded, launch WPA and the plugin should appear under `Help > About Windows Performance Analyzer` +To verify the plugin is loaded, launch WPA and the plugin should appear under **Help > About Windows Performance Analyzer**. ![WPA installation confirmation](/install-guides/_images/about-wpa.png) ## How can I run the WPA plugin from the command line? -To open a json file directly from the command line, you can use the `-i` flag to specify the file path to open. +To open a JSON file directly from the command line, you can use the `-i` flag to specify the file path to open. 
For example: to open `timeline_long.json` in your downloads directory, run the command: diff --git a/content/learning-paths/cross-platform/_example-learning-path/write-2-metadata.md b/content/learning-paths/cross-platform/_example-learning-path/write-2-metadata.md index c6b4f9cd7a..4dc0bcfcaa 100644 --- a/content/learning-paths/cross-platform/_example-learning-path/write-2-metadata.md +++ b/content/learning-paths/cross-platform/_example-learning-path/write-2-metadata.md @@ -49,7 +49,7 @@ Displaying your name on the content you contributed is a great way to promote yo If you do not want your name to be displayed leave `author_primary` blank. -You can share additional information about yourself by editing the file [`contributors.csv`](https://github.com/ArmDeveloperEcosystem/arm-learning-paths/blob/main/contributors.csv) at the top of the repository. This file collects your company name, GitHub username, LinkedIn profile, Twitter handle, and your website. All fields are optional, but any you add to `contributors.csv` will appear next to your name in the `Author` field. +You can share additional information about yourself by editing the file [`contributors.csv`](https://github.com/ArmDeveloperEcosystem/arm-learning-paths/blob/main/assets/contributors.csv) at the top of the repository. This file collects your company name, GitHub username, LinkedIn profile, Twitter handle, and your website. All fields are optional, but any you add to `contributors.csv` will appear next to your name in the `Author` field. ## Tags Tagging metadata is also expected to increase visibility through filtering. Some tags are closed (you must select from a pre-defined list) and some are open (enter anything). 
The tags are: diff --git a/content/learning-paths/cross-platform/gitlab/1-gitlab-runner.md b/content/learning-paths/cross-platform/gitlab/1-gitlab-runner.md index d11f820821..cacd4cc62e 100644 --- a/content/learning-paths/cross-platform/gitlab/1-gitlab-runner.md +++ b/content/learning-paths/cross-platform/gitlab/1-gitlab-runner.md @@ -16,7 +16,7 @@ A GitLab Runner works with GitLab CI/CD to run jobs in a pipeline. It acts as an 3. Multi-architecture support: GitLab runners support multiple architectures including - `x86/amd64` and `arm64` ## What is Google Axion? -Axion is Google’s first Arm-based server processor, built using the Armv9 Neoverse V2 CPU. The VM instances are part of the `C4A` family of compute instances. To learn more about Google Axion refer to this [page](cloud.google.com/products/axion). +Axion is Google’s first Arm-based server processor, built using the Armv9 Neoverse V2 CPU. The VM instances are part of the `C4A` family of compute instances. To learn more about Google Axion refer to this [page](http://cloud.google.com/products/axion/). ## Install GitLab runner on a Google Axion VM diff --git a/content/learning-paths/cross-platform/simd-info-demo/_index.md b/content/learning-paths/cross-platform/simd-info-demo/_index.md new file mode 100644 index 0000000000..cedd43e124 --- /dev/null +++ b/content/learning-paths/cross-platform/simd-info-demo/_index.md @@ -0,0 +1,46 @@ +--- +title: Introduction to SIMD.info + +draft: true +cascade: + draft: true + +minutes_to_complete: 30 + +who_is_this_for: This is an advanced topic for software developers interested in porting SIMD code across Arm platforms. + +learning_objectives: + - Learn how to use SIMD.info’s tools and features, such as navigation, search, and comparison, to simplify the process of finding equivalent SIMD intrinsics between architectures and improving code portability. + +prerequisites: + - A basic understanding of SIMD.
+ - Access to an Arm platform with a supported SIMD engine, with a recent version of a C compiler (Clang or GCC) installed. + +author_primary: Georgios Mermigkis & Konstantinos Margaritis, VectorCamp + +### Tags +skilllevels: Advanced +subjects: Performance and Architecture +armips: + - Aarch64 + - Armv8-a + - Armv9-a +tools_software_languages: + - GCC + - Clang + - Coding + - Rust +operatingsystems: + - Linux +shared_path: true +shared_between: + - laptops-and-desktops + - servers-and-cloud-computing + - smartphones-and-mobile + +### FIXED, DO NOT MODIFY +# ================================================================================ +weight: 1 # _index.md always has weight of 1 to order correctly +layout: "learningpathall" # All files under learning paths have this same wrapper +learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content. +--- diff --git a/content/learning-paths/cross-platform/simd-info-demo/_next-steps.md b/content/learning-paths/cross-platform/simd-info-demo/_next-steps.md new file mode 100644 index 0000000000..320c29c6e1 --- /dev/null +++ b/content/learning-paths/cross-platform/simd-info-demo/_next-steps.md @@ -0,0 +1,19 @@ +--- +next_step_guidance: You should explore **SIMD.info** further and look for porting opportunities between different SIMD engines.
+ +recommended_path: /learning-paths/cross-platform/ + +further_reading: + - resource: + title: SIMD.info + link: https://simd.info + type: website + + +# ================================================================================ +# FIXED, DO NOT MODIFY +# ================================================================================ +weight: 21 # set to always be larger than the content in this path, and one more than 'review' +title: "Next Steps" # Always the same +layout: "learningpathall" # All files under learning paths have this same wrapper +--- diff --git a/content/learning-paths/cross-platform/simd-info-demo/_review.md b/content/learning-paths/cross-platform/simd-info-demo/_review.md new file mode 100644 index 0000000000..cc6a2a64d0 --- /dev/null +++ b/content/learning-paths/cross-platform/simd-info-demo/_review.md @@ -0,0 +1,44 @@ +--- +review: + - questions: + question: > + What is SIMD.info? + answers: + - An online resource for SIMD C intrinsics for all major architectures + - It's an online forum for SIMD developers + - A book about SIMD programming + correct_answer: 1 + explanation: > + While it allows comments in the SIMD intrinsics, SIMD.info is not really a forum. It is an online **free** resource to assist developers porting C code between popular architectures, for example, from SSE/AVX/AVX512 to Arm ASIMD. + + - questions: + question: > + What architectures are listed in SIMD.info? + answers: + - Intel SSE and Arm ASIMD + - Power VSX and Arm ASIMD/SVE + - Intel SSE4.2/AVX/AVX2/AVX512, Arm ASIMD, Power VSX + correct_answer: 3 + explanation: > + At the time of writing SIMD.info supports Intel SSE4.2/AVX/AVX2/AVX512, Arm ASIMD, Power VSX as SIMD architectures. Work is in progress to include Arm SVE/SVE2, MIPS MSA, RISC-V RVV 1.0, s390 Z and others. + + - questions: + question: > + What are SIMD.info's major features? 
+ answers: + - Hierarchical tree, Search, AI code translation + - Search, Hierarchical tree, Code examples + - Hierarchical tree, Search, Intrinsics Comparison, Code examples, Equivalents mapping, links to official documentation + correct_answer: 3 + explanation: > + SIMD.info provides multiple features, including a hierarchical tree, Search facility, Intrinsics Comparison, Code examples, Equivalents mapping, links to official documentation and others. AI code translation is not a feature of SIMD.info but will be the focus of another project, SIMD.ai. + + + +# ================================================================================ +# FIXED, DO NOT MODIFY +# ================================================================================ +title: "Review" # Always the same title +weight: 20 # Set to always be larger than the content in this path +layout: "learningpathall" # All files under learning paths have this same wrapper +--- diff --git a/content/learning-paths/cross-platform/simd-info-demo/conclusion.md b/content/learning-paths/cross-platform/simd-info-demo/conclusion.md new file mode 100644 index 0000000000..bf30963645 --- /dev/null +++ b/content/learning-paths/cross-platform/simd-info-demo/conclusion.md @@ -0,0 +1,17 @@ +--- +title: Conclusion +weight: 8 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +### Conclusion and Additional Resources + +Porting SIMD code between architectures can be a daunting process, in many cases requiring many hours of studying multiple ISAs in online resources or ISA manuals of thousands of pages. Our primary focus in this work was to optimize the existing algorithm directly with SIMD intrinsics, without altering the algorithm or data layout. While reordering data to align with native Arm instructions could offer performance benefits, our scope remained within the constraints of the current data layout and algorithm.
For those interested in data layout strategies to further enhance performance on Arm, the [vectorization-friendly data layout learning path](https://learn.arm.com/learning-paths/cross-platform/vectorization-friendly-data-layout/) offers valuable insights. + +Using **[SIMD.info](https://simd.info)** can be instrumental in reducing the amount of time spent in this process, providing a centralized and user-friendly resource for finding **NEON** equivalents to intrinsics of other architectures. It saves considerable time and effort by offering detailed descriptions, prototypes, and comparisons directly, eliminating the need for extensive web searches and manual lookups. + +While porting between vectors of different sizes is more complex, work is underway, at the time of writing this guide, to complete integration of **SVE**/**SVE2** Arm extensions and allow matching them with **AVX512** intrinsics, as both use predicate masks. + +Please check **[SIMD.info](https://simd.info)** regularly for updates on this. diff --git a/content/learning-paths/cross-platform/simd-info-demo/intro-to-simdinfo.md b/content/learning-paths/cross-platform/simd-info-demo/intro-to-simdinfo.md new file mode 100644 index 0000000000..24df6cce42 --- /dev/null +++ b/content/learning-paths/cross-platform/simd-info-demo/intro-to-simdinfo.md @@ -0,0 +1,16 @@ +--- +title: Overview +weight: 2 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +### The Challenge of SIMD Code Portability +One of the biggest challenges developers face when working with SIMD code is making it portable across different platforms. SIMD instructions are designed to increase performance by executing the same operation on multiple data elements in parallel. However, each architecture has its own set of SIMD instructions, making it difficult to write code that works on all of them without major changes to the code and/or algorithm.
+ +To port software written using Intel intrinsics, like SSE/AVX/AVX512, to Arm Neon, you have to pay attention to data handling with the different instruction sets. + +Having to port the code between architectures can increase development time and introduce the risk of errors during the porting process. Currently, developers rely on ISA documentation and manually search across various vendor platforms like [Arm Developer](https://developer.arm.com/architectures/instruction-sets/intrinsics/) and [Intel Intrinsics Guide](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html) to find equivalent instructions. + +[SIMD.info](https://simd.info) aims to solve this by helping you find equivalent instructions and providing a more streamlined way to adapt your code for different architectures. diff --git a/content/learning-paths/cross-platform/simd-info-demo/simdinfo-description.md b/content/learning-paths/cross-platform/simd-info-demo/simdinfo-description.md new file mode 100644 index 0000000000..678d08327c --- /dev/null +++ b/content/learning-paths/cross-platform/simd-info-demo/simdinfo-description.md @@ -0,0 +1,53 @@ +--- +title: SIMD.info Features +weight: 3 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +### Comprehensive SIMD.info Capabilities +**[SIMD.info](https://simd.info/)** offers a variety of powerful tools to help developers work more efficiently with SIMD code across different architectures. With a database of over 10,000 intrinsics, it provides detailed information to support effective SIMD development. + +For each intrinsic, SIMD.info provides comprehensive details, including: + +1. **Purpose**: A brief description of what the intrinsic does and its primary use case. +2. **Result**: Explanation of the output or result of the intrinsic. +3. **Example**: A code snippet demonstrating how to use the intrinsic. +4. **Prototypes**: Function prototypes for different programming languages (currently C/C++). +5.
**Assembly Instruction**: The corresponding assembly instruction used by the intrinsic. +6. **Notes**: Any additional notes or caveats about the intrinsic. +7. **Architecture**: List of architectures that support the intrinsic. +8. **Link(s) to Official Documentation** + +This detailed information ensures you have all the necessary resources to effectively use and port SIMD instructions across different platforms. Each feature is designed to simplify navigation, improve the search for equivalent instructions, and foster a collaborative environment for knowledge-sharing. + +- **Tree-based navigation:** **SIMD.info** uses a clear, hierarchical layout to organize instructions. It categorizes instructions into broad groups like **Arithmetic**, which are further divided into specific subcategories such as **Vector Add** and **Vector Subtract**. This organized structure makes it straightforward to browse through SIMD instruction sets across various platforms, allowing you to efficiently find and access the exact instructions you need. +An example of what the tree structure looks like: + + + - **Arithmetic** + - **Arithmetic (Complex Numbers)** + - **Boolean Logic & Bit Manipulation** + - **Boolean AND** + - **Boolean AND NOT** + - **Boolean AND NOT 128-bit vector** + - **Boolean AND NOT 16-bit signed integers** + - **Boolean AND NOT 16-bit unsigned integers** + - **Boolean AND NOT 256-bit vector** + - **Boolean AND NOT 32-bit floats** + - **Boolean AND NOT 32-bit signed integers** + - AVX512: _mm512_andnot_epi32 + - NEON: vbic_s32 + - NEON: vbicq_s32 + - VSX: vec_andc + - **Bit Clear** + - **XOR** + +- **Advanced search functionality:** With its robust search engine, **SIMD.info** allows you to either search for a specific intrinsic (e.g. `vaddq_f64`) or enter more general terms (e.g. *How to add 2 vectors*), and it will return a list of the corresponding intrinsics.
You can also filter results based on the specific engine you're working with, such as **NEON**, **SSE4.2**, **AVX**, **AVX512**, **VSX**. This functionality streamlines the process of finding the right commands tailored to your needs. + +- **Comparison tools:** This feature lets you directly compare SIMD instructions from different (or the same) platforms side by side, offering a clear view of the similarities and differences. It’s a very helpful tool for porting code across architectures, as it helps ensure accuracy and efficiency. + +- **Discussion forum (like StackOverflow):** The integrated discussion forum, powered by **[Disqus](https://disqus.com/)**, allows users to ask questions, share insights, and troubleshoot problems together. This community-driven space ensures that you’re never stuck on a complex issue without support, fostering collaboration and knowledge-sharing among SIMD developers. Imagine something like **StackOverflow** but specific to SIMD intrinsics. + +You can now learn how to use these features in the context of an actual example. diff --git a/content/learning-paths/cross-platform/simd-info-demo/simdinfo-example1-cont.md b/content/learning-paths/cross-platform/simd-info-demo/simdinfo-example1-cont.md new file mode 100644 index 0000000000..6a8e1c4463 --- /dev/null +++ b/content/learning-paths/cross-platform/simd-info-demo/simdinfo-example1-cont.md @@ -0,0 +1,33 @@ +--- +title: Porting Process +weight: 5 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +### Using SIMD.info to find NEON Equivalents +Now that you have a clear view of the example, you can start the process of porting the code to Arm **Neon/ASIMD**. + +This is where [SIMD.info](https://simd.info/) comes in. + +In SIMD programming, the primary concern is the integrity and accuracy of the calculations. Ensuring that these calculations are done correctly is crucial. Performance almost always comes second.
+ +For the operations in your **SSE4.2** example, you have the following intrinsics: + +- **`_mm_cmpgt_ps`** +- **`_mm_add_ps`** +- **`_mm_mul_ps`** +- **`_mm_sqrt_ps`** + +To gain a deeper understanding of how these intrinsics work and to get detailed descriptions, you can use the search feature on **SIMD.info**. Simply enter the intrinsic's name into the search bar. You can either select from the suggested results or perform a direct search to find detailed information about each intrinsic. + +1. By searching [**`_mm_add_ps`**](https://simd.info/c_intrinsic/_mm_add_ps/) you get information about its purpose, result type, assembly instruction, prototype, and an example. By clicking the **engine** option **"NEON"** you can find its [equivalents](https://simd.info/eq/_mm_add_ps/NEON/) for this engine. The equivalents are: **`vaddq_f32`**, **`vadd_f32`**. [Intrinsics comparison](https://simd.info/c-intrinsics-compare?compare=vaddq_f32:vadd_f32) will help you find the right one. Based on the prototype provided, you would choose [**`vaddq_f32`**](https://simd.info/c_intrinsic/vaddq_f32/) because it works with 128-bit vectors, which is the same as **SSE4.2**. + +2. Moving to the next intrinsic, **`_mm_mul_ps`**, you will use the [Intrinsics Tree](https://simd.info/tag-tree) on **SIMD.info** to find the equivalent. Start by expanding the **Arithmetic** branch and then navigate to the branch **Vector Multiply**. Since you are working with 32-bit floats, open the **Vector Multiply 32-bit floats** branch, where you will find several options. The recommended choice is [**`vmulq_f32`**](https://simd.info/c_intrinsic/vmulq_f32/), following the same reasoning as before: it operates on 128-bit vectors. + +3. For the third intrinsic, **`_mm_sqrt_ps`**, the easiest way to find the corresponding **NEON** intrinsic is by typing **"Square Root"** into the search bar on SIMD.info.
From the [search results](https://simd.info/search?search=Square+Root&simd_engines=1&simd_engines=2&simd_engines=3&simd_engines=4&simd_engines=5), look for the float-specific version and select [**`vsqrtq_f32`**](https://simd.info/c_intrinsic/vsqrtq_f32/), which, like the others, works with 128-bit vectors. In the equivalents section regarding **SSE4.2**, you can clearly see that **`_mm_sqrt_ps`** has its place as a direct match for this operation. + +4. For the last intrinsic, **`_mm_cmpgt_ps`**, follow a similar approach as before. Inside the intrinsics tree, start by expanding the **Comparison** folder. Navigate to the subfolder **Vector Compare Greater Than**, and since you are working with 32-bit floats, proceed to **Vector Compare Greater Than 32-bit floats**. The recommended choice is again the 128-bit variant [**`vcgtq_f32`**](https://simd.info/c_intrinsic/vcgtq_f32/). + +Now that you have found the **NEON** equivalents for each **SSE4.2** intrinsic, you're ready to begin porting the code. Understanding these equivalents is key to ensuring that the code produces the correct results in the calculations as you switch between SIMD engines. diff --git a/content/learning-paths/cross-platform/simd-info-demo/simdinfo-example1-porting.md b/content/learning-paths/cross-platform/simd-info-demo/simdinfo-example1-porting.md new file mode 100644 index 0000000000..f0a2d3f5bb --- /dev/null +++ b/content/learning-paths/cross-platform/simd-info-demo/simdinfo-example1-porting.md @@ -0,0 +1,99 @@ +--- +title: Code Verification +weight: 6 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +### Step-by-Step Porting + +1. Change the loading process to follow **NEON**'s method for initializing vectors. The **SSE4.2** intrinsic **`_mm_set_ps`** is in reality a macro; in **NEON** you can do the same thing with curly braces **`{}`** initialization. +2. Next, you will replace the **SSE4.2** intrinsics with the **NEON** equivalents you identified earlier.
The key is to ensure that the operations perform the same tasks, such as comparison, addition, multiplication, and square root calculations. +3. Finally, modify the storing process to match **NEON**’s way of moving data from vectors to memory. In **NEON**, you use functions like [**`vst1q_f32`**](https://simd.info/c_intrinsic/vst1q_f32/) for storing 128-bit floating-point vectors and [**`vst1q_u32`**](https://simd.info/c_intrinsic/vst1q_u32/) for storing 128-bit integer vectors. + +After identifying the **NEON** intrinsics you will need in the ported program, it's time to actually write the code. + +This time on your Arm Linux machine, create a new file for the ported NEON code named `calculation_neon.c` with the contents shown below: + +```C +#include <arm_neon.h> +#include <stdio.h> + +float32_t a_array[4] = {1.0f, 4.0f, 9.0f, 16.0f}; +float32_t b_array[4] = {1.0f, 2.0f, 3.0f, 4.0f}; + +int main() { + float32x4_t a = vld1q_f32(a_array); + float32x4_t b = vld1q_f32(b_array); + + uint32x4_t cmp_result = vcgtq_f32(a, b); + + float a_arr[4], b_arr[4]; + uint32_t cmp_res[4]; + + vst1q_f32(a_arr, a); + vst1q_f32(b_arr, b); + vst1q_u32(cmp_res, cmp_result); + + for (int i = 0; i < 4; i++) { + if (cmp_res[i] != 0) { + printf("Element %d: %.2f is larger than %.2f\n", i, a_arr[i], b_arr[i]); + } else { + printf("Element %d: %.2f is not larger than %.2f\n", i, a_arr[i], b_arr[i]); + } + } + printf("\n"); + + float32x4_t add_result = vaddq_f32(a, b); + float32x4_t mul_result = vmulq_f32(add_result, b); + float32x4_t sqrt_result = vsqrtq_f32(mul_result); + + float res[4]; + + vst1q_f32(res, add_result); + printf("Addition Result: %.2f %.2f %.2f %.2f\n", res[0], res[1], res[2], res[3]); + + vst1q_f32(res, mul_result); + printf("Multiplication Result: %.2f %.2f %.2f %.2f\n", res[0], res[1], res[2], res[3]); + + vst1q_f32(res, sqrt_result); + printf("Square Root Result: %.2f %.2f %.2f %.2f\n", res[0], res[1], res[2], res[3]); + + return 0; +} +``` + +### Verifying the Ported Code + +It's time to
verify that the functionality remains the same, which means you get the same results and similar performance. + +Compile the above code as follows on your Arm Linux machine: + +```bash +gcc -O3 calculation_neon.c -o calculation_neon +``` + +Now run the program: +```bash +./calculation_neon +``` + +The output should look like the following: + +```output +Element 0: 1.00 is not larger than 1.00 +Element 1: 4.00 is larger than 2.00 +Element 2: 9.00 is larger than 3.00 +Element 3: 16.00 is larger than 4.00 + +Addition Result: 2.00 6.00 12.00 20.00 +Multiplication Result: 2.00 12.00 36.00 80.00 +Square Root Result: 1.41 3.46 6.00 8.94 +``` + +You can see that the results are the same as in the **SSE4.2** example. + +{{% notice Note %}} +You initialized the vectors in reverse order compared to the **SSE4.2** version because the array initialization and vld1q_f32 function load vectors from LSB to MSB, whereas **`_mm_set_ps`** loads elements MSB to LSB. +{{% /notice %}} diff --git a/content/learning-paths/cross-platform/simd-info-demo/simdinfo-example1.md b/content/learning-paths/cross-platform/simd-info-demo/simdinfo-example1.md new file mode 100644 index 0000000000..be115692d2 --- /dev/null +++ b/content/learning-paths/cross-platform/simd-info-demo/simdinfo-example1.md @@ -0,0 +1,81 @@ +--- +title: Example Program +weight: 4 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +Consider the following C example that uses Intel SSE4.2 intrinsics. 
+ +On an x86_64 Linux development machine, create a file named `calculation_sse.c` with the contents shown below: + +```C +#include <xmmintrin.h> +#include <stdio.h> + +int main() { + __m128 a = _mm_set_ps(16.0f, 9.0f, 4.0f, 1.0f); + __m128 b = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f); + + __m128 cmp_result = _mm_cmpgt_ps(a, b); + + float a_arr[4], b_arr[4], cmp_arr[4]; + _mm_storeu_ps(a_arr, a); + _mm_storeu_ps(b_arr, b); + _mm_storeu_ps(cmp_arr, cmp_result); + + for (int i = 0; i < 4; i++) { + if (cmp_arr[i] != 0.0f) { + printf("Element %d: %.2f is larger than %.2f\n", i, a_arr[i], b_arr[i]); + } else { + printf("Element %d: %.2f is not larger than %.2f\n", i, a_arr[i], b_arr[i]); + } + } + printf("\n"); + + __m128 add_result = _mm_add_ps(a, b); + __m128 mul_result = _mm_mul_ps(add_result, b); + __m128 sqrt_result = _mm_sqrt_ps(mul_result); + + float res[4]; + + _mm_storeu_ps(res, add_result); + printf("Addition Result: %.2f %.2f %.2f %.2f\n", res[0], res[1], res[2], res[3]); + + _mm_storeu_ps(res, mul_result); + printf("Multiplication Result: %.2f %.2f %.2f %.2f\n", res[0], res[1], res[2], res[3]); + + _mm_storeu_ps(res, sqrt_result); + printf("Square Root Result: %.2f %.2f %.2f %.2f\n", res[0], res[1], res[2], res[3]); + + return 0; +} +``` + +The program first compares whether elements in one vector are greater than those in another vector, prints the result, and then proceeds to compute the addition of two vectors, multiplies the result with one of the vectors, and finally takes the square root of the multiplication result. + +Compile the code on your Linux x86_64 system that supports **SSE4.2**: + +```bash +gcc -O3 calculation_sse.c -o calculation_sse -msse4.2 +``` + +Now run the program: + +```bash +./calculation_sse +``` + +The output should look like the following: +```output +Element 0: 1.00 is not larger than 1.00 +Element 1: 4.00 is larger than 2.00 +Element 2: 9.00 is larger than 3.00 +Element 3: 16.00 is larger than 4.00 + +Addition Result: 2.00 6.00 12.00 20.00 +Multiplication Result: 2.00 12.00 36.00 80.00 +Square
Root Result: 1.41 3.46 6.00 8.94 +``` + +It is imperative that you run the code first on an Intel x86_64 reference platform, to make sure you understand how it works and what results to expect. diff --git a/content/learning-paths/cross-platform/simd-info-demo/simdinfo-example2.md b/content/learning-paths/cross-platform/simd-info-demo/simdinfo-example2.md new file mode 100644 index 0000000000..32793cf3c0 --- /dev/null +++ b/content/learning-paths/cross-platform/simd-info-demo/simdinfo-example2.md @@ -0,0 +1,132 @@ +--- +title: Intrinsics without Equivalents +weight: 7 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +### Handling intrinsics without direct equivalents + +During the porting process, you will observe that certain instructions translate seamlessly. However, there are cases where direct equivalents for some intrinsics may not be readily available across architectures. For example, the [**`_mm_madd_epi16`**](https://simd.info/c_intrinsic/_mm_madd_epi16/) intrinsic from **SSE2**, which performs multiplication of 16-bit signed integer elements in a vector and then does a pairwise addition of adjacent elements increasing the element width, does not have a direct counterpart in **NEON**. However, it can be emulated using other intrinsics. Similarly, its 256-bit and 512-bit counterparts, [**`_mm256_madd_epi16`**](https://simd.info/c_intrinsic/_mm256_madd_epi16/) and [**`_mm512_madd_epi16`**](https://simd.info/c_intrinsic/_mm512_madd_epi16/), can be emulated by a sequence of instructions, but here you will see the 128-bit variant. + +You may already know the equivalent operations for this particular intrinsic, but let's assume you don't. In this case, reading the **`_mm_madd_epi16`** entry on **SIMD.info** might indicate that a key characteristic of the instruction involved is the *widening* of the result elements, from 16-bit to 32-bit signed integers.
That reading matches Intel's documentation, which describes the products and their pairwise sums as 32-bit signed integers. The example below prints the result as eight 16-bit lanes, so you will see how that affects the displayed values. + +Consider the following code for **SSE2**. Create a new file on your x86_64 Linux machine named `_mm_madd_epi16_test.c` with the contents shown below: + +```C +#include <emmintrin.h> +#include <stdint.h> +#include <stdio.h> + +void print_s16x8(char *label, __m128i v) { + int16_t out[8]; + _mm_storeu_si128((__m128i*)out, v); + printf("%-*s: ", 30, label); + for (size_t i=0; i < 8; i++) printf("%4x ", (uint16_t)out[i]); + printf("\n"); +} + +int main() { + __m128i a = _mm_set_epi16(10, 30, 50, 70, 90, 110, 130, 150); + __m128i b = _mm_set_epi16(20, 40, 60, 80, 100, 120, 140, 160); + // 130 * 140 = 18200, 150 * 160 = 24000 + // adding them as 32-bit signed integers -> 42200 + // the same sum truncated to a 16-bit signed integer -> -23336 (overflow!) + + __m128i res = _mm_madd_epi16(a, b); + + print_s16x8("a", a); + print_s16x8("b", b); + print_s16x8("_mm_madd_epi16(a, b)", res); + + return 0; +} +``` + +Compile the code as follows on the x86_64 system (no extra flags required as **SSE2** is assumed by default on all 64-bit x86 systems): +```bash +gcc -O3 _mm_madd_epi16_test.c -o _mm_madd_epi16_test +``` + +Now run the program: +```bash +./_mm_madd_epi16_test +``` + +The output should look like: +```output +a : 96 82 6e 5a 46 32 1e a +b : a0 8c 78 64 50 3c 28 14 +_mm_madd_epi16(a, b) : a4d8 0 56b8 0 2198 0 578 0 +``` + +You will note that the first 16-bit lane, `a4d8`, would be a negative number (-23336) if read as a 16-bit signed integer, even though it is the sum of two positive products (`130*140` and `150*160`). The sum, 42200, does not fit in a 16-bit signed integer element; it is held in a 32-bit element, whose low half is `a4d8` and whose high half is the zero printed next to it. + +The rest of the values are as expected.
Notice how each result has a zero element next to it. In the **NEON** port that follows, the pairwise addition gives the correct values, but not in the correct order. The port uses **`vmovl`** to zero-extend the values, which achieves the correct order with zero elements in place. While both **`vmovl`** and **`zip`** could be used for this purpose, **`vmovl`** was chosen in this implementation. For more details, see the Arm Software Optimization Guides, such as the [Neoverse V2 guide](https://developer.arm.com/documentation/109898/latest/). Because the port multiplies and adds in 16-bit lanes, it reproduces the **`_mm_madd_epi16`** output only when each pairwise sum fits in an unsigned 16-bit value, as it does for these inputs. + +Now switch to your Arm Linux machine and create a file called `_mm_madd_epi16_neon.c` with the contents below: +```C +#include <arm_neon.h> +#include <stdint.h> +#include <stdio.h> + +void print_s16x8(char *label, int16x8_t v) { + int16_t out[8]; + vst1q_s16(out, v); + printf("%-*s: ", 30, label); + for (size_t i = 0; i < 8; i++) printf("%4x ", (uint16_t)out[i]); + printf("\n"); +} + +int16_t a_array[8] = {150, 130, 110, 90, 70, 50, 30, 10}; +int16_t b_array[8] = {160, 140, 120, 100, 80, 60, 40, 20}; + +int main() { + int16x8_t a = vld1q_s16(a_array); + int16x8_t b = vld1q_s16(b_array); + int16x8_t zero = vdupq_n_s16(0); + // 130 * 140 = 18200, 150 * 160 = 24000 + // adding them as 32-bit signed integers -> 42200 + // adding them as 16-bit signed integers -> -23336 (overflow!)
+ + int16x8_t res = vmulq_s16(a, b); + + print_s16x8("a", a); + print_s16x8("b", b); + print_s16x8("vmulq_s16(a, b)", res); + res = vpaddq_s16(res, zero); + print_s16x8("vpaddq_s16(a, b)", res); + + // vmovl_s16 would sign-extend; we just want to zero-extend + // so we need to cast to uint16, vmovl_u16 and then cast back to int16 + uint16x4_t res_u16 = vget_low_u16(vreinterpretq_u16_s16(res)); + res = vreinterpretq_s16_u32(vmovl_u16(res_u16)); + print_s16x8("final", res); + + return 0; +} +``` + +Compile the code on your Arm Linux machine: + +```bash +gcc -O3 _mm_madd_epi16_neon.c -o _mm_madd_epi16_neon +``` + +Now run the program: +```bash +./_mm_madd_epi16_neon +``` + +The output should look like: +```output +a : 96 82 6e 5a 46 32 1e a +b : a0 8c 78 64 50 3c 28 14 +vmulq_s16(a, b) : 5dc0 4718 3390 2328 15e0 bb8 4b0 c8 +vpaddq_s16(a, b) : a4d8 56b8 2198 578 0 0 0 0 +final : a4d8 0 56b8 0 2198 0 578 0 +``` + +As you can see, the results of both executions on different architectures match. You were able to use **SIMD.info** to help with the translation of complex intrinsics between different SIMD architectures.
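
As a cross-check on the arithmetic in this example, the following is a minimal scalar sketch in portable C that runs on either machine. It assumes the semantics given in Intel's documentation for `_mm_madd_epi16`: each pair of 16-bit signed elements is multiplied into a 32-bit product, and adjacent products are added into one 32-bit result. The helper names `madd_pairs_s16` and `low16_as_signed` are hypothetical, chosen only for illustration, and are not part of any intrinsics header.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a multiply-and-add-adjacent-pairs operation.
 * Each 16-bit product is computed in 32 bits, and the two adjacent
 * products are summed into one 32-bit output element.
 * Hypothetical helper, for illustration only. */
static void madd_pairs_s16(const int16_t *a, const int16_t *b,
                           int32_t *out, size_t n_pairs) {
    for (size_t i = 0; i < n_pairs; i++) {
        int32_t p0 = (int32_t)a[2 * i] * (int32_t)b[2 * i];
        int32_t p1 = (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];
        /* Add as unsigned so the one wrap-around case (all four inputs
         * equal to -32768) avoids signed overflow; the conversion back
         * relies on the usual two's-complement behavior of compilers. */
        out[i] = (int32_t)((uint32_t)p0 + (uint32_t)p1);
    }
}

/* Keep only the low 16 bits of a 32-bit sum and read them as a signed
 * 16-bit value, which is what a 16-bit lane would show. */
static int16_t low16_as_signed(int32_t v) {
    uint32_t low = (uint32_t)v & 0xFFFFu;
    return (low >= 0x8000u) ? (int16_t)((int32_t)low - 0x10000)
                            : (int16_t)low;
}
```

Calling `madd_pairs_s16` with the `a_array` and `b_array` values from the example and `n_pairs` set to 4 gives 42200, 22200, 8600, and 1400, which are `a4d8`, `56b8`, `2198`, and `578` in hexadecimal. Passing the first sum to `low16_as_signed` gives -23336, the value mentioned in the code comments.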
+ diff --git a/content/learning-paths/laptops-and-desktops/_index.md b/content/learning-paths/laptops-and-desktops/_index.md index 8672a12ccd..7f6fe66d59 100644 --- a/content/learning-paths/laptops-and-desktops/_index.md +++ b/content/learning-paths/laptops-and-desktops/_index.md @@ -11,14 +11,14 @@ operatingsystems_filter: - Android: 2 - Baremetal: 1 - ChromeOS: 1 -- Linux: 27 +- Linux: 29 - macOS: 7 -- Windows: 35 +- Windows: 36 subjects_filter: - CI-CD: 3 -- Containers and Virtualization: 5 +- Containers and Virtualization: 6 - Migration to Arm: 25 -- Performance and Architecture: 18 +- Performance and Architecture: 20 subtitle: Create and migrate apps for power efficient performance title: Laptops and Desktops tools_software_languages_filter: @@ -28,17 +28,18 @@ tools_software_languages_filter: - Arm Development Studio: 2 - Arm64EC: 1 - assembly: 1 +- Automotive: 1 - C: 2 - C#: 5 - C++: 2 - C/C++: 4 - CCA: 1 -- Clang: 8 +- Clang: 9 - CMake: 2 -- Coding: 18 +- Coding: 19 - CSS: 1 - Docker: 4 -- GCC: 8 +- GCC: 9 - GitHub: 2 - GitLab: 1 - GoogleTest: 1 @@ -55,12 +56,12 @@ tools_software_languages_filter: - Neon: 1 - Neovim: 1 - Node.js: 3 -- perf: 1 +- perf: 2 - Python: 2 - Qt: 2 - Remote.It: 1 - RME: 1 -- Rust: 1 +- Rust: 2 - SVE: 1 - SVE2: 1 - Trusted Firmware: 1 @@ -68,9 +69,10 @@ tools_software_languages_filter: - Visual Studio Code: 9 - VS Code: 2 - Windows Forms: 1 +- Windows Performance Analyzer: 1 - Windows Presentation Foundation: 1 - Windows Sandbox: 1 -- WindowsPerf: 2 +- WindowsPerf: 3 - WinUI 3: 1 - WSL: 1 - Xamarin Forms: 1 diff --git a/content/learning-paths/laptops-and-desktops/system76-auto/_index.md b/content/learning-paths/laptops-and-desktops/system76-auto/_index.md new file mode 100644 index 0000000000..001d278dd6 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/system76-auto/_index.md @@ -0,0 +1,32 @@ +--- +title: Develop Arm automotive software on the System76 Thelio Astra + +minutes_to_complete: 60 + +who_is_this_for: This is 
an introductory topic for automotive developers interested in local development using the System76 Thelio Astra Linux desktop computer. + +learning_objectives: + - Create an efficient automotive development environment on the System76 Thelio Astra desktop. + - Build and run the Arm Automotive Solutions Software Reference Stack locally. + +prerequisites: + - A System76 Thelio Astra desktop computer running Ubuntu 24.04. + +author_primary: Jason Andrews + +### Tags +skilllevels: Introductory +subjects: Containers and Virtualization +armips: + - Neoverse +operatingsystems: + - Linux +tools_software_languages: + - Automotive + +### FIXED, DO NOT MODIFY +# ================================================================================ +weight: 1 # _index.md always has weight of 1 to order correctly +layout: "learningpathall" # All files under learning paths have this same wrapper +learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content. +--- diff --git a/content/learning-paths/laptops-and-desktops/system76-auto/_next-steps.md b/content/learning-paths/laptops-and-desktops/system76-auto/_next-steps.md new file mode 100644 index 0000000000..05846422f7 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/system76-auto/_next-steps.md @@ -0,0 +1,27 @@ +--- +# ================================================================================ +# Edit +# ================================================================================ + +next_step_guidance: > + You have successfully learned how to build and run the Arm Automotive Solutions Software Reference Stack on the System76 Thelio Astra. 
+ +recommended_path: "/learning-paths/cross-platform/docker-build-cloud/" + +further_reading: + - resource: + title: Arm Automotive Solutions Documentation + link: https://arm-auto-solutions.docs.arm.com/en/v1.1/index.html + type: documentation + - resource: + title: Parsec + link: https://parsec.community/ + type: documentation + +# ================================================================================ +# FIXED, DO NOT MODIFY +# ================================================================================ +weight: 21 # set to always be larger than the content in this path, and one more than 'review' +title: "Next Steps" # Always the same +layout: "learningpathall" # All files under learning paths have this same wrapper +--- diff --git a/content/learning-paths/laptops-and-desktops/system76-auto/_review.md b/content/learning-paths/laptops-and-desktops/system76-auto/_review.md new file mode 100644 index 0000000000..cdac780920 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/system76-auto/_review.md @@ -0,0 +1,53 @@ +--- +# ================================================================================ +# Edit +# ================================================================================ + +# Always 3 questions. Should try to test the reader's knowledge, and reinforce the key points you want them to remember. + # question: A one sentence question + # answers: The correct answers (from 2-4 answer options only). Should be surrounded by quotes. + # correct_answer: An integer indicating what answer is correct (index starts from 0) + # explanation: A short (1-3 sentence) explanation of why the correct answer is correct. Can add additional context if desired + + +review: + - questions: + question: > + To ensure there is efficient memory resource, the Arm Automotive Solutions Software Reference Stack must be built on an Arm cloud instance. + answers: + - "True." + - "False." 
+ correct_answer: 2 + explanation: > + You can build the automotive software stack on a local machine using the System76 Thelio Astra Linux desktop. + - questions: + question: > + Which of the following are benefits of Parsec? + answers: + - "Platform Agnostic API." + - "Secure boot and attestation." + - "Key management and cryptography." + - "All of the above." + correct_answer: 4 + explanation: > + All of these help Parsec provide unified access to hardware security. + + - questions: + question: > + Which of these are the benefits of using Arm-based desktops for Arm-based software development? + answers: + - "ISA compatibility." + - "No cross-compilation or Virtualization needed." + - "Lower cost and higher performance." + - "All of the above." + correct_answer: 4 + explanation: > + Using Arm-based desktops for Arm-based software development brings all these benefits. + +# ================================================================================ +# FIXED, DO NOT MODIFY +# ================================================================================ +title: "Review" # Always the same title +weight: 20 # Set to always be larger than the content in this path +layout: "learningpathall" # All files under learning paths have this same wrapper +--- diff --git a/content/learning-paths/laptops-and-desktops/system76-auto/about.md b/content/learning-paths/laptops-and-desktops/system76-auto/about.md new file mode 100644 index 0000000000..a15160bdcb --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/system76-auto/about.md @@ -0,0 +1,27 @@ +--- +# User change +title: "Thelio Astra" + +weight: 2 + +layout: "learningpathall" +--- +### About the Thelio Astra + +Thelio Astra is an Arm-based desktop computer designed by System76 for autonomous vehicle development and other general-purpose Arm software development. + +Thelio Astra uses the Ampere Altra processor, which is based on the Arm Neoverse N1 CPU, and ships with the Ubuntu operating system. 
+ +An NVIDIA GPU is included for high-performance graphics, and the system can be configured with up to 512 GB of RAM and up to 16 TB of storage. + +Some of the benefits of using a Thelio Astra for automotive development include: + +- Access to native performance - you can execute build and test cycles directly on Arm Neoverse processors, eliminating the performance overhead and complexities associated with instruction emulation and cross-compilation. + +- Improved virtualization - familiar virtualization and container tools on Arm simplify the development and test process. + +- Better cost-effectiveness - a local computer with a high core count, large memory, and plenty of storage can save costs. + +- Enhanced compatibility - support for Arm CPUs and NVIDIA GPUs eliminates the need for Arm instruction emulation, which simplifies the developer process and overall experience. + +- Optimized developer process - you can run large software stacks on your local machine, which makes it easier to fix issues and improve performance. \ No newline at end of file diff --git a/content/learning-paths/laptops-and-desktops/system76-auto/about2.md b/content/learning-paths/laptops-and-desktops/system76-auto/about2.md new file mode 100644 index 0000000000..487e0e66e2 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/system76-auto/about2.md @@ -0,0 +1,27 @@ +--- +# User change +title: "Arm Automotive Solutions Software Reference Stack" + +weight: 4 + +layout: "learningpathall" +--- + +## About the Arm Automotive Solutions Software Reference Stack and Arm Reference Design-1 AE + +The Arm Automotive Solutions Software Reference Stack can be run on the Arm Reference Design-1 AE (RD-1 AE). + +RD-1 AE is a concept hardware design for the automotive segment, built on the Neoverse V3AE Application Processor as primary compute and augmented with an Arm Cortex-R82AE based Safety Island. 
+ +The system additionally includes a Runtime Security Engine (RSE) used for the secure boot of the system elements and runtime Secure Services. + +The software stack consists of: + +* Firmware. +* Boot loader. +* Hypervisor. +* Linux kernel. +* File system. +* Applications. + +The development environment uses the Yocto Project build framework. \ No newline at end of file diff --git a/content/learning-paths/laptops-and-desktops/system76-auto/about3.md b/content/learning-paths/laptops-and-desktops/system76-auto/about3.md new file mode 100644 index 0000000000..cd1578f612 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/system76-auto/about3.md @@ -0,0 +1,21 @@ +--- +# User change +title: "Parsec" + +weight: 6 + +layout: "learningpathall" +--- + + +## About Parsec + +There are a number of example applications which demonstrate the software stack running on the reference hardware system modeled by a Fixed Virtual Platform (FVP). The Parsec demo is explained below. + +The [Parsec-enabled TLS demo](https://arm-auto-solutions.docs.arm.com/en/v1.1/design/applications/parsec_enabled_tls.html) illustrates an HTTPS session. A simple web page is transferred using a Transport Layer Security (TLS) connection. + +Parsec, or Platform AbstRaction for SECurity, is an open-source initiative that provides a common API to hardware security and cryptographic services. + +This enables applications to interact with the secure hardware of a device without requiring knowledge of the specific details of the hardware itself. The Parsec abstraction layer makes it easier to develop secure applications that can run on different devices and platforms. + +You can follow the instructions in the next section to run the Parsec demo. 
\ No newline at end of file diff --git a/content/learning-paths/laptops-and-desktops/system76-auto/build.md b/content/learning-paths/laptops-and-desktops/system76-auto/build.md new file mode 100644 index 0000000000..98d114d554 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/system76-auto/build.md @@ -0,0 +1,43 @@ +--- +# User change +title: "Build the Arm Automotive Solutions Software Reference Stack" + +weight: 5 + +layout: "learningpathall" +--- +## Build the Automotive Software Stack + +The Thelio Astra makes it possible to build the complete software stack on an Arm-based local machine instead of using a non-Arm desktop computer, with the ISA compatibility benefits that it brings to Arm-based software development. Using Arm-based cloud instances is another option. + +You can build the Arm Automotive Solutions Software Reference Stack from the command line of the Ubuntu 20.04 Multipass virtual machine. + +Start by creating a new directory, and then clone the repository: + +```console +mkdir -p ~/arm-auto-solutions +cd ~/arm-auto-solutions +git clone https://git.gitlab.arm.com/automotive-and-industrial/arm-auto-solutions/sw-ref-stack.git --branch v1.1 +``` + +Open the configuration menu: + +```console +kas menu sw-ref-stack/Kconfig +``` + +Press the space bar and the arrow keys to select the three components shown in the screen capture below: +- **Accept the END USER LICENSE AGREEMENT FOR ARM ECOSYSTEM MODELS**. +- **Safety Island Actuation Demo**. +- **Baremetal**. + +![configuration #center](configure.png) + +{{% notice Note %}} +To build and run, you must accept the EULA. +{{% /notice %}} + +Press Tab to navigate to **Build** and press Enter to start the build. + +The build will take some time, depending on the number of CPUs in your virtual machine. 
+ diff --git a/content/learning-paths/laptops-and-desktops/system76-auto/configure.png b/content/learning-paths/laptops-and-desktops/system76-auto/configure.png new file mode 100644 index 0000000000..6bf63fccb5 Binary files /dev/null and b/content/learning-paths/laptops-and-desktops/system76-auto/configure.png differ diff --git a/content/learning-paths/laptops-and-desktops/system76-auto/run.md b/content/learning-paths/laptops-and-desktops/system76-auto/run.md new file mode 100644 index 0000000000..f7ce6c53e6 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/system76-auto/run.md @@ -0,0 +1,128 @@ +--- +# User change +title: "Run the Parsec demo" + +weight: 7 + +layout: "learningpathall" +--- + +## Run the Parsec SSL demo + +From the command line, start a Tmux session: + +```console +tmux new-session -s arm-auto-solutions +``` + +Tmux makes it possible to connect to the output from multiple hardware subsystems in the reference design. + +To run the software stack on the FVP, run: + +```console +cd ~/arm-auto-solutions +kas shell -c "../layers/meta-arm/scripts/runfvp -t tmux --verbose" +``` + +This runs the entire software stack on a model of the hardware. + +Anytime during the process you can use Tmux to interact with the different subsystems using Ctrl+B then W to bring up a list of windows. Press the arrow keys to select a window. + +After the software boots, you should see a Linux login prompt: `fvp-rd-kronos login:` + +Enter `root` for the login name. No password is required. + +Make sure the initialization process is complete by running: + +```console +systemctl is-system-running --wait +``` + +If the output is `running`, continue to the next step. If not, rerun the command until the output is `running`. + +On the primary compute, run the SSL server: + +```console +ssl_server & +``` + +The output from the server is printed: + +```output + . Seeding the random number generator... ok + . Loading the server cert. and key... ok + . 
Bind on https://localhost:4433/ ... ok + . Setting up the SSL data.... ok + . Waiting for a remote connection ... +``` + +The SSL client runs in a standard Ubuntu 22.04 container and requests a web page from the SSL server. The client has been modified to use Parsec, making it more portable, and able to abstract the details of the hardware security services. + +Run the Parsec-enabled SSL client: + +```console +docker run --rm -v /run/parsec/parsec.sock:/run/parsec/parsec.sock -v /usr/bin/ssl_client1:/usr/bin/ssl_client1 --network host docker.io/library/ubuntu:22.04 ssl_client1 +``` + +The container will then download and run. The SSL client application named `ssl_client1` also starts running. + +The client application requests a webpage from the SSL server and you should see this output: + +```output + . Seeding the random number generator... ok + . Loading the CA root certificate ... ok (0 skipped) + . Connecting to tcp/localhost/4433... ok + . Performing the SSL/TLS handshake... ok + . Setting up the SSL/TLS structure... ok + . Performing the SSL/TLS handshake... ok + < Read from client: 18 bytes read + +GET / HTTP/1.0 + + > Write to client: ok + . Verifying peer X.509 certificate... ok + > Write to server: 156 bytes written + +HTTP/1.0 200 OK +Content-Type: text/html + +

Mbed TLS Test Server

+

Successful connection using: TLS-ECDHE-RSA-WITH-CHACHA20-POLY1305-SHA256

+ + . Closing the connection... ok + . Waiting for a remote connection ... 18 bytes written + +GET / HTTP/1.0 + + < Read from server: 156 bytes read + +HTTP/1.0 200 OK +Content-Type: text/html + +

Mbed TLS Test Server

+

Successful connection using: TLS-ECDHE-RSA-WITH-CHACHA20-POLY1305-SHA256

+``` + +## Shutdown and clean up + +You can shut down the simulated system by using the following command: + +```console +shutdown now +``` + +This will return you to the command line. + +Type `exit` to leave the Tmux session, and `exit` again to leave the Multipass virtual machine. + +To delete the Multipass VM, run the commands: + +```console +multipass stop u20-32 +multipass delete u20-32 +multipass purge +``` + +You have run the Parsec example from the Arm Automotive Solutions Software Reference Stack. + +There are many other example applications you can run. See the Further Reading section at the end of the Learning Path for more information. \ No newline at end of file diff --git a/content/learning-paths/laptops-and-desktops/system76-auto/setup.md b/content/learning-paths/laptops-and-desktops/system76-auto/setup.md new file mode 100644 index 0000000000..ac222c6e5f --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/system76-auto/setup.md @@ -0,0 +1,93 @@ +--- +# User change +title: "Set up an automotive development environment" + +weight: 3 + +layout: "learningpathall" +--- + +## Before you begin + +This Learning Path explains how to perform automotive software development using the [System76 Thelio Astra](https://system76.com/arm) Linux desktop computer running Ubuntu. + +Before you begin, install Multipass using the [Multipass install guide](/install-guides/multipass/) for Arm Linux. You can then use Multipass to create a cloud-style virtual machine on your desktop computer. + + + +## Create a virtual machine using Multipass + +A Multipass virtual machine is a good way to create the required automotive development environment and isolate the build and test process. Using Multipass also allows you to split the resources of the Thelio Astra and specify the number of CPUs and the portion of memory and storage for the development environment. It is also easy to delete an existing VM and create a new one whenever required. 
+ +The Arm Automotive Solutions Software Reference Stack requires Ubuntu 20.04 for the build and test machine. The Thelio Astra ships with either Ubuntu 22.04 or 24.04. With Multipass, you can use Ubuntu 20.04 for the development environment, isolate the development from the native operating system, and avoid any compatibility issues. + +To get started, create a Multipass virtual machine named `u20-32` with Ubuntu 20.04 and 32 CPUs: + +```console +multipass launch 20.04 --name u20-32 --cpus 32 --disk 250G --memory 32G +``` +{{% notice Note %}} +You can adjust the configuration of the setup, by changing the number of CPUs, the allotted memory, and the disk space size. Using more of these resources will speed up the build process. + +This Learning Path documents an example using a Thelio Astra with 64 CPUs, 64 GB of RAM, and 1 TB of storage. +{{% /notice %}} + + +Start a bash shell in the Ubuntu 20.04 VM: + +```console +multipass shell u20-32 +``` + +Update the Ubuntu software: + +```console +sudo apt update ; sudo apt upgrade -y +``` + +### Create swap space + +Building the automotive software stack requires significant memory resources, so it's good practice to create swap space. Without swap space, some build processes might fail due to lack of memory. + +Create 10 GB of swap space: + +``` +sudo fallocate -l 10G /swapfile +sudo chmod 600 /swapfile +sudo mkswap /swapfile +sudo swapon /swapfile +``` + +To confirm the swap space has been created, run: + +```console +swapon --show +``` + +The output shows the swap space: + +```output +NAME TYPE SIZE USED PRIO +/swapfile file 10G 0B -2 +``` + +### Install the required development tools + +A number of development tools are required to build the Arm Automotive Solutions Software Reference Stack. 
+ +Install the required software: + +```console +sudo apt install gawk wget git diffstat unzip texinfo gcc build-essential chrpath socat cpio python3 python3-pip python3-pexpect xz-utils debianutils iputils-ping python3-git python3-jinja2 python3-subunit zstd liblz4-tool file locales libacl1 -y +``` + +Configure the locale and install [Kas](https://kas.readthedocs.io/en/latest/index.html), a setup tool for [BitBake](https://docs.yoctoproject.org/bitbake/): + +```console +sudo locale-gen en_US.UTF-8 +sudo -H pip3 install --upgrade kas==4.3.2 && sudo apt install python3-newt -y +``` + +You now have a Multipass virtual machine running Ubuntu 20.04 with the required swap space and the required development tools installed. + +Proceed to the next section to build the Arm Automotive Solutions Software Reference Stack. diff --git a/content/learning-paths/laptops-and-desktops/windowsperf/_index.md b/content/learning-paths/laptops-and-desktops/windowsperf/_index.md index ccac2cc8cd..1d711eaffb 100644 --- a/content/learning-paths/laptops-and-desktops/windowsperf/_index.md +++ b/content/learning-paths/laptops-and-desktops/windowsperf/_index.md @@ -1,7 +1,7 @@ --- title: Get started with WindowsPerf -minutes_to_complete: 15 +minutes_to_complete: 20 who_is_this_for: This is an introductory topic for software developers working on laptops and desktops and new to the Arm architecture. 
@@ -19,6 +19,7 @@ skilllevels: Introductory subjects: Performance and Architecture armips: - Cortex-A + - Neoverse operatingsystems: - Windows tools_software_languages: diff --git a/content/learning-paths/laptops-and-desktops/windowsperf/_review.md b/content/learning-paths/laptops-and-desktops/windowsperf/_review.md index d0acf262a1..a4cf8ac1de 100644 --- a/content/learning-paths/laptops-and-desktops/windowsperf/_review.md +++ b/content/learning-paths/laptops-and-desktops/windowsperf/_review.md @@ -19,7 +19,7 @@ review: - "False" correct_answer: 1 explanation: > - The available counters may vary between processors. Use `wprof -l` to generate a list of available counters. + The available counters may vary between processors. Use `wperf list` to generate a list of available counters. - questions: question: > @@ -41,6 +41,26 @@ review: explanation: > Some `wperf` commands such as `list`, `test` or `stat` can output data in JSON format. + - questions: + question: > + Command `wperf sample` can be used together with `--annotate` or `--disassemble` command line options. + answers: + - "True" + - "False" + correct_answer: 1 + explanation: > + Yes, you can add annotate and disassemble output to `wperf sample` command. + + - questions: + question: > + Command `wperf record` can be used together with `--annotate` or `--disassemble` command line options. + answers: + - "True" + - "False" + correct_answer: 1 + explanation: > + Yes, you can add annotate and disassemble output to `wperf record` command. 
+ # ================================================================================ # FIXED, DO NOT MODIFY diff --git a/content/learning-paths/laptops-and-desktops/windowsperf/windowsperf.md b/content/learning-paths/laptops-and-desktops/windowsperf/windowsperf.md index 463b5d7788..92254bf53a 100644 --- a/content/learning-paths/laptops-and-desktops/windowsperf/windowsperf.md +++ b/content/learning-paths/laptops-and-desktops/windowsperf/windowsperf.md @@ -6,19 +6,19 @@ weight: 2 # Overview -[WindowsPerf](https://gitlab.com/Linaro/WindowsPerf/windowsperf) is a (Linux [perf](https://perf.wiki.kernel.org) inspired) Windows on Arm performance profiling tool. Profiling is based on ARM64 PMU and its hardware counters. WindowsPerf supports the counting model for obtaining aggregate counts of occurrences of special events, and sampling model for determining the frequencies of event occurrences produced by program locations at the function, basic block, and/or instruction levels. +[WindowsPerf](https://github.com/arm-developer-tools/windowsperf) is a (Linux [perf](https://perf.wiki.kernel.org) inspired) Windows on Arm performance profiling tool. Profiling is based on ARM64 PMU and its hardware counters. WindowsPerf supports the counting model for obtaining aggregate counts of occurrences of special events, and sampling model for determining the frequencies of event occurrences produced by program locations at the function, basic block, and/or instruction levels. Learn more in this [blog](https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/announcing-windowsperf) announcing the first release. 
## WindowsPerf architecture `WindowsPerf` is composed of two main components: -- [wperf](https://gitlab.com/Linaro/WindowsPerf/windowsperf/-/tree/main/wperf) a command line interface (CLI) sometimes referred as "user-space app" and -- [wperf-driver](https://gitlab.com/Linaro/WindowsPerf/windowsperf/-/tree/main/wperf-driver) a (signed) Kernel-Mode Driver Framework (KMDF) driver. +- [wperf](https://github.com/arm-developer-tools/windowsperf/tree/main/wperf) a command line interface (CLI) sometimes referred as "user-space app" and +- [wperf-driver](https://github.com/arm-developer-tools/windowsperf/tree/main/wperf-driver) a (signed) Kernel-Mode Driver Framework (KMDF) driver. ## WindowsPerf releases -You can find all binary releases of `WindowsPerf` [here](https://gitlab.com/Linaro/WindowsPerf/windowsperf/-/releases). +You can find all binary releases of `WindowsPerf` [here](https://github.com/arm-developer-tools/windowsperf/releases). # Installation @@ -106,6 +106,30 @@ wperf test You can output `wperf test` command in JSON format. Use `--json` command line option to enable JSON output. {{% /notice %}} +## Obtain plain text information about specified event, metric, or group of metrics. + +Command line option `man` prints on screen information about specified event, metric, or group of metrics. + +```command +wperf man l1d_cache_mpki +``` + +```output +CPU + neoverse-n1 +NAME + l1d_cache_mpki - L1D Cache MPKI +EVENTS + inst_retired, l1d_cache_refill +DESCRIPTION + This metric measures the number of level 1 data cache accesses missed per + thousand instructions executed. +FORMULA + l1d_cache_refill / inst_retired * 1000 +UNIT + MPKI +``` + ## Generate sample profile Specify the `event` to profile with `-e`. Groups of events, known as `metrics` can be specified with `-m`. @@ -139,4 +163,4 @@ You can output `wperf stat` command in JSON format. 
Use `--json` command line op {{% /notice %}} -Example use cases are provided in the WindowsPerf [documentation](https://gitlab.com/Linaro/WindowsPerf/windowsperf/-/blob/main/wperf/README.md#counting-model). +Example `wperf stat` command use cases are provided in the WindowsPerf [documentation](https://github.com/arm-developer-tools/windowsperf/tree/main/wperf#counting-model). diff --git a/content/learning-paths/laptops-and-desktops/windowsperf/windowsperf_cheatsheet.md b/content/learning-paths/laptops-and-desktops/windowsperf/windowsperf_cheatsheet.md new file mode 100644 index 0000000000..27196d1b75 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/windowsperf/windowsperf_cheatsheet.md @@ -0,0 +1,71 @@ +--- +layout: learningpathall +title: WindowsPerf cheat sheet +weight: 2 +--- + +# WindowsPerf cheat sheet + +The cheat sheet for the `wperf` command line tool focuses specifically on counting and sampling commands. It includes `wperf stat` for counting occurrences of specific PMU events, and `wperf sample` and `wperf record` for sampling PMU events. Each command is explained with a practical example. + +## WindowsPerf cheat sheet (PMU Counting Examples) + +- Count events `inst_spec`, `vfp_spec`, `ase_spec` and `ld_spec` on core #0 for 3 seconds: + +```command +wperf stat -e inst_spec,vfp_spec,ase_spec,ld_spec -c 0 --timeout 3 +``` + +- Count metric `imix` (metric events will be grouped) and additional event `l1i_cache` on core #7 for 10.5 seconds: + +```command +wperf stat -m imix -e l1i_cache -c 7 --timeout 10.5 +``` + +- Count in timeline mode (output counting to CSV file) metric `imix` 3 times on core #1 with 2 second intervals (delays between counts). Each count will last 5 seconds: + +```command +wperf stat -m imix -c 1 -t -i 2 -n 3 --timeout 5 +``` + +## WindowsPerf cheat sheet (PMU Sampling Examples) + +- Launch and pin `python_d.exe -c 10**10**100` to core no. 
1 and sample the given image name: + +```command +start /affinity 2 python_d.exe -c 10**10**100 +wperf sample -e ld_spec:100000 -c 1 --pe_file python_d.exe --image_name python_d.exe +``` + +The same workflow can be wrapped in a single `wperf record` command, as in the example below: + +- Launch the `python_d.exe -c 10**10**100` process and start sampling event `ld_spec` with frequency `100000` on core no. 1 for 30 seconds. + +```command +wperf record -e ld_spec:100000 -c 1 --timeout 30 -- python_d.exe -c 10**10**100 +``` + +{{% notice Hint%}} +Add `--annotate` or `--disassemble` to `wperf record` command line parameters to increase sampling "resolution". +{{% /notice %}} + +## WindowsPerf cheat sheet (SPE Examples) + +Use the optional Arm SPE extension to sample the `python_d.exe` process on core no. 1. The SPE filter `load_filter` / `ld` enables collection of sampled load operations, including atomic operations that return a value to a register. + +Note: The double-dash operator `--` can also be used with SPE to launch the process. + +```command +wperf record -e arm_spe_0/ld=1/ -c 1 -- python_d.exe -c 10**10**100 +``` + +The above command can be replaced by the following two commands: + +```command +start /affinity 2 python_d.exe -c 10**10**100 +wperf sample -e arm_spe_0/ld=1/ -c 1 --pe_file python_d.exe --image_name python_d.exe +``` + +{{% notice Hint%}} +Add `--annotate` or `--disassemble` to `wperf record` command line parameters to increase sampling "resolution". 
+{{% /notice %}} diff --git a/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/_index.md b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/_index.md new file mode 100644 index 0000000000..0f33fff372 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/_index.md @@ -0,0 +1,39 @@ +--- +title: Get started with the Windows Performance Analyzer (WPA) plugin for WindowsPerf + +draft: true +cascade: + draft: true + +minutes_to_complete: 15 + +who_is_this_for: This is an introductory topic for software developers interested in using the Windows Performance Analyzer (WPA) plugin for performance analysis. + +learning_objectives: + - Import WindowsPerf data as a .json file in WPA. + - Visualize the timeline and telemetry data in WPA using the WPA plugin. + +prerequisites: + - A Windows on Arm laptop with WindowsPerf, Windows Performance Analyzer (WPA), and the WPA plugin installed. + +author_primary: Alaaeddine Chakroun + +### Tags +skilllevels: Introductory +subjects: Performance and Architecture +armips: + - Cortex-A + - Neoverse +operatingsystems: + - Windows +tools_software_languages: + - WindowsPerf + - perf + - Windows Performance Analyzer + +### FIXED, DO NOT MODIFY +# ================================================================================ +weight: 1 # _index.md always has weight of 1 to order correctly +layout: "learningpathall" # All files under learning paths have this same wrapper +learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content. 
+--- diff --git a/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/_next-steps.md b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/_next-steps.md new file mode 100644 index 0000000000..c62f6ba896 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/_next-steps.md @@ -0,0 +1,42 @@ +--- +next_step_guidance: Now that you have an idea of how the WindowsPerf WPA plugin works, you can try exploring the WindowsPerf CLI for more flexibility. + +recommended_path: "/learning-paths/laptops-and-desktops/windowsperf_sampling_cpython" + +further_reading: + - resource: + title: Announcing WindowsPerf Open-source performance analysis tool for Windows on Arm + link: https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/announcing-windowsperf + type: blog + - resource: + title: WindowsPerf Release 3.7.2 + link: https://www.linaro.org/blog/expanding-profiling-capabilities-with-windowsperf-372-release/ + type: blog + - resource: + title: WindowsPerf Visual Studio Extension v2.1.0 + link: https://www.linaro.org/blog/launching--windowsperf-visual-studio-extension-v210/ + type: blog + - resource: + title: Windows on Arm overview + link: https://learn.microsoft.com/en-us/windows/arm/overview + type: website + - resource: + title: Linaro Windows on Arm project + link: https://www.linaro.org/windows-on-arm/ + type: website + - resource: + title: WindowsPerf Visual Studio extension releases + link: https://github.com/arm-developer-tools/windowsperf-vs-extension/releases + type: website + - resource: + title: WindowsPerf releases + link: https://github.com/arm-developer-tools/windowsperf/releases + type: website + +# ================================================================================ +# FIXED, DO NOT MODIFY +# ================================================================================ +weight: 21 # set to always be larger than the content in this path, and one more than 
'review' +title: "Next Steps" # Always the same +layout: "learningpathall" # All files under learning paths have this same wrapper +--- \ No newline at end of file diff --git a/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/_review.md b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/_review.md new file mode 100644 index 0000000000..c200d1ea07 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/_review.md @@ -0,0 +1,54 @@ +--- +# ================================================================================ +# Edit +# ================================================================================ + +# Always 3 questions. Should try to test the reader's knowledge, and reinforce the key points you want them to remember. + # question: A one sentence question + # answers: The correct answers (from 2-4 answer options only). Should be surrounded by quotes. + # correct_answer: An integer indicating what answer is correct (index starts from 0) + # explanation: A short (1-3 sentence) explanation of why the correct answer is correct. Can add additional context if desired + + +review: + - questions: + question: > + The WPA plugin connects WindowsPerf to the Windows Performance Analyzer. + answers: + - "True" + - "False" + correct_answer: 1 + explanation: > + The Windows Performance Analyzer (WPA) plugin connects WindowsPerf to the Windows Performance Analyzer. + + - questions: + question: > + Which views can WPA display? + answers: + - "Timeline" + - "Telemetry" + - "Function profile" + - "Timeline and telemetry" + - "All of the above" + correct_answer: 4 + explanation: > + WPA can display both the timeline and the telemetry views. + + - questions: + question: > + WindowsPerf can output data in JSON format with the `--json` command line option. + answers: + - "True" + - "False" + correct_answer: 1 + explanation: > + Some `wperf` commands such as `list`, `test` or `stat` can output data in JSON format. 
+ + +# ================================================================================ +# FIXED, DO NOT MODIFY +# ================================================================================ +title: "Review" # Always the same title +weight: 20 # Set to always be larger than the content in this path +layout: "learningpathall" # All files under learning paths have this same wrapper +--- diff --git a/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/telemetry-preview.png b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/telemetry-preview.png new file mode 100644 index 0000000000..c2b1c4edc5 Binary files /dev/null and b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/telemetry-preview.png differ diff --git a/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/telemetry-table.png b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/telemetry-table.png new file mode 100644 index 0000000000..eb85d41270 Binary files /dev/null and b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/telemetry-table.png differ diff --git a/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/timeline-by-core.png b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/timeline-by-core.png new file mode 100644 index 0000000000..b35f6df648 Binary files /dev/null and b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/timeline-by-core.png differ diff --git a/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/timeline-by-event.png b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/timeline-by-event.png new file mode 100644 index 0000000000..a3c4b0b3f0 Binary files /dev/null and b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/timeline-by-event.png differ diff --git 
a/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/timeline-events-by-key.png b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/timeline-events-by-key.png new file mode 100644 index 0000000000..2e28ba0b01 Binary files /dev/null and b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/timeline-events-by-key.png differ diff --git a/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/wpa-first-screen.png b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/wpa-first-screen.png new file mode 100644 index 0000000000..8a990d14f2 Binary files /dev/null and b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/wpa-first-screen.png differ diff --git a/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/wpa-open-file.png b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/wpa-open-file.png new file mode 100644 index 0000000000..8d5e01f417 Binary files /dev/null and b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/figures/wpa-open-file.png differ diff --git a/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/windowsperf_wpa_plugin.md b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/windowsperf_wpa_plugin.md new file mode 100644 index 0000000000..2f1f1a04e7 --- /dev/null +++ b/content/learning-paths/laptops-and-desktops/windowsperf_wpa_plugin/windowsperf_wpa_plugin.md @@ -0,0 +1,104 @@ +--- +layout: learningpathall +title: Visualize data from WindowsPerf using WPA +weight: 2 +--- + +## Before you begin + +The Windows Performance Analyzer (WPA) plugin connects WindowsPerf to the Windows Performance Analyzer. 
+ +Before trying WPA with the WPA plugin, you need to install the following software on your Windows on Arm computer: + +- [WindowsPerf](/install-guides/wperf/) +- [WPA plugin](/install-guides/windows-perf-wpa-plugin/) + +The WPA plugin install guide includes the installation of WPA. + +## Using WindowsPerf data in WPA + +To use WPA with WindowsPerf, you need a `.json` file generated by a WindowsPerf `wperf stat` command running on a Windows on Arm machine. + +You can save the `.json` output from WindowsPerf by using the `--output` option followed by the filename. + +1. Create a `.json` file + + To create a file named `example.json`, run the following command: + + ```console + wperf stat -e ld_spec --output example.json + ``` + +2. Open Windows Performance Analyzer, and see the following window: + + ![wpa-first-screen #center](figures/wpa-first-screen.png) + + Confirm the `WindowsPerf WPA Plugin` appears under the Installed Plugins section. + +3. Open the `.json` file + + Click "Open file..." from the start menu on the left side and select the `example.json` file. + + ![wpa-open-file #center](figures/wpa-open-file.png) + + When you click `Open`, the output file is checked for compatibility with the WPA plugin, and the main WPA window opens up. + +### Timeline + +The WindowsPerf timeline feature (command line option -t) enables continuous counting of Performance Monitoring Unit (PMU) events. + +You can specify sleep intervals (with -i) between counts and set the number of repetitions (with -n), allowing for detailed and flexible data collection. + +You can use WPA to visualize PMU events in the recorded data.
+ +To try the timeline feature, run the command: + +```console +wperf stat -m dcache -c 0,1,2,3,4,5,6,7 -t -i 0 -n 50 --json +``` + +Open the generated output (`.json` file) in WPA to see the graph: + +![timeline-by-core #center](figures/timeline-by-core.png) + +You can change the default grouping from `Group by core` to `Group by event` and see the following graph: + +![timeline-by-event #center](figures/timeline-by-event.png) + +The WPA plugin also generates a graph per event note to provide a more in-depth grouping of events. + +To see all the generated graphs, expand the `Counting timeline` section in the graph explorer section of WPA. + +Run another `wperf` command with different options: + +```console +wperf stat -t -i 0 -m imix,l1d_cache_miss_ratio,l1d_cache_mpki,l1d_tlb_miss_ratio,l1d_tlb_mpki -e inst_spec,vfp_spec,ld_spec,st_spec -c 1 --json +``` + +The graph after opening the `.json` file is shown below: + +![timeline-events-by-key #center](figures/timeline-events-by-key.png) + +You can double-click on any graph to expand it under the Analysis tab for further data visualization. + +### Telemetry + +The WPA Plugin also provides visualization of [Arm telemetry metrics](https://developer.arm.com/documentation/109542/0100/About-Arm-CPU-Telemetry-Solution). + +To visualize telemetry, run the following command: + +```console +wperf stat -t -i 0 -m imix,l1d_cache_miss_ratio,l1d_cache_mpki,l1d_tlb_miss_ratio,l1d_tlb_mpki -e inst_spec,vfp_spec,ld_spec,st_spec -c 1 --json +``` + +You can also see the telemetry timeline graphs under the graph explorer level in WPA. + +These graphs are generated dynamically so only the relevant metrics for the given `.json` output file are visible. + +![telemetry-preview #center](figures/telemetry-preview.png) + +Once expanded, a more in-depth view is visible under the Analysis tab of WPA.
+ +![telemetry-table #center](figures/telemetry-table.png) + +You now have a basic understanding of how to use `wperf` generated data in the Windows Performance Analyzer. \ No newline at end of file diff --git a/content/learning-paths/servers-and-cloud-computing/_index.md b/content/learning-paths/servers-and-cloud-computing/_index.md index 081706830e..867377bee6 100644 --- a/content/learning-paths/servers-and-cloud-computing/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/_index.md @@ -9,7 +9,7 @@ maintopic: true operatingsystems_filter: - Android: 2 - Baremetal: 1 -- Linux: 107 +- Linux: 109 - macOS: 9 - Windows: 12 pinned_modules: @@ -23,8 +23,8 @@ subjects_filter: - Containers and Virtualization: 25 - Databases: 15 - Libraries: 6 -- ML: 13 -- Performance and Architecture: 37 +- ML: 14 +- Performance and Architecture: 38 - Storage: 1 - Web: 10 subtitle: Optimize cloud native apps on Arm for performance and cost @@ -57,22 +57,22 @@ tools_software_languages_filter: - Capstone: 1 - CCA: 3 - Clair: 1 -- Clang: 9 +- Clang: 10 - ClickBench: 1 - ClickHouse: 1 - CloudFormation: 1 - CMake: 1 -- Coding: 19 +- Coding: 20 - Django: 1 - Docker: 15 - Envoy: 2 - Flink: 1 - Fortran: 1 - FVP: 3 -- GCC: 17 +- GCC: 18 - gdb: 1 - Geekbench: 1 -- GenAI: 4 +- GenAI: 5 - GitHub: 3 - GitLab: 1 - Glibc: 1 @@ -91,7 +91,7 @@ tools_software_languages_filter: - Lambda: 1 - libbpf: 1 - Linaro Forge: 1 -- LLM: 2 +- LLM: 3 - llvm-mca: 1 - LSE: 1 - MariaDB: 1 @@ -107,13 +107,13 @@ tools_software_languages_filter: - PAPI: 1 - perf: 3 - PostgreSQL: 4 -- Python: 11 +- Python: 12 - PyTorch: 5 - RAG: 1 - Redis: 3 - Remote.It: 2 - RME: 3 -- Rust: 1 +- Rust: 2 - snappy: 1 - Snort: 1 - SQL: 7 diff --git a/content/learning-paths/servers-and-cloud-computing/bolt/before-you-begin.md b/content/learning-paths/servers-and-cloud-computing/bolt/before-you-begin.md index 1fd1bd0ede..e4de1840b9 100644 --- a/content/learning-paths/servers-and-cloud-computing/bolt/before-you-begin.md +++ 
b/content/learning-paths/servers-and-cloud-computing/bolt/before-you-begin.md @@ -139,6 +139,7 @@ arm_spe_0// [Kernel PMU event] ``` If `arm_spe` isn't found you will need to update the Linux Kernel and perf to 5.15 or later. +To enable it see [Enable the SPE feature in Linux guide](https://developer.arm.com/documentation/ka005362/1-0). To confirm SPE is working run: @@ -175,4 +176,4 @@ For Clang: ```bash clang -Wl,--emit-relocs -``` \ No newline at end of file +``` diff --git a/content/learning-paths/servers-and-cloud-computing/bolt/bolt-spe.md b/content/learning-paths/servers-and-cloud-computing/bolt/bolt-spe.md index 335d740043..8156e2d7aa 100644 --- a/content/learning-paths/servers-and-cloud-computing/bolt/bolt-spe.md +++ b/content/learning-paths/servers-and-cloud-computing/bolt/bolt-spe.md @@ -1,5 +1,5 @@ --- -title: Using BOLT with SPE +title: Use BOLT with SPE weight: 6 ### FIXED, DO NOT MODIFY @@ -8,7 +8,12 @@ layout: learningpathall ## BOLT with SPE -The steps to optimize an executable with BOLT using Perf SPE is below. +{{% notice Important Note %}} +Currently, BOLT may not generate a faster binary when using Perf SPE due to limitations within `perf` and BOLT itself. +For more information and the latest updates see: [[AArch64] BOLT does not support SPE branch data](https://github.com/llvm/llvm-project/issues/115333). +{{% /notice %}} + +The steps to use BOLT with Perf SPE are listed below. 
### Collect Perf data with SPE diff --git a/content/learning-paths/servers-and-cloud-computing/csp/google.md b/content/learning-paths/servers-and-cloud-computing/csp/google.md index c4430cb9de..34414f8516 100644 --- a/content/learning-paths/servers-and-cloud-computing/csp/google.md +++ b/content/learning-paths/servers-and-cloud-computing/csp/google.md @@ -11,7 +11,7 @@ layout: "learningpathall" As with most cloud service providers, Google Cloud offers a pay-as-you-use [pricing policy](https://cloud.google.com/pricing), including a number of [free](https://cloud.google.com/free/docs/free-cloud-features) services. -This section is to help you get started with [Google Cloud Compute Engine](https://cloud.google.com/compute) compute services, using Arm-based Virtual Machines. Google Cloud offers two generations of Arm-based VMs, `C4A` is the latest generation based on [Google Axion](cloud.google.com/products/axion), Google’s first Arm-based server processor, built using the Armv9 Neoverse V2 CPU. The previous generation VMs are based on Ampere Altra processor and part of [Tau T2A](https://cloud.google.com/tau-vm) family of Virtual Machines. +This section is to help you get started with [Google Cloud Compute Engine](https://cloud.google.com/compute) compute services, using Arm-based Virtual Machines. Google Cloud offers two generations of Arm-based VMs, `C4A` is the latest generation based on [Google Axion](https://cloud.google.com/products/axion), Google’s first Arm-based server processor, built using the Armv9 Neoverse V2 CPU. The previous generation VMs are based on Ampere Altra processor and part of [Tau T2A](https://cloud.google.com/tau-vm) family of Virtual Machines. Detailed instructions are available in the Google Cloud [documentation](https://cloud.google.com/compute/docs/instances). 
@@ -23,7 +23,7 @@ If using an organization's account, you will likely need to consult with your in ## Browse for an appropriate instance -Google Cloud offers a wide range of instance types, covering all performance (and pricing) points. For an overview of the `C4A` instance types, see this [page](cloud.google.com/products/axion). Similarly, to know more about the `T2A` instance types, see the [General-purpose machine family](https://cloud.google.com/compute/docs/general-purpose-machines#t2a_machines) overview. +Google Cloud offers a wide range of instance types, covering all performance (and pricing) points. For an overview of the `C4A` instance types, see the [General-purpose machine family](https://cloud.google.com/compute/docs/general-purpose-machines#c4a_series). Similarly, to know more about the `T2A` instance types, see this [page](https://cloud.google.com/compute/docs/general-purpose-machines#t2a_machines). Also note which [regions](https://cloud.google.com/compute/docs/regions-zones#available) these servers are available in. @@ -49,15 +49,15 @@ Select an appropriate `region` and `zone` that support Arm-based servers. ![google3 #center](https://github.com/ArmDeveloperEcosystem/arm-learning-paths/assets/71631645/f2a19cd0-7565-44d3-9e6f-b27bccad3e86 "Select an appropriate region and zone") -To view the latest information on which available regions and zones support Arm-based servers, see the [Compute Engine documentation](https://cloud.google.com/compute/docs/regions-zones#available). To filter for Arm-based machines, click on `Select a machine type`, then select `T2A` or `C4A` from the pull-down menu. +To view the latest information on which available regions and zones support Arm-based servers, see the [Compute Engine documentation](https://cloud.google.com/compute/docs/regions-zones#available). To filter for Arm-based machines, click on `Select a machine type`, then select `C4A` or `T2A` from the pull-down menu. 
-![google4 #center](https://github.com/ArmDeveloperEcosystem/arm-learning-paths/assets/71631645/5b1683dc-724f-4c60-aea6-dc945c7bf6bc "Check which regions and zones support Arm-based machines") +![google4 #center](images/axion-series.png "Check which regions and zones support Arm-based machines") ### Machine configuration Select `C4A` from the `Series` pull-down menu. Then select an appropriate `Machine type` configuration for your needs. -![google5 #center](images/gcp_instance_new.png "Select an appropriate C4A machine type") +![google5 #center](images/axion-instance.png "Select an appropriate C4A machine type") ### Boot disk configuration diff --git a/content/learning-paths/servers-and-cloud-computing/csp/images/axion-instance.png b/content/learning-paths/servers-and-cloud-computing/csp/images/axion-instance.png new file mode 100644 index 0000000000..99efa10bc8 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/csp/images/axion-instance.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/csp/images/axion-series.png b/content/learning-paths/servers-and-cloud-computing/csp/images/axion-series.png new file mode 100644 index 0000000000..64b62f6e88 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/csp/images/axion-series.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/csp/images/gcp_instance_new.png b/content/learning-paths/servers-and-cloud-computing/csp/images/gcp_instance_new.png deleted file mode 100644 index a7edaf7844..0000000000 Binary files a/content/learning-paths/servers-and-cloud-computing/csp/images/gcp_instance_new.png and /dev/null differ diff --git a/content/learning-paths/servers-and-cloud-computing/gke-multi-arch/how-to-1.md b/content/learning-paths/servers-and-cloud-computing/gke-multi-arch/how-to-1.md index 63016134df..7dc522e0c6 100644 --- a/content/learning-paths/servers-and-cloud-computing/gke-multi-arch/how-to-1.md +++ 
b/content/learning-paths/servers-and-cloud-computing/gke-multi-arch/how-to-1.md @@ -8,7 +8,7 @@ layout: learningpathall ## Migrate an existing x86-based application to run on Arm-based nodes in a single GKE cluster -Google Kubernetes Engine (GKE) supports hybrid clusters with x86 and Arm based nodes. The Arm-based nodes can be deployed on the `C4A` family of virtual machines. The `C4A` VMs are based on [Google Axion](cloud.google.com/products/axion), Google’s first Arm-based server processor, built using the Armv9 Neoverse V2 CPU. +Google Kubernetes Engine (GKE) supports hybrid clusters with x86 and Arm based nodes. The Arm-based nodes can be deployed on the `C4A` family of virtual machines. The `C4A` VMs are based on [Google Axion](http://cloud.google.com/products/axion/), Google’s first Arm-based server processor, built using the Armv9 Neoverse V2 CPU. ## Before you begin diff --git a/content/learning-paths/servers-and-cloud-computing/java-on-axion/1-create-instance.md b/content/learning-paths/servers-and-cloud-computing/java-on-axion/1-create-instance.md index 0e30ae6993..a2dfabd24a 100644 --- a/content/learning-paths/servers-and-cloud-computing/java-on-axion/1-create-instance.md +++ b/content/learning-paths/servers-and-cloud-computing/java-on-axion/1-create-instance.md @@ -8,7 +8,7 @@ layout: learningpathall ## Create an Axion instance -Axion is Google’s first Arm-based server processor, built using the Armv9 Neoverse V2 CPU. Created specifically for the data center, Axion delivers industry-leading performance and energy efficiency. To learn more about Google Axion, refer to this [page](cloud.google.com/products/axion) +Axion is Google’s first Arm-based server processor, built using the Armv9 Neoverse V2 CPU. Created specifically for the data center, Axion delivers industry-leading performance and energy efficiency. 
To learn more about Google Axion, refer to this [page](http://cloud.google.com/products/axion/) There are several ways to create an Arm-based Google Axion VM: the Google Cloud console, the gcloud CLI tool, or using your choice of IaC (Infrastructure as Code). diff --git a/content/learning-paths/servers-and-cloud-computing/kubearchinspect/_review.md b/content/learning-paths/servers-and-cloud-computing/kubearchinspect/_review.md index fdc1d2d118..bf5c982402 100644 --- a/content/learning-paths/servers-and-cloud-computing/kubearchinspect/_review.md +++ b/content/learning-paths/servers-and-cloud-computing/kubearchinspect/_review.md @@ -13,10 +13,10 @@ review: - questions: question: > - True or False: KubeArchInspect automatically upgrades images to the latest version. + Does KubeArchInspect automatically upgrade images to the latest version? answers: - - "True" - - "False" + - Yes. + - No. correct_answer: 2 explanation: > KubeArchInspect does not automatically upgrade images to the latest version. It only identifies the images that are available. @@ -40,4 +40,4 @@ review: title: "Review" # Always the same title weight: 20 # Set to always be larger than the content in this path layout: "learningpathall" # All files under learning paths have this same wrapper ---- \ No newline at end of file +--- diff --git a/content/learning-paths/servers-and-cloud-computing/rtp-llm/_index.md b/content/learning-paths/servers-and-cloud-computing/rtp-llm/_index.md new file mode 100644 index 0000000000..345fe0efb5 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/rtp-llm/_index.md @@ -0,0 +1,37 @@ +--- +title: Run an LLM chatbot with rtp-llm on Arm-based servers + +minutes_to_complete: 30 + +who_is_this_for: This is an introductory topic for developers who are interested in running a Large Language Model (LLM) with rtp-llm on Arm-based servers. + +learning_objectives: + - Build rtp-llm on an Arm-based server. + - Download a Qwen model from Hugging Face. 
+ - Run a Large Language Model with rtp-llm. + +prerequisites: + - Any Arm Neoverse N2-based or Arm Neoverse V2-based instance running Ubuntu 22.04 LTS from a cloud service provider or an on-premise Arm server. + - For the server, at least four cores and 16 GB of RAM, with disk storage configured up to at least 32 GB. + +author_primary: Tianyu Li + +### Tags +skilllevels: Introductory +subjects: ML +armips: + - Neoverse +operatingsystems: + - Linux +tools_software_languages: + - LLM + - GenAI + - Python + + +### FIXED, DO NOT MODIFY +# ================================================================================ +weight: 1 # _index.md always has weight of 1 to order correctly +layout: "learningpathall" # All files under learning paths have this same wrapper +learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content. +--- diff --git a/content/learning-paths/servers-and-cloud-computing/rtp-llm/_next-steps.md b/content/learning-paths/servers-and-cloud-computing/rtp-llm/_next-steps.md new file mode 100644 index 0000000000..f12a3261b3 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/rtp-llm/_next-steps.md @@ -0,0 +1,38 @@ +--- +next_step_guidance: > + Thank you for completing this Learning Path on how to run an LLM chatbot on an Arm-based server. You might be interested in learning how to run an NLP sentiment analysis model on an Arm-based server.
+ +recommended_path: "/learning-paths/servers-and-cloud-computing/nlp-hugging-face/" + + +further_reading: + - resource: + title: Qwen2-0.5B-Instruct + link: https://huggingface.co/Qwen/Qwen2-0.5B-Instruct + type: website + - resource: + title: Getting started with RTP-LLM + link: https://github.com/alibaba/rtp-llm + type: documentation + - resource: + title: Hugging Face Documentation + link: https://huggingface.co/docs + type: documentation + - resource: + title: Democratizing Generative AI with CPU-based inference + link: https://blogs.oracle.com/ai-and-datascience/post/democratizing-generative-ai-with-cpu-based-inference + type: blog + - resource: + title: Get started with Arm-based cloud instances + link: https://learn.arm.com/learning-paths/servers-and-cloud-computing/csp/ + type: website + + + +# ================================================================================ +# FIXED, DO NOT MODIFY +# ================================================================================ +weight: 21 # set to always be larger than the content in this path, and one more than 'review' +title: "Next Steps" # Always the same +layout: "learningpathall" # All files under learning paths have this same wrapper +--- diff --git a/content/learning-paths/servers-and-cloud-computing/rtp-llm/_review.md b/content/learning-paths/servers-and-cloud-computing/rtp-llm/_review.md new file mode 100644 index 0000000000..cd52e9b0cc --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/rtp-llm/_review.md @@ -0,0 +1,39 @@ +--- +review: + - questions: + question: > + Are at least four cores, 16GB of RAM, and 32GB of disk storage required to run the LLM chatbot using rtp-llm on an Arm-based server? + answers: + - "Yes" + - "No" + correct_answer: 1 + explanation: > + It depends on the size of the LLM. The higher the number of parameters of the model, the greater the system requirements. 
+ + - questions: + question: > + Does the rtp-llm project use the --config=arm option to optimize LLM inference for Arm CPUs? + answers: + - "Yes" + - "No" + correct_answer: 1 + explanation: > + rtp-llm uses the GPU for inference by default. rtp-llm optimizes LLM inference on Arm architecture by providing a configuration option --config=arm during the build process. + + - questions: + question: > + Is the given Python script the only way to run the LLM chatbot on an Arm AArch64 CPU and output a response from the model? + answers: + - "Yes" + - "No" + correct_answer: 2 + explanation: > + rtp-llm can also be deployed as an API server, and the user can use curl or another client to generate an LLM chatbot response. + +# ================================================================================ +# FIXED, DO NOT MODIFY +# ================================================================================ +title: "Review" # Always the same title +weight: 20 # Set to always be larger than the content in this path +layout: "learningpathall" # All files under learning paths have this same wrapper +--- diff --git a/content/learning-paths/servers-and-cloud-computing/rtp-llm/overview.md b/content/learning-paths/servers-and-cloud-computing/rtp-llm/overview.md new file mode 100644 index 0000000000..2044c7c36f --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/rtp-llm/overview.md @@ -0,0 +1,33 @@ +--- +title: Background +weight: 2 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- +Arm CPUs are widely used in ML and AI use cases. In this Learning Path, you will learn how to run the generative AI inference-based use case of an LLM chatbot on an Arm-based CPU. You will do this by deploying the [Qwen2-0.5B-Instruct model](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) on an Arm-based CPU using `rtp-llm`. + + +{{% notice Note %}} +This Learning Path has been tested on an Alibaba Cloud g8y.8xlarge instance and an AWS Graviton4 r8g.8xlarge instance. 
+{{% /notice %}} + + +[rtp-llm](https://github.com/alibaba/rtp-llm) is an open-source C/C++ project developed by Alibaba that enables efficient LLM inference on a variety of hardware. + +RTP-LLM is a Large Language Model inference acceleration engine developed by Alibaba. Qwen is the name given to a series of Large Language Models developed by Alibaba Cloud that are capable of performing a variety of tasks. + +Alibaba Cloud offers a wide range of models, each suitable for different tasks and use cases. + +Besides generating text, they are also able to perform actions such as: + +* Answering questions through information retrieval and analysis. +* Processing images and producing written descriptions of visual content. +* Processing audio content. +* Providing multilingual support, with over 27 additional languages, on top of the core languages of English and Chinese. + +Qwen is open source, flexible, and encourages contribution from the software development community. + + + + diff --git a/content/learning-paths/servers-and-cloud-computing/rtp-llm/rtp-llm-chatbot.md b/content/learning-paths/servers-and-cloud-computing/rtp-llm/rtp-llm-chatbot.md new file mode 100644 index 0000000000..69cf87759a --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/rtp-llm/rtp-llm-chatbot.md @@ -0,0 +1,172 @@ +--- +title: Run an LLM chatbot with rtp-llm on an Arm server +weight: 3 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- +## Install dependencies + +Install `micromamba` to set up Python 3.10 at path `/opt/conda310`, as required by the `rtp-llm` build system: + +```bash +"${SHELL}" <(curl -L micro.mamba.pm/install.sh) +source ~/.bashrc +sudo ${HOME}/.local/bin/micromamba -r /opt/conda310 install python=3.10 +micromamba -r /opt/conda310 shell +``` + +Install `bazelisk` to build `rtp-llm`: + +```bash +wget https://github.com/bazelbuild/bazelisk/releases/download/v1.22.1/bazelisk-linux-arm64 +chmod +x bazelisk-linux-arm64 +sudo mv bazelisk-linux-arm64
/usr/bin/bazelisk +``` + +Install `git/gcc/g++`: + +```bash +sudo apt install git -y +sudo apt install build-essential -y +``` + +Install the `openblas` development package and fix the header paths: + +```bash +sudo apt install libopenblas-dev +sudo mkdir -p /usr/include/openblas +sudo ln -sf /usr/include/aarch64-linux-gnu/cblas.h /usr/include/openblas/cblas.h +``` + +## Download and build rtp-llm + +You are now ready to start building `rtp-llm`. + +Start by cloning the source repository for rtp-llm: + +```bash +git clone https://github.com/alibaba/rtp-llm +cd rtp-llm +git checkout 4656265 +``` + +Next, comment out lines 7-10 in `deps/requirements_lock_torch_arm.txt` as some hosts are not accessible from the web: + +```bash +sed -i '7,10 s/^/#/' deps/requirements_lock_torch_arm.txt +``` + +By default, `rtp-llm` builds for GPU only on Linux. You need to provide the additional flag `--config=arm` to build it for the Arm CPU that you will run it on. + +Configure and build: + +```bash +bazelisk build --config=arm //maga_transformer:maga_transformer_aarch64 +``` +The output from your build should look like this: + +```output +INFO: 10094 processes: 8717 internal, 1377 local. 
+INFO: Build completed successfully, 10094 total actions +``` + +Install the built wheel package: + +```bash +pip install bazel-bin/maga_transformer/maga_transformer-0.2.0-cp310-cp310-linux_aarch64.whl +``` + +Create a file named `python-test.py` in your `/tmp` directory with the contents shown below: + +```python +from maga_transformer.pipeline import Pipeline +from maga_transformer.model_factory import ModelFactory +from maga_transformer.openai.openai_endpoint import OpenaiEndopoint +from maga_transformer.openai.api_datatype import ChatCompletionRequest, ChatMessage, RoleEnum +from maga_transformer.distribute.worker_info import update_master_info + +import asyncio +import json +import os + +async def main(): + update_master_info('127.0.0.1', 42345) + os.environ["MODEL_TYPE"] = os.environ.get("MODEL_TYPE", "qwen2") + os.environ["CHECKPOINT_PATH"] = os.environ.get("CHECKPOINT_PATH", "Qwen/Qwen2-0.5B-Instruct") + os.environ["RESERVER_RUNTIME_MEM_MB"] = "0" + os.environ["DEVICE_RESERVE_MEMORY_BYTES"] = f"{128 * 1024 ** 2}" + model_config = ModelFactory.create_normal_model_config() + model = ModelFactory.from_huggingface(model_config.ckpt_path, model_config=model_config) + pipeline = Pipeline(model, model.tokenizer) + + # usual request + for res in pipeline("<|im_start|>user\nhello, what's your name<|im_end|>\n<|im_start|>assistant\n", max_new_tokens = 100): + print(res.generate_texts) + + # openai request + openai_endpoint = OpenaiEndopoint(model) + messages = [ + ChatMessage(**{ + "role": RoleEnum.user, + "content": "Who are you?", + }), + ] + request = ChatCompletionRequest(messages=messages, stream=False) + response = openai_endpoint.chat_completion(request_id=0, chat_request=request, raw_request=None) + async for res in response: + pass + print((await response.gen_complete_response_once()).model_dump_json(indent=4)) + + pipeline.stop() + +if __name__ == '__main__': + asyncio.run(main()) +``` + +Now run this file: + +```bash +python /tmp/python-test.py +``` + +If 
`rtp-llm` has built correctly on your machine, you will see the LLM model response for the prompt input. + +A snippet of the output is shown below: + +```output +['I am a large language model created by Alibaba Cloud. My name is Qwen.'] +{ + "id": "chat-", + "object": "chat.completion", + "created": 1730272196, + "model": "AsyncModel", + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": "I am a large language model created by Alibaba Cloud. I am called Qwen.", + "function_call": null, + "tool_calls": null + }, + "finish_reason": "stop" + } + ], + "usage": { + "prompt_tokens": 23, + "total_tokens": 40, + "completion_tokens": 17, + "completion_tokens_details": null, + "prompt_tokens_details": null + }, + "debug_info": null, + "aux_info": null +} +``` + + +You have successfully run an LLM chatbot with Arm optimizations, running on an Arm AArch64 CPU on your server. + +You can continue to experiment with the chatbot by trying out different prompts on the model. + diff --git a/content/learning-paths/servers-and-cloud-computing/rtp-llm/rtp-llm-server.md b/content/learning-paths/servers-and-cloud-computing/rtp-llm/rtp-llm-server.md new file mode 100644 index 0000000000..4aec6c145a --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/rtp-llm/rtp-llm-server.md @@ -0,0 +1,202 @@ +--- +title: Access the chatbot with rtp-llm using the OpenAI-compatible API +weight: 4 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- +## Setup + +You can now move on to using the `rtp-llm` server program and submitting requests using an OpenAI-compatible API. + +This enables applications to be created which access the LLM multiple times without starting and stopping it. + +You can also access a server that hosts the LLM on another machine over the network. + +One additional software package is required for this section.
+ +Install `jq` on your computer using the following command: + +```bash +sudo apt install jq -y +``` + +## Running the Server + +There are a few different ways you can download the Qwen2 0.5B model. In this Learning Path, you will download the model from Hugging Face. + +[Hugging Face](https://huggingface.co/) is an open source AI community where you can host your own AI models, train them, and collaborate with others in the community. You can browse through thousands of models that are available for a variety of use cases such as Natural Language Processing (NLP), audio, and computer vision. + +The `huggingface_hub` library provides APIs and tools that let you easily download and fine-tune pre-trained models. You will use `huggingface-cli` to download the [Qwen2 0.5B model](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct). + +## Install Hugging Face Hub + +Install the required Python packages: + +```bash +sudo apt install python-is-python3 python3-pip python3-venv -y +``` + +Create and activate a Python virtual environment: + +```bash +python -m venv venv +source venv/bin/activate +``` + +Your terminal prompt now has the `(venv)` prefix indicating the virtual environment is active. Use this virtual environment for the remaining commands. + +Install the `huggingface_hub` Python library using `pip`: + +```bash +pip install huggingface_hub +``` + +You can now download the model using `huggingface-cli`: + +```bash +huggingface-cli download Qwen/Qwen2-0.5B-Instruct +``` + +## Start the rtp-llm server + +{{% notice Note %}} +The server executable was compiled during the previous stage, when you ran `bazelisk build`. {{% /notice %}} + +Install the pip wheel in your active virtual environment: + +```bash +pip install bazel-bin/maga_transformer/maga_transformer-0.2.0-cp310-cp310-linux_aarch64.whl +pip install grpcio-tools +``` +Start the server from the command line.
It listens on port 8088: + +```bash +export CHECKPOINT_PATH=${HOME}/.cache/huggingface/hub/models--Qwen--Qwen2-0.5B-Instruct/snapshots/c540970f9e29518b1d8f06ab8b24cba66ad77b6d/ +export TOKENIZER_PATH=$CHECKPOINT_PATH +export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python +MODEL_TYPE=qwen_2 FT_SERVER_TEST=1 python3 -m maga_transformer.start_server +``` + +## Client + +### Using curl + +You can access the API using the `curl` command. + +In another terminal, use a text editor to create a file named `curl-test.sh` with the content below: + +```bash +curl http://localhost:8088/v1/chat/completions -H "Content-Type: application/json" -d '{ + "model": "any-model", + "messages": [ + { + "role": "system", + "content": "You are a coding assistant, skilled in programming." + }, + { + "role": "user", + "content": "Write a hello world program in C++." + } + ] + }' 2>/dev/null | jq -C +``` + +The `model` value in the API is not used, and you can enter any value. This is because there is only one model loaded in the server. + +Run the script: + +```bash +bash ./curl-test.sh +``` + +The `curl` command accesses the LLM and you should see the output: + +```output +{ + "id": "chat-", + "object": "chat.completion", + "created": 1730277073, + "model": "AsyncModel", + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": "Sure, here's a simple C++ program that prints \"Hello, World!\" to the console:\n\n```cpp\n#include <iostream>\n\nint main() {\n std::cout << \"Hello, World!\" << std::endl;\n return 0;\n}\n```\n\nThis program includes the `iostream` library, which is used for input/output operations. The `main` function is the entry point of the program, and it calls the `cout` object to print the message \"Hello, World!\" to the console."
+ }, + "finish_reason": "stop" + } + ], + "usage": { + "prompt_tokens": 32, + "total_tokens": 137, + "completion_tokens": 105 + } +} +``` + +In the returned JSON data, you will see the LLM output, including the content created from the prompt. + +### Using Python + +You can also use a Python program to access the OpenAI-compatible API. + +Create a Python `venv`: + +```bash +python -m venv pytest +source pytest/bin/activate +``` + +Install the OpenAI Python package: +```bash +pip install openai==1.45.0 +``` + +Use a text editor to create a file named `python-test.py` with the content below: + +```python +from openai import OpenAI + +client = OpenAI( + base_url='http://localhost:8088/v1', + api_key='no-key' + ) + +completion = client.chat.completions.create( + model="not-used", + messages=[ + {"role": "system", "content": "You are a coding assistant, skilled in programming."}, + {"role": "user", "content": "Write a hello world program in C++."} + ], + stream=True, +) + +for chunk in completion: + print(chunk.choices[0].delta.content or "", end="") +``` + +Ensure that the server is still running, and then run the Python file: + +```bash +python ./python-test.py +``` + +You should see the output generated by the LLM: + +```output +Sure, here's a simple C++ program that prints "Hello, World!" to the console: + +```cpp +#include <iostream> + +int main() { + std::cout << "Hello, World!" << std::endl; + return 0; +} + +This program includes the `iostream` library, which is used for input/output operations. The `main` function is the entry point of the program, and it calls the `cout` object to print the message "Hello, World!" to the console. +``` + +Now you can continue to experiment with different large language models, and have a go at writing scripts to access them. 
diff --git a/content/learning-paths/smartphones-and-mobile/_index.md b/content/learning-paths/smartphones-and-mobile/_index.md index 7432cb5a98..0461503058 100644 --- a/content/learning-paths/smartphones-and-mobile/_index.md +++ b/content/learning-paths/smartphones-and-mobile/_index.md @@ -10,21 +10,21 @@ key_ip: - Mali maintopic: true operatingsystems_filter: -- Android: 21 -- Linux: 19 -- macOS: 9 -- Windows: 8 +- Android: 23 +- Linux: 21 +- macOS: 10 +- Windows: 10 subjects_filter: - Gaming: 6 - Graphics: 3 -- ML: 7 -- Performance and Architecture: 23 +- ML: 8 +- Performance and Architecture: 24 subtitle: Optimize Android apps and build faster games using cutting-edge Arm tech title: Smartphones and Mobile tools_software_languages_filter: - 7-Zip: 1 - adb: 1 -- Android: 2 +- Android: 3 - Android NDK: 1 - Android SDK: 1 - Android Studio: 7 @@ -34,30 +34,31 @@ tools_software_languages_filter: - assembly: 1 - Bazel: 1 - C#: 3 -- C++: 3 +- C++: 5 - C/C++: 1 - CCA: 1 -- Clang: 8 +- Clang: 9 - CMake: 1 -- Coding: 17 +- Coding: 18 - Fixed Virtual Platform: 1 - Frame Advisor: 1 -- GCC: 9 +- GCC: 10 - GenAI: 1 - GoogleTest: 1 -- Java: 3 -- Kotlin: 2 +- Java: 4 +- Kotlin: 4 - llvm-mca: 1 - MediaPipe: 1 - Memory Bug Report: 1 - Memory Tagging Extension: 1 -- Mobile: 4 +- Mobile: 6 - NDK: 1 - NEON: 1 -- Python: 1 +- ONNX Runtime: 1 +- Python: 2 - QEMU: 1 - RME: 1 -- Rust: 1 +- Rust: 2 - SDDiskTool: 1 - SVE2: 1 - Total Compute: 1 diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/1-webgpu-fundamentals.md b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/1-webgpu-fundamentals.md new file mode 100644 index 0000000000..0ccb2f79c4 --- /dev/null +++ b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/1-webgpu-fundamentals.md @@ -0,0 +1,68 @@ +--- +title: Introduction to WebGPU +weight: 2 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## What is WebGPU? 
+ +WebGPU is the successor to WebGL, a well-adopted modern API standard for interfacing with GPUs. WebGPU provides better compatibility with modern GPUs, support for general-purpose GPU computations, faster operations, and access to more advanced GPU features. It is designed to provide _unified access_ to GPUs, agnostic to GPU vendors and operating systems. + +WebGPU is a Render Hardware Interface built on top of various backend APIs like Vulkan, DirectX, and Metal (depending on the operating system). + +WebGPU is available in web browsers, and native applications can access it through the `webgpu.h` header file. + +The high level view of WebGPU is shown below: + +![WebGPU high level view #center](images/webgpu_highlevel.png "WebGPU High Level View") + +## What are the benefits of WebGPU? + +WebGPU takes into account learnings from older standards like WebGL and OpenGL and provides the following benefits: + +* A reasonable level of abstraction +* Good performance +* Cross-platform +* Backed by the W3C standards group +* Future-proof design + +WebGPU is a standard and not a true API, so the implementation can be adopted and developed as an interface between native applications developed in any programming language. + +The performance requirements for web pages are the same as for native applications. + +{{% notice Note %}} +When designing an API for the Web, the two key constraints are portability and privacy. + +The limitations of the API due to privacy considerations can be disabled when using WebGPU as a native API. +{{% /notice %}} + +## What are the benefits of using C++ for WebGPU? + +The initial target for WebGPU was JavaScript. The initial `webgpu.h` header file is written in C. + +This Learning Path uses C++ rather than JavaScript or C for the following reasons: + +* C++ is still the primary language used for high-performance graphics applications, such as video games, render engines, and modeling tools. 
+* The level of abstraction and control of C++ is well suited for interacting with graphics APIs in general. +* Graphics programming is a good way to learn more C++. + +## Dawn: the Google WebGPU implementation + +Since WebGPU is a standard and not an implementation, there are different implementations. + +[Dawn](https://github.com/google/dawn) is an open-source, cross-platform implementation of the WebGPU standard. + +It implements the WebGPU functionality specified in `webgpu.h`. Dawn is meant to be integrated as part of a larger system like Chromium or a native Android Application. + +Dawn provides several WebGPU building blocks: + +* WebGPU C/C++ headers that applications and other building blocks use, including a header file and C++ wrapper. +* A "native" implementation of WebGPU using appropriate APIs: D3D12, Metal, Vulkan and OpenGL. +* A client-server implementation of WebGPU for applications that are in a sandbox without access to native drivers. +* Tint, a compiler for the WebGPU Shader Language (WGSL), that converts shaders to and from WGSL. + +Because it is written in C++, Dawn provides better error messages and logging. Because it is open-source, it is easier to inspect stack traces when applications crash. + +Dawn is usually ahead of `wgpu-native`, another WebGPU implementation, when it comes to new functionalities and standards changes. diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/2-env-setup.md b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/2-env-setup.md new file mode 100644 index 0000000000..070d2456c2 --- /dev/null +++ b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/2-env-setup.md @@ -0,0 +1,67 @@ +--- +title: Set up a development environment +weight: 3 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +In this Learning Path, you will learn how to: + +* Integrate Dawn (WebGPU) in an application. +* Use the APIs to render a simple 3D object. 
+* Profile and analyze the application. + +The first step is to prepare a development environment with the required software: + +* [Android Studio](https://developer.android.com/studio) +* [Arm Performance Studio](https://www.arm.com/products/development-tools/graphics/arm-performance-studio) +* Python 3.10 or later + +You can use any computer and operating system which supports the above software. + +## Install Android Studio and the Android NDK + +1. Download and install the latest version of [Android Studio](https://developer.android.com/studio/). + +2. Start Android Studio. + +3. Open the `Settings` dialog. + +4. Navigate to `Languages & Frameworks`, then select `Android SDK`. + +5. In the `SDK Platforms` tab, check `Android 14.0 ("UpsideDownCake")` + +![SDK Platforms #center](images/sdk-platforms.png "SDK Platforms") + +6. In the `SDK Tools` tab check the following: + * Check `Android SDK Build-Tools 35` + * Check `NDK (Side by side)` + * Check `CMake` + +![SDK Tools #center](images/sdk-tools.png "SDK Tools") + +Click OK to install and update the selected components. + +## Install Arm Performance Studio + +Profiling is an important step in the Android application development cycle. + +The default profiler in the Android Studio is great to profile CPU related metrics, but does not provide GPU details. + +Arm Performance Studio is a comprehensive profiling tool to profile both CPUs and GPUs. + +One of the components of Performance Studio is Streamline. Streamline captures data from multiple sources, including: + +* Program Counter (PC) samples from running application threads. +* Samples from the hardware Performance Monitoring Unit (PMU) counters in Arm CPUs, Arm Mali GPUs, and Arm Immortalis GPUs. +* Thread scheduling information from the Linux kernel. +* Software-generated annotations and counters from running applications. + +Install Arm Performance Studio using the [install guide](/install-guides/ams/). 
+ +{{% notice Tip %}} +If you want to learn more about Arm Performance Studio and Streamline before continuing, refer to [Get started with Arm Performance Studio for mobile](https://learn.arm.com/learning-paths/smartphones-and-mobile/ams/ams/) +{{% /notice %}} + +Android Studio and Arm Performance Studio are now installed and you are ready to create a WebGPU Android application. diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/3-integrate-dawn.md b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/3-integrate-dawn.md new file mode 100755 index 0000000000..8b487e4bb7 --- /dev/null +++ b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/3-integrate-dawn.md @@ -0,0 +1,158 @@ +--- +title: Create an application which includes Dawn +weight: 4 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Set up Android Project + +You can start by creating a new Android Studio project. + +Open Android studio, click `New Project` and select `Game Activity (C++)` as shown below: + +![New Game Activity #center](./images/android_studio_new_game_activity.png "New C++ Game Activity") + +Set the `Name` to be `dawnwebgpu`. + +Select `Next` to continue. + +Finish the new project creation by accepting all defaults until the project is created. + +The project is created in `~/AndroidStudioProjects`. + +## About the Game Activity + +GameActivity is a Jetpack library designed to assist Android games in processing app cycle commands, input events, and text input in the application's C/C++ code. + +GameActivity is a direct descendant of NativeActivity and shares a similar architecture: + +![Game Activity Architecture #center](./images/GameActivityArchitecture.png "Game Activity Architecture") + +With GameActivity, you can focus on your game development and avoid spending excessive time dealing with the Java Native Interface (JNI) code. 
+ +GameActivity performs the following functions: + +* Interacts with the Android framework through the Java-side component. +* Passes app cycle commands, input events, and input text to the native side. +* Renders into a SurfaceView, making it easier for games to interact with other UI components. + +{{% notice Tip %}} +You can find more information about Android Game Activity and its capabilities in the [Game Activity documentation](https://developer.android.com/games/agdk/game-activity). +{{% /notice %}} + +## Download project source files + +To create a WebGPU application, you will add a number of files from GitHub to your Game Activity project. The objective is to show you how to take the Game Activity template and modify it to become a WebGPU application. + +To get started, open a terminal, create a new directory, and download the project files: + +```bash +mkdir ~/webgpu-files ; cd ~/webgpu-files +wget https://github.com/varunchariArm/Android_DawnWebGPU/archive/refs/heads/main.zip +``` + +Unzip the project files: + +```bash +unzip main.zip +``` + +You now have a directory named `Android_DawnWebGPU-main` in your `webgpu-files` directory. + +During the next sections you will copy some of the required files from the `Android_DawnWebGPU-main` directory to your Game Activity project to learn how to create WebGPU applications. + +## Upgrade the application to include Dawn + +Return to Android Studio and start work on the WebGPU application. + +The Android Game Activity framework uses OpenGLES3 for graphics. + +You can remove this dependency and replace it with WebGPU. + +Add WebGPU to the project using the following steps: + +1. In Android Studio, navigate to the project view and find the `app` --> `cpp` folder. + +Open a terminal in Android Studio. You should be in the `dawnwebgpu` directory. + +2. 
Create a new directory and copy the WebGPU header file from the downloaded project files + +Run the commands below to copy the `webgpu.hpp` header file: + +```console +mkdir -p app/src/main/cpp/webgpu/include/webgpu +cd app/src/main/cpp/webgpu/include/webgpu +cp ~/webgpu-files/Android_DawnWebGPU-main/app/src/main/cpp/webgpu/include/webgpu/webgpu.hpp . +cd ../.. +``` + +3. Next, copy the remaining WebGPU files to your project. + +```console +cp ~/webgpu-files/Android_DawnWebGPU-main/app/src/main/cpp/webgpu/CMakeLists.txt . +cp ~/webgpu-files/Android_DawnWebGPU-main/app/src/main/cpp/webgpu/FetchDawn.cmake . +cp ~/webgpu-files/Android_DawnWebGPU-main/app/src/main/cpp/webgpu/fetch_dawn_dependencies.py . +cp ~/webgpu-files/Android_DawnWebGPU-main/app/src/main/cpp/webgpu/webgpu.cmake . +cd .. +``` + +Notice that `FetchDawn.cmake` uses the stable `chromium/6536` branch of the Dawn repository. + +{{% notice Note %}} +WebGPU is a constantly evolving standard, so its implementation, Dawn, is also under active development. For the sake of stability, this Learning Path uses a stable branch. Updating to the latest or a different branch may cause breakage. +{{% /notice %}} + +To add Dawn to your application, there are two options: + +* Create a shared/static library from the Dawn source and use it in the application. +* Download the source as a dependency and build it as part of the project build. + +You will use the second option, since it provides more debug flexibility. + +The files `webgpu/webgpu.cmake` and `CMakeLists.txt` facilitate downloading and building WebGPU with the Dawn implementation and integrating Dawn into the project. + +4. Add WebGPU to the project. + +WebGPU is added to the project in the file `CMakeLists.txt`. + +Copy the updated file by running the command: + +```bash +cp ~/webgpu-files/Android_DawnWebGPU-main/app/src/main/cpp/CMakeLists.txt . 
+``` + +Review `CMakeLists.txt` and see that the `options`, `include`, and `add_subdirectory` entries are new compared to the original Game Activity file. 
+ +```output +#Set Dawn build options +option(DAWN_FETCH_DEPENDENCIES "" ON) +option(DAWN_USE_GLFW "" ON) +option(DAWN_SUPPORTS_GLFW_FOR_WINDOWING "" OFF) +option(DAWN_USE_X11 "" OFF) +option(ENABLE_PCH "" OFF) + +include(utils.cmake) +add_subdirectory(webgpu) +``` + +Also look at the `CMakeLists.txt` file and see that the `target_link_libraries` is changed to remove the OpenGL ES components and add the `webgpu` libraries. + +```output +# Configure libraries CMake uses to link your target library. +target_link_libraries(dawnwebgpu + # The game activity + game-activity::game-activity + + # webgpu dependency + webgpu + jnigraphics + android + log) +``` + + +The `webgpu.hpp` header file acts like an interface, exposing all WebGPU functions and variables to the main Application. + +Navigate to the next section to continue building the WebGPU application. diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/4-using-webgpu-apis.md b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/4-using-webgpu-apis.md new file mode 100755 index 0000000000..a3acd342c2 --- /dev/null +++ b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/4-using-webgpu-apis.md @@ -0,0 +1,185 @@ +--- +title: Using Dawn WebGPU APIs in the application +weight: 5 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Setup project + +With the `webgpudawn` library integrated, you can start by removing the extra files included as part of the stock Game Activity project. + +1. Delete all the files from the top `cpp` directory except `CMakeLists.txt`. + + You have already reviewed `CMakeLists.txt` in the previous section. + +2. Add the `webgpuRenderer.cpp` and `webgpuRenderer.h` files for the WebGPU application. + + Run the commands below to add a new `main.cpp` and WebGPU renderer files: + + ```console + cp ~/Android_DawnWebGPU/app/src/main/cpp/main.cpp . + cp ~/Android_DawnWebGPU/app/src/main/cpp/tiny_obj_loader.h . 
+ cp ~/Android_DawnWebGPU/app/src/main/cpp/utils.cmake . + cp ~/Android_DawnWebGPU/app/src/main/cpp/webgpuRenderer.cpp . + cp ~/Android_DawnWebGPU/app/src/main/cpp/webgpuRenderer.h . + cp -r ~/Android_DawnWebGPU/app/src/main/cpp/glm . + cp -r ~/Android_DawnWebGPU/app/src/main/cpp/resources . + ``` + + + +## Using Dawn WebGPU APIs + +There are several layers of abstraction between a device GPU and an application running the WebGPU API. + +![WebGPU Application Interface #center](images/webgpu_app_interface.png "WebGPU Application Interface") + +It is useful to understand these layers as you begin to use WebGPU APIs in an application. + +* Physical devices have GPUs. Most devices only have one GPU, but some have more than one. + +* A native GPU API, which is part of the operating system, such as Vulkan or Metal. This is a programming interface allowing native applications to use the capabilities of the GPU. API instructions are sent to the GPU via a driver. It is possible for a system to have multiple native OS APIs and drivers available to communicate with the GPU, although the above diagram assumes a device with only one native API/driver. + +* A WebGPU implementation like Dawn handles communicating with the GPU via a native GPU API driver. A WebGPU adapter effectively represents a physical GPU and driver available on the underlying system, in your code. + +* A logical device is an abstraction which an application uses to access GPU capabilities. Logical devices are required to provide multiplexing capabilities. A physical device's GPU is used by many applications and processes concurrently. Each app needs to be able to access WebGPU in isolation for security and logic reasons. + +### The adapter + +Before requesting access to a **device**, you need to select an **adapter**. + +The same host system may expose multiple adapters if it has access to multiple physical GPUs. It may also have an adapter that represents an emulated/virtual device. 
Each adapter advertises a list of optional **features** and **supported limits** that it can handle. + +These are used to determine the overall capabilities of the system before **requesting the device**. The **adapter** is used to **access the capabilities** of the user’s hardware, which are used to select the behavior of your application among very different code paths. + +Once a code path is chosen, a device is created with the chosen capabilities. Only the capabilities selected for this device are allowed in the rest of the application. This way, it is **not** possible to inadvertently rely on capabilities specific to a device. + +![Supported Limits #center](images/adapter_supported_limits.png "Adapter Supported Limits") + +{{% notice Tip %}} +In an advanced use of the adapter/device duality, you can set up multiple limit presets and select one depending on the adapter. + +In this case, there is a single preset and the application aborts early if it is not supported. +{{% /notice %}} + +### Requesting the adapter + +An adapter is not something you create, but rather something that you *request* using the function `requestAdapter()`. + +Before doing that you need to create an instance using the `createInstance()` function. + +```C++ +wgpu::Instance instance = createInstance(InstanceDescriptor{}); +``` + +In order to display something on the screen, the operating system needs to provide a place to *draw*, commonly known as **a window**. + +The Game Activity provides a *pApp* member which exposes an Android Window. WebGPU can use an Android Window for rendering. + +WebGPU cannot use the *window* directly, but uses something called **a surface**, which can be easily created using the window. 
+ +```C++ +wgpu::SurfaceDescriptorFromAndroidNativeWindow platformSurfaceDescriptor = {}; +platformSurfaceDescriptor.chain.next = nullptr; +platformSurfaceDescriptor.chain.sType = SType::SurfaceDescriptorFromAndroidNativeWindow; +platformSurfaceDescriptor.window = app_->window; //app_ comes from the game activity +wgpu::SurfaceDescriptor surfaceDescriptor = {}; +surfaceDescriptor.label = "surfaceDescriptor"; +surfaceDescriptor.nextInChain = reinterpret_cast<const WGPUChainedStruct*>(&platformSurfaceDescriptor); +wgpu::Surface surface = instance.createSurface(surfaceDescriptor); +``` + +Once a Surface is available, you can request the adapter using the `requestAdapter()` function as shown below: + +```C++ +wgpu::RequestAdapterOptions adapterOpts{}; +adapterOpts.compatibleSurface = surface; +wgpu::Adapter adapter = instance.requestAdapter(adapterOpts); +``` + +After successful adapter creation, you can query basic information such as the GPU vendor, underlying graphics APIs, and more. + +```C++ +wgpu::AdapterInfo adapterInfo; +adapter.getInfo(&adapterInfo); +__android_log_print(ANDROID_LOG_INFO, "NATIVE", "%s", "vendor.."); +__android_log_print(ANDROID_LOG_INFO, "NATIVE", "%s", adapterInfo.vendor); +__android_log_print(ANDROID_LOG_INFO, "NATIVE", "%s", "architecture.."); +__android_log_print(ANDROID_LOG_INFO, "NATIVE", "%s", adapterInfo.architecture); +__android_log_print(ANDROID_LOG_INFO, "NATIVE", "%s", "device.."); +__android_log_print(ANDROID_LOG_INFO, "NATIVE", "%s", adapterInfo.device); +__android_log_print(ANDROID_LOG_INFO, "NATIVE", "%s", "description.."); +__android_log_print(ANDROID_LOG_INFO, "NATIVE", "%s", adapterInfo.description); +std::string backend = std::to_string((int)adapterInfo.backendType); +__android_log_print(ANDROID_LOG_INFO, "NATIVE", "%s", "backendType.."); +__android_log_print(ANDROID_LOG_INFO, "NATIVE", "%s", backend.c_str()); +``` + +### Creating a device + +In order to create a device that meets the requirements for the application, you need to 
specify *required 
limits*. + +There are a few options to set the limits: + +* Choose default limits: + +```C++ +wgpu::RequiredLimits requiredLimits = Default; +``` + +* Query the Adapter's *supported limits* and use them as *required limits*: + +```C++ +wgpu::SupportedLimits supportedLimits; +adapter.getLimits(&supportedLimits); +wgpu::RequiredLimits requiredLimits = Default; +requiredLimits.limits = supportedLimits.limits; +``` + +* Query the Adapter's *supported limits* and define specific *better* limits in the *required limits*: + +```C++ +wgpu::SupportedLimits supportedLimits; +adapter.getLimits(&supportedLimits); +wgpu::RequiredLimits requiredLimits = Default; +requiredLimits.limits.maxVertexAttributes = 3; +requiredLimits.limits.maxVertexBuffers = 1; +requiredLimits.limits.minStorageBufferOffsetAlignment = supportedLimits.limits.minStorageBufferOffsetAlignment; +requiredLimits.limits.minUniformBufferOffsetAlignment = supportedLimits.limits.minUniformBufferOffsetAlignment; +//Define other limits as required + +``` + +{{% notice Tip %}} +Setting *better* limits may not be desirable, as doing so may have a performance impact. To improve portability across devices and implementations, applications should generally only request better limits if they are required. + +It is recommended to read more about ["Supported Limits"](https://developer.mozilla.org/en-US/docs/Web/API/GPUSupportedLimits) and ["limits"](https://gpuweb.github.io/gpuweb/#limits). 
+{{% /notice %}} + +Use the `requestDevice()` API to request a device: + +```C++ +wgpu::DeviceDescriptor deviceDesc; +deviceDesc.label = "My Device"; +deviceDesc.requiredFeatureCount = 0; +deviceDesc.requiredLimits = &requiredLimits; +deviceDesc.defaultQueue.label = "The default queue"; +wgpu::Device device = adapter.requestDevice(deviceDesc); +__android_log_print(ANDROID_LOG_INFO, "NATIVE", "%s", "Got device"); +static auto errorCallback = device.setUncapturedErrorCallback([](ErrorType type, char const* message) { + __android_log_print(ANDROID_LOG_ERROR, "NATIVE", "%s", "Got device error"); + __android_log_print(ANDROID_LOG_ERROR, "NATIVE", "%s", "error type:"); + std::string t = std::to_string((int)type); + __android_log_print(ANDROID_LOG_ERROR, "NATIVE", "%s", t.c_str()); + __android_log_print(ANDROID_LOG_ERROR, "NATIVE", "%s", "error message:"); + __android_log_print(ANDROID_LOG_ERROR, "NATIVE", "%s", message); +}); +``` + +{{% notice Tip %}} +While creating a device, register a callback function with `setUncapturedErrorCallback`. This helps in capturing validation and other errors with the WebGPU device. +{{% /notice %}} + +Proceed to learn how to render 3D objects. \ No newline at end of file diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/5-render-a-simple-3D-object-part-1.md b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/5-render-a-simple-3D-object-part-1.md new file mode 100644 index 0000000000..f807a40b92 --- /dev/null +++ b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/5-render-a-simple-3D-object-part-1.md @@ -0,0 +1,155 @@ +--- +title: Render a simple 3D object +weight: 6 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Command queue + +Graphics applications have to deal with two processors: the CPU and the GPU. + +These two processors run on different timelines. For optimal performance, commands intended for the GPU are batched and sent through a command queue. 
The GPU consumes this queue whenever it is ready, so each processor minimizes the time it spends idle waiting for the other to respond. + +A WebGPU device has a single queue, which is used to send both commands and data. You can get it with `wgpuDeviceGetQueue()`. + +WebGPU offers 3 different ways to submit work to this queue: + +* wgpuQueueSubmit +* wgpuQueueWriteBuffer +* wgpuQueueWriteTexture + +{{% notice Note %}} +Other graphics APIs allow you to build multiple queues per device, and future versions of WebGPU might as well. + +If you want to learn more, refer to the [Command Queue](https://eliemichel.github.io/LearnWebGPU/getting-started/the-command-queue.html) chapter of the LearnWebGPU guide. +{{% /notice %}} + +## Getting started to render a 3D object + +WebGPU is a very simple system. All it does is run 3 types of functions on the GPU: Vertex Shaders, Fragment Shaders, and Compute Shaders. + +* A Vertex Shader computes vertices. The shader returns vertex positions. +* A Fragment Shader computes colors. It indirectly writes data to textures, and that data does not have to be colors. +* A Compute Shader is more generic. It’s effectively a function you call and say “execute this function N times”. + +Here is a simplified diagram of a WebGPU setup to draw triangles by using a vertex shader and a fragment shader: + +!["Triangle using WebGPU" #center](images/webgpu-draw-high-level.svg "Triangle using WebGPU") + +The main things to notice in the above image are: + +* There is a **Pipeline**. It contains the vertex shader and fragment shader the GPU will run. You could also have a pipeline with a compute shader. +* The shaders reference resources (buffers, textures, samplers) indirectly through **Bind Groups**. +* The pipeline defines attributes that reference buffers indirectly through the internal state. +* Attributes pull data out of buffers and feed the data into the vertex shader. +* The vertex shader may feed data into the fragment shader. 
+* The fragment shader writes to textures indirectly through the render pass description. + +To execute shaders on the GPU, you need to create all of these resources and set up this state. Creation of resources is relatively straightforward. + +{{% notice Note %}} +Most WebGPU resources cannot be changed after creation. You can change their contents but not their size, usage, or format. + +If you want to change something, create a new resource and destroy the old one. +{{% /notice %}} + +## Render Pipeline + +In order to achieve high performance real-time 3D rendering, the GPU processes shapes through a predefined pipeline. The pipeline itself is always the same, but you can configure it in many ways. + +To do so, WebGPU provides a Render Pipeline object. The figure below illustrates the sequence of data processing stages executed by the render pipeline. + +!["Render Pipeline" #center](images/render-pipeline.svg "Render Pipeline") + +The Render Pipeline has 2 main types of stages, **fixed-function** and **programmable**. + +### Fixed-function stages + +The pipeline description consists of the following steps: + +* Describe vertex pipeline state +* Describe primitive pipeline state +* Describe fragment pipeline state +* Describe stencil/depth pipeline state +* Describe multi-sampling state +* Describe pipeline layout + +The fixed-function stages are well documented and you can refer to the [code](https://github.com/varunchariArm/Android_DawnWebGPU/blob/main/app/src/main/cpp/webgpuRenderer.cpp#L256) and [further reading](https://eliemichel.github.io/LearnWebGPU/basic-3d-rendering/hello-triangle.html#lit-24) to configure them. + +Configuring these stages is straightforward and is similar to other graphics APIs. + +### Programmable stage + +There are two programmable stages: the vertex stage and the fragment stage. Both of them use a **Shader Module**. 
+ +### Shaders + +Both the vertex and fragment programmable stages can use the same shader module or have individual shader modules. + +A shader module is similar to a dynamic library (like a .dll, .so or .dylib file), except that it uses the binary language of your GPU rather than your CPU. + +### Shader Code + +The shader language officially used by WebGPU is called WebGPU Shading Language, [WGSL](https://gpuweb.github.io/gpuweb/wgsl/). + +All implementations of WebGPU support it, and Dawn also offers the possibility to provide shaders written in [SPIR-V](https://www.khronos.org/spir). + +{{% notice Note %}} +WGSL was originally designed to be a human-editable version of the SPIR-V programming model, so transpilation from SPIR-V to WGSL is in theory efficient and lossless. You can use [Naga](https://github.com/gfx-rs/naga) or [Tint](https://dawn.googlesource.com/tint) to translate. +{{% /notice %}} + +It is highly recommended to understand WGSL syntax and capabilities to better program in WebGPU. + +### Shader Module Creation + +It is simple to create a Shader module in WebGPU: + +```C++ +ShaderModuleDescriptor shaderDesc; +ShaderModule shaderModule = device.createShaderModule(shaderDesc); +``` + +By default the `nextInChain` member of `ShaderModuleDescriptor` is a `nullptr`. + +The `nextInChain` pointer is the entry point of WebGPU’s extension mechanism. It is either null or pointing to a structure of type `WGPUChainedStruct`. + +A `WGPUChainedStruct` has two fields. First, it may recursively have a `next` element (again, either null or pointing to some `WGPUChainedStruct`). + +Second, it has a struct type `sType`, which is an enum that tells which struct the chain element can be cast to. + +To create a shader module from WGSL code, use the `ShaderModuleWGSLDescriptor` SType. + +In Dawn, a SPIR-V shader can similarly be created using the `WGPUShaderModuleSPIRVDescriptor`. 
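To make the chaining mechanism more concrete, the sketch below models it with stand-in types. It is a minimal illustration under stated assumptions: `ToySType`, `ToyChainedStruct`, `ToyWGSLSource`, `ToyShaderModuleDescriptor`, and `findWgslSource` are hypothetical names invented for this example and are not part of `webgpu.h` or Dawn. It shows how an implementation could walk `nextInChain`, check each `sType`, and only then cast the link back to its full struct.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for the real webgpu.h types (illustration only).
enum class ToySType : std::uint32_t { Invalid = 0, WGSLSource = 1, SPIRVSource = 2 };

// Header shared by every chained extension struct.
struct ToyChainedStruct {
    const ToyChainedStruct* next = nullptr;  // next link, or nullptr at the end
    ToySType sType = ToySType::Invalid;      // identifies the concrete struct
};

// An extension struct: the chain header is its first member, so a pointer
// to the header can be converted back to a pointer to the whole struct.
struct ToyWGSLSource {
    ToyChainedStruct chain;
    const char* code = nullptr;
};

struct ToyShaderModuleDescriptor {
    const ToyChainedStruct* nextInChain = nullptr;
};

// Walk the chain and return the WGSL source, or an empty string if none is found.
std::string findWgslSource(const ToyShaderModuleDescriptor& desc) {
    for (const ToyChainedStruct* link = desc.nextInChain; link != nullptr; link = link->next) {
        // Every link is expected to carry a valid tag.
        assert(link->sType != ToySType::Invalid);
        if (link->sType == ToySType::WGSLSource) {
            // The cast is only meaningful because sType was checked first.
            const auto* wgsl = reinterpret_cast<const ToyWGSLSource*>(link);
            return wgsl->code != nullptr ? wgsl->code : "";
        }
    }
    return "";
}
```

In this toy model, forgetting to set `sType` on a link trips the assertion, which mirrors why the real descriptors need both `chain.next` and `chain.sType` filled in before they are connected to `nextInChain`.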
+ +The field `shaderCodeDesc.chain` corresponds to the chained struct when cast as a simple `WGPUChainedStruct`. Its `sType` must be set to the corresponding SType enum value: + +```C++ +ShaderModuleWGSLDescriptor shaderCodeDesc; +// Set the chained struct's header +shaderCodeDesc.chain.next = nullptr; +shaderCodeDesc.chain.sType = SType::ShaderModuleWGSLDescriptor; +// Connect the chain +shaderDesc.nextInChain = &shaderCodeDesc.chain; +shaderCodeDesc.code = shaderSource; +``` + +In this project, a helper function [`loadShaderModule`](https://github.com/varunchariArm/Android_DawnWebGPU/blob/main/app/src/main/cpp/webgpuRenderer.cpp#L450) reads the shader code from a file and creates a shader module for the device. + +### Create Render Pipeline + +The shaders might need to access input and output resources such as buffers and textures. + +These resources are made available to the pipeline by configuring a memory layout. + +Now you can finally create a *render pipeline* by calling `createRenderPipeline()`: + +```C++ +wgpu::RenderPipelineDescriptor pipelineDesc; +// Configure fixed-function and programmable stages +wgpu::RenderPipeline pipeline = device.createRenderPipeline(pipelineDesc); +``` + +Continue to the next section to learn more about rendering a 3D object. \ No newline at end of file diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/6-render-a-simple-3D-object-part-2.md b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/6-render-a-simple-3D-object-part-2.md new file mode 100644 index 0000000000..6378bd1a67 --- /dev/null +++ b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/6-render-a-simple-3D-object-part-2.md @@ -0,0 +1,179 @@ +--- +title: Build and run a WebGPU application +weight: 7 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## 3D meshes + +Once a Render Pipeline is created, you can use WebGPU APIs to create and render a 3D mesh. This is very similar to other graphics APIs. 
+ +The steps are listed below: + +* Create the Vertex Buffer(s) +* Create the Index Buffer(s) +* Create the Uniform Buffer(s) +* Create a Depth Buffer (Z-Buffer algorithm) +* Create the Depth Texture and TextureView +* Create a Depth Stencil +* Create the Transformation and Projection matrices + +All these steps are common in graphics programming, and WebGPU offers the capability to perform all of these operations. + +It is recommended to go through individual chapters in the [3D rendering](https://eliemichel.github.io/LearnWebGPU/basic-3d-rendering/index.html) section to learn more. + +### Loading 3D objects + +In this project you can use OBJ files to define 3D meshes. + +Instead of manually parsing OBJ files, use the [TinyOBJLoader](https://github.com/tinyobjloader/tinyobjloader) library. + +The file format is not complex, but parsing files is not the goal of this Learning Path. + +You can use open-source software such as Blender to create your own 3D objects. + +{{% notice Note %}} +Exactly one of your source files must define `TINYOBJLOADER_IMPLEMENTATION` before including the header: + +```C++ +#define TINYOBJLOADER_IMPLEMENTATION // add this to exactly 1 of your C++ files +#include "tiny_obj_loader.h" +``` +{{% /notice %}} + +There is a helper function [`loadGeometryFromObj`](https://github.com/varunchariArm/Android_DawnWebGPU/blob/main/app/src/main/cpp/webgpuRenderer.cpp#L475) available to load objects. + +You are now ready to render a 3D object. + +## Rendering using WebGPU + +You can run a rendering pass and *draw* something onto your *surface*. + +To encode any commands to be issued to the GPU, you need to create a `CommandEncoder`. Modern APIs record commands into command buffers, rather than issuing commands one by one, and submit all of them at once. 
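The record-then-submit idea can be illustrated with a small toy model in plain C++. This is a sketch under stated assumptions, not WebGPU or Dawn code: `DemoEncoder`, `DemoQueue`, and `DemoCommand` are hypothetical names invented here. Recording only stores the commands, and nothing runs until the finished batch is submitted to the queue.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy types for illustration; not part of WebGPU or Dawn.
// A command appends its own name to a log when it is executed.
using DemoCommand = std::function<void(std::vector<std::string>&)>;

// Records commands without executing them.
class DemoEncoder {
public:
    void record(const std::string& name) {
        commands_.push_back(
            [name](std::vector<std::string>& log) { log.push_back(name); });
    }

    // Hands over everything recorded so far as one batch and resets the encoder.
    std::vector<DemoCommand> finish() {
        std::vector<DemoCommand> batch;
        batch.swap(commands_);
        return batch;
    }

private:
    std::vector<DemoCommand> commands_;
};

// Executes a finished batch in recording order.
class DemoQueue {
public:
    void submit(const std::vector<DemoCommand>& batch) {
        for (const DemoCommand& cmd : batch) {
            assert(cmd && "recorded command must be callable");
            cmd(executed_);
        }
    }

    const std::vector<std::string>& executed() const { return executed_; }

private:
    std::vector<std::string> executed_;
};
```

For example, recording `"setPipeline"` and `"draw"` on a `DemoEncoder` leaves the queue's log empty until `queue.submit(encoder.finish())` is called, after which both entries appear in recording order.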
+ +In WebGPU, this is done through a `CommandEncoder` as shown below: + +```C++ +wgpu::CommandEncoderDescriptor commandEncoderDesc; +commandEncoderDesc.label = "Command Encoder"; +wgpu::CommandEncoder encoder = device.createCommandEncoder(commandEncoderDesc); +``` + +The next step is to create a `RenderPassEncoder`. + +It encodes commands related to controlling the vertex and fragment shader stages, as issued by RenderPipeline. + +It forms part of the overall encoding activity of a CommandEncoder. A render pipeline renders graphics to Texture attachments, typically intended for displaying on a surface, but it could also render to textures used for other purposes that never appear onscreen. + +It has two main stages: + +* A vertex stage, in which a vertex shader takes positioning data fed into the GPU and uses it to position a series of vertices in 3D space by applying specified effects like rotation, translation, or perspective. +* A fragment stage, in which a fragment shader computes the color for each pixel covered by the primitives produced by the vertex shader. + +You can create a RenderPassEncoder using the `encoder.beginRenderPass()` API: + +```C++ +wgpu::RenderPassDescriptor renderPassDesc{}; + +wgpu::RenderPassColorAttachment renderPassColorAttachment{}; +renderPassColorAttachment.view = nextTexture; +renderPassColorAttachment.resolveTarget = nullptr; +renderPassColorAttachment.loadOp = wgpu::LoadOp::Clear; +renderPassColorAttachment.storeOp = wgpu::StoreOp::Store; +renderPassColorAttachment.clearValue = Color{ 1, 1, 1, 1.0 }; +renderPassColorAttachment.depthSlice = WGPU_DEPTH_SLICE_UNDEFINED; +renderPassDesc.colorAttachmentCount = 1; +renderPassDesc.colorAttachments = &renderPassColorAttachment; +renderPassDesc.timestampWrites = nullptr; +wgpu::RenderPassEncoder renderPass = encoder.beginRenderPass(renderPassDesc); +``` + +{{% notice Note %}} +`ColorAttachment` is the only mandatory field. 
Make sure you have specified `renderPassColorAttachment.depthSlice`. + +It is recommended to go through the ColorAttachment [members](https://gpuweb.github.io/gpuweb/#color-attachments). +{{% /notice %}} + +You can invoke the following APIs to draw a 3D object: + +* `renderPass.setPipeline(...)` +* `renderPass.setVertexBuffer(...)` +* `renderPass.setBindGroup(...)` +* `renderPass.draw(...)` + +To finish encoding the sequence of commands and issue them to the GPU, a few more API calls are needed: + +* End the render pass: `renderPass.end()` +* Finish the command buffer: + + ```C++ + wgpu::CommandBufferDescriptor cmdBufferDescriptor{}; + cmdBufferDescriptor.label = "Command buffer"; + wgpu::CommandBuffer command = encoder.finish(cmdBufferDescriptor); + encoder.release(); + ``` + +* Submit the command buffer to the queue: `queue.submit(command)` +* Present the object onto the surface: `surface_.present()` + +{{% notice Tip %}} +Make sure you release the created encoders and buffers by calling the respective `.release()` in order to avoid dangling pointers or other errors. +{{% /notice %}} + +{{% notice Note %}} +By default, Dawn runs callbacks only when the device “ticks”, so the error callbacks are invoked in a different call stack than where the error occurred, making the breakpoint less informative. + +To force Dawn to invoke error callbacks as soon as there is an error, you can enable an instance toggle: + +```C++ +#ifdef WEBGPU_BACKEND_DAWN +// Make sure the uncaptured error callback is called as soon as an error +// occurs rather than at the next call to "wgpuDeviceTick". 
+WGPUDawnTogglesDescriptor toggles; +toggles.chain.next = nullptr; +toggles.chain.sType = WGPUSType_DawnTogglesDescriptor; +toggles.disabledToggleCount = 0; +toggles.enabledToggleCount = 1; +const char* toggleName = "enable_immediate_error_handling"; +toggles.enabledToggles = &toggleName; + +desc.nextInChain = &toggles.chain; +#endif // WEBGPU_BACKEND_DAWN +``` + +Toggles are Dawn’s way of enabling/disabling features at the scale of the whole WebGPU instance. + +See the complete list in [Toggles.cpp](https://dawn.googlesource.com/dawn/+/refs/heads/main/src/dawn/native/Toggles.cpp#33). +{{% /notice %}} + +## Building and running the application + +You are now ready to build and run the application. + +First, copy the shader code and 3D object files to a connected phone: + +```bash +cd ~/AndroidStudioProjects/dawnwebgpu/app/src/main/cpp +adb shell "mkdir /data/local/tmp/webgpu/" +adb push resources/shader_texture_file.wgsl /data/local/tmp/webgpu/ +adb push resources/cone_in_turdis.obj /data/local/tmp/webgpu/ +adb push resources/cone_in_turdis.mtl /data/local/tmp/webgpu/ +``` + +{{% notice Note %}} +If `adb` is not in your search path, enter the full path to `adb`. + +For example: +```bash +~/Library/Android/sdk/platform-tools/adb shell "mkdir /data/local/tmp/webgpu/" +``` + +{{% /notice %}} + +Now click the **Run** icon in Android Studio, which builds the application and launches it on the connected device, producing the following output: + +![Output #center](images/output.gif "Output") + +Congratulations! You have run a WebGPU application on an Android device. 
\ No newline at end of file diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/7-Profiling-App-using-Streamline.md b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/7-Profiling-App-using-Streamline.md new file mode 100644 index 0000000000..583395c6b4 --- /dev/null +++ b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/7-Profiling-App-using-Streamline.md @@ -0,0 +1,59 @@ +--- +title: Profiling the Application using Streamline +weight: 8 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Set up Arm Streamline + +Follow these steps to configure Arm Streamline Performance Analyzer to capture Mali GPU-related data: + +* Confirm your Android device is connected to the development machine. +* Navigate to the **Start tab**, select **Android (adb)**, select the device and then select the application to debug. +* Click on **Select Counters** + +![Select device #center](images/streamline_select.png "Select device") + +This opens a new window: + +![Select counters #center](images/streamline_select_counters.png "Select counters") + +* Search for **Mali Timeline Events: Perfetto** +* Make sure it is listed in **Events to collect** +* Click **Save** + +## Profiling the application using Streamline + +Once you have selected the device, the application, and the metrics to be collected, click on the **start capture** button. + +This automatically starts the application and begins collecting the profiling data. + +Make sure the application is running as desired on your Android device. After a few seconds, you can stop the capture. + +Wait until Streamline completes processing the data. + +Switch to the *Mali Timeline* view as shown below: + +!["Mali Timeline Streamline" #center](images/Streamline-mali-timeline.png "Mali Timeline Streamline") + +You may have to zoom into the data to the maximum (`500 us`), since you are rendering a very simple 3D object. 
+ +You can analyze 2 consecutive frames as shown below: + +!["Two consecutive frames" #center](./images/Streamline-mali-analysis.png "Two consecutive frames") + +Arm has worked with the Dawn team to optimize data uploading to GPU buffers for Mali GPUs. + +Arm has implemented a **Fast Path** mechanism wherein the Vertex Queue starts processing while an earlier Fragment Queue is still being processed. + +As you can see from the above picture, there is some overlap between the Fragment Queue of the first frame and the Vertex Queue of the next frame. + +This shows that the application is hitting the **Fast Path** that Arm has implemented to optimize the performance of Dawn for Mali GPUs. + +The overlap is small since the application is rendering the same simple 3D object in different orientations. You can extend the application to render complex objects with multiple *Uniform Buffers*. This would show the overlap in more detail. + +{{% notice Tip %}} +Feel free to experiment with different counters in Streamline and explore the other CPU profiling data as well. +{{% /notice %}} diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/_index.md b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/_index.md new file mode 100644 index 0000000000..f1a307a26b --- /dev/null +++ b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/_index.md @@ -0,0 +1,52 @@ +--- +title: Build and profile a simple WebGPU Android Application + +draft: true +cascade: + draft: true + +minutes_to_complete: 90 + +who_is_this_for: This is an introductory topic for developers who build GPU-based Android applications and are interested in trying WebGPU. + +learning_objectives: + - Understand the benefits of WebGPU and Dawn, a WebGPU implementation. + - Set up a WebGPU development environment. + - Integrate Dawn in an Android Application. + - Use Dawn WebGPU APIs in the application. 
+ - Understand the changes required to upgrade to WebGPU to render a simple 3D object. + - Build and run a WebGPU Android Application. + - Profile the Application using Streamline. + - Analyze the profiling data. + +prerequisites: + - Basic knowledge of graphics APIs and experience with developing Android graphics applications. + - A development machine with Android Studio, Blender, and Arm Streamline installed. + - An Android phone in developer mode. + +author_primary: Varun Chari, Albin Bernhardsson + +### Tags +skilllevels: Advanced +subjects: GPU +armips: + - Cortex-A +tools_software_languages: + - Mobile + - Java + - Kotlin + - C++ + - Python +operatingsystems: + - macOS + - Linux + - Windows + - Android + + +### FIXED, DO NOT MODIFY +# ================================================================================ +weight: 1 # _index.md always has weight of 1 to order correctly +layout: "learningpathall" # All files under learning paths have this same wrapper +learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content. +--- diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/_next-steps.md b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/_next-steps.md new file mode 100644 index 0000000000..1fc22f142c --- /dev/null +++ b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/_next-steps.md @@ -0,0 +1,43 @@ +--- +next_step_guidance: Now that you are familiar with building graphical applications using WebGPU, you are ready to incorporate WebGPU into your Android applications. 
+ +recommended_path: "/learning-paths/smartphones-and-mobile/ams/" + +further_reading: + - resource: + title: WebGPU example application + link: https://github.com/varunchariArm/Android_DawnWebGPU + type: website + - resource: + title: WebGPU working draft + link: https://www.w3.org/TR/webgpu/ + type: website + - resource: + title: Dawn GitHub repository + link: https://github.com/google/dawn + type: website + - resource: + title: WebGPU API + link: https://developer.mozilla.org/en-US/docs/Web/API/WebGPU_API + type: website + - resource: + title: WebGPU fundamentals 2 + link: https://webgpufundamentals.org/ + type: website + - resource: + title: Learn WebGPU + link: https://eliemichel.github.io/LearnWebGPU/index.html + type: website + - resource: + title: WebGPU examples 2 + link: https://github.com/samdauwe/webgpu-native-examples + type: website + + +# ================================================================================ +# FIXED, DO NOT MODIFY +# ================================================================================ +weight: 21 # set to always be larger than the content in this path, and one more than 'review' +title: "Next Steps" # Always the same +layout: "learningpathall" # All files under learning paths have this same wrapper +--- diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/_review.md b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/_review.md new file mode 100644 index 0000000000..f12277b62d --- /dev/null +++ b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/_review.md @@ -0,0 +1,44 @@ +--- +review: + - questions: + question: > + What is WebGPU? + answers: + - A highly customized API for specific GPUs. + - APIs designed to provide unified access to GPUs, regardless of the GPU vendor and the operating system the application runs on. + - APIs designed for cloud-based applications. 
+ correct_answer: 2 + explanation: > + WebGPU is a Render Hardware Interface built on top of the various APIs provided by the driver/OS depending on your platform. This duplicated development effort is made once by the web browsers and made available to you through the webgpu.h header they provide. + + - questions: + question: > + What is Dawn? + answers: + - An open-source WebGPU implementation led by Google. + - A community-driven WebGPU implementation. + - A new programming language to program GPUs. + correct_answer: 1 + explanation: > + Dawn is an open-source and cross-platform implementation of the WebGPU standard, led by Google. More precisely, it implements webgpu.h, which is a one-to-one mapping with the WebGPU IDL. + + - questions: + question: > + What is Arm Streamline? + answers: + - A profiling tool to profile CPUs. + - A profiling tool to profile GPUs. + - A comprehensive profiling tool to profile both CPUs and GPUs. + correct_answer: 3 + explanation: > + Streamline is an application profiler that can capture data from multiple sources, including Program Counter (PC) samples, samples from the hardware Performance Monitoring Unit (PMU) counters in the Arm CPU, Arm® Mali™ GPUs, and Arm Immortalis™ GPUs. 
+ + + +# ================================================================================ +# FIXED, DO NOT MODIFY +# ================================================================================ +title: "Review" # Always the same title +weight: 20 # Set to always be larger than the content in this path +layout: "learningpathall" # All files under learning paths have this same wrapper +--- \ No newline at end of file diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/GameActivityArchitecture.png b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/GameActivityArchitecture.png new file mode 100644 index 0000000000..717875772e Binary files /dev/null and b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/GameActivityArchitecture.png differ diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/Streamline-mali-analysis.png b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/Streamline-mali-analysis.png new file mode 100644 index 0000000000..a7ec578686 Binary files /dev/null and b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/Streamline-mali-analysis.png differ diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/Streamline-mali-timeline.png b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/Streamline-mali-timeline.png new file mode 100644 index 0000000000..799734c347 Binary files /dev/null and b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/Streamline-mali-timeline.png differ diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/adapter_supported_limits.png b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/adapter_supported_limits.png new file mode 100644 index 0000000000..f16dbfb484 Binary files /dev/null and 
b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/adapter_supported_limits.png differ diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/android_studio_new_game_activity.png b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/android_studio_new_game_activity.png new file mode 100644 index 0000000000..a0d45f644c Binary files /dev/null and b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/android_studio_new_game_activity.png differ diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/output.gif b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/output.gif new file mode 100644 index 0000000000..8a02ef6b45 Binary files /dev/null and b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/output.gif differ diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/output.mov b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/output.mov new file mode 100644 index 0000000000..e1fa7f8fe0 Binary files /dev/null and b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/output.mov differ diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/render-pipeline.svg b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/render-pipeline.svg new file mode 100644 index 0000000000..13d328c922 --- /dev/null +++ b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/render-pipeline.svg @@ -0,0 +1,3121 @@ +[SVG markup not recoverable. Figure "Render Pipeline". Text labels: vertex fetch, vertex shader, primitive assembly, rasterization, fragment shader, (stencil test & write), depth test & write, blending; attachments (set through the Render Pass): depth, color; legend: fixed-function stage, programmable stage.] 
diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/sdk-platforms.png b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/sdk-platforms.png new file mode 100644 index 0000000000..f6d5a98c10 Binary files /dev/null and b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/sdk-platforms.png differ diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/sdk-tools.png b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/sdk-tools.png new file mode 100644 index 0000000000..4450f0b9a2 Binary files /dev/null and b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/sdk-tools.png differ diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/streamline_select.png b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/streamline_select.png new file mode 100644 index 0000000000..ceab14b9aa Binary files /dev/null and b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/streamline_select.png differ diff --git
a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/streamline_select_counters.png b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/streamline_select_counters.png new file mode 100644 index 0000000000..eedc9a3bf9 Binary files /dev/null and b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/streamline_select_counters.png differ diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/webgpu-draw-high-level.svg b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/webgpu-draw-high-level.svg new file mode 100644 index 0000000000..efd2844828 --- /dev/null +++ b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/webgpu-draw-high-level.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/webgpu_app_interface.png b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/webgpu_app_interface.png new file mode 100644 index 0000000000..43663514b0 Binary files /dev/null and b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/webgpu_app_interface.png differ diff --git a/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/webgpu_highlevel.png b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/webgpu_highlevel.png new file mode 100644 index 0000000000..f3248888a3 Binary files /dev/null and b/content/learning-paths/smartphones-and-mobile/android_webgpu_dawn/images/webgpu_highlevel.png differ diff --git a/content/migration/_index.md b/content/migration/_index.md index caef862bf7..ccd7e284ff 100644 --- a/content/migration/_index.md +++ b/content/migration/_index.md @@ -39,12 +39,12 @@ AWS offers more than [150 instance types with Graviton processors](https://aws.a {{< /tab >}} {{< tab header="Google GCP">}} -Google GCP offers a varity of [virtual machine instances with Arm 
processors](https://cloud.google.com/compute/docs/instances/arm-on-compute). The largest instance has 48 vCPUs and 192 Gb of RAM. It does not offer bare-metal instances. +Google GCP offers a variety of [virtual machine instances with Arm processors](https://cloud.google.com/compute/docs/instances/arm-on-compute). The latest generation of Arm-based VMs is based on the Google Axion processor. The largest instance has 72 vCPUs and 576 GB of RAM. It does not offer bare-metal instances. It offers `highcpu` and `highmem` VM instances for compute-intensive and memory-intensive workloads, respectively. | Generation | Arm CPU | Instance types | Comments | | --------------|--------------|--------------------|-----------| | T2A | Neoverse-N1 | T2A-standard | Optimized for general-purpose workloads - web servers, and microservices. | - +| Axion (C4A) | Neoverse-V2 | c4a-standard, c4a-highmem, c4a-highcpu | General-purpose, AI/ML workloads and high performance computing. | {{< /tab >}} {{< tab header="Microsoft Azure">}} @@ -118,7 +118,8 @@ Which tools are available for building and running containers on Arm servers? 
| AWS CodeBuild | [Build and share Docker images using AWS CodeBuild](https://learn.arm.com/learning-paths/servers-and-cloud-computing/codebuild/) | | | Docker Build Cloud | [Build multi-architecture container images with Docker Build Cloud](https://learn.arm.com/learning-paths/cross-platform/docker-build-cloud/) | [Supercharge your Arm builds with Docker Build Cloud: Efficiency meets performance](https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/supercharge-arm-builds-with-docker-build-cloud) | | GitHub Actions (GitHub runners) | [Build multi-architecture container images with GitHub Arm-hosted runners](https://learn.arm.com/learning-paths/cross-platform/github-arm-runners/) | [Arm64 on GitHub Actions: Powering faster, more efficient build systems](https://github.blog/news-insights/product-news/arm64-on-github-actions-powering-faster-more-efficient-build-systems/) | -| GitHub Actions (AWS Graviton runners) | [Managed, self-hosted Arm runners for GitHub Actions](https://learn.arm.com/learning-paths/servers-and-cloud-computing/github-actions-runner/) | | +| GitHub Actions (AWS Graviton runners) | [Managed, self-hosted Arm runners for GitHub Actions](https://learn.arm.com/learning-paths/servers-and-cloud-computing/github-actions-runner/) | +| GitLab (GitLab runners) | [Build a CI/CD pipeline with GitLab on Google Axion](https://learn.arm.com/learning-paths/cross-platform/gitlab/) | | {{< /tab >}} @@ -132,11 +133,12 @@ Which programming languages work on Arm servers? - Nearly all of them. 
| Rust | [Rust Install Guide](https://learn.arm.com/install-guides/rust/) | [Neon Intrinsics in Rust](https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/rust-neon-intrinsics) | | Java | [Java Install Guide](https://learn.arm.com/install-guides/java/) | [Improving Java performance on Neoverse N1 systems](https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/java-performance-on-neoverse-n1) | | | [Migrating Java applications](https://learn.arm.com/learning-paths/servers-and-cloud-computing/migration/java/) | [Java Vector API on AArch64](https://community.arm.com/arm-community-blogs/b/high-performance-computing-blog/posts/java-vector-api-on-aarch64) | -| | | [Java on Graviton](https://github.com/aws/aws-graviton-getting-started/blob/main/java.md) | +| | [Run Java applications on Google Axion](https://learn.arm.com/learning-paths/servers-and-cloud-computing/java-on-axion/)| [Java on Graviton](https://github.com/aws/aws-graviton-getting-started/blob/main/java.md) | | | | [Optimizing Java Workloads on Azure General Purpose D-series v5 VMs with Microsoft’s Build of OpenJDK](https://techcommunity.microsoft.com/t5/azure-compute-blog/optimizing-java-workloads-on-azure-general-purpose-d-series-v5/ba-p/3827610) | | | | [Improving Java performance on OCI Ampere A1 compute instances](https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/performance-of-specjbb2015-on-oci-ampere-a1-compute-instances) | | Go | [Go Install Guide](https://learn.arm.com/install-guides/go/) | [Making your Go workloads up to 20% faster with Go 1.18 and AWS Graviton](https://aws.amazon.com/blogs/compute/making-your-go-workloads-up-to-20-faster-with-go-1-18-and-aws-graviton/)| | .NET | [.NET Install Guide](https://learn.arm.com/install-guides/dotnet/) | [Arm64 Performance Improvements in .NET 7](https://devblogs.microsoft.com/dotnet/arm64-performance-improvements-in-dotnet-7/) | +| | [Deploy .NET 
application on Azure Cobalt 100 VMs](https://learn.arm.com/learning-paths/servers-and-cloud-computing/azure-cobalt-cicd-aks/) | [Arm64 Performance Improvements in .NET 8](https://devblogs.microsoft.com/dotnet/this-arm64-performance-in-dotnet-8/) | | Python | | [Python on Arm](https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/python-on-arm)| | PHP | | [Improving performance of PHP for Arm64 and impact on AWS Graviton2 based EC2 instances](https://aws.amazon.com/blogs/compute/improving-performance-of-php-for-arm64-and-impact-on-amazon-ec2-m6g-instances/) | diff --git a/data/stats_current_test_info.yml b/data/stats_current_test_info.yml index 1f216e0efb..ecd46e69b0 100644 --- a/data/stats_current_test_info.yml +++ b/data/stats_current_test_info.yml @@ -1,7 +1,7 @@ summary: - content_total: 289 + content_total: 301 content_with_all_tests_passing: 32 - content_with_tests_enabled: 33 + content_with_tests_enabled: 34 sw_categories: cross-platform: dynamic-memory-allocator: @@ -74,6 +74,9 @@ sw_categories: readable_title: PyTorch tests_and_status: - ubuntu:latest: passed + swift: + readable_title: Swift + tests_and_status: [] terraform: readable_title: Terraform tests_and_status: diff --git a/data/stats_weekly_data.yml b/data/stats_weekly_data.yml index 1a764fcbb5..4844d39087 100644 --- a/data/stats_weekly_data.yml +++ b/data/stats_weekly_data.yml @@ -3671,3 +3671,166 @@ avg_close_time_hrs: 0 num_issues: 6 percent_closed_vs_total: 0.0 +- a_date: '2024-11-04' + content: + cross-platform: 25 + embedded-systems: 19 + install-guides: 89 + laptops-and-desktops: 31 + microcontrollers: 24 + servers-and-cloud-computing: 87 + smartphones-and-mobile: 24 + total: 299 + contributions: + external: 43 + internal: 348 + github_engagement: + num_forks: 30 + num_prs: 12 + individual_authors: + alaaeddine-chakroun: 1 + alexandros-lamprineas: 1 + annie-tallund: 1 + arm: 3 + arnaud-de-grandmaison: 1 + basma-el-gaabouri: 1 + bolt-liu: 2 + brenda-strech: 1 + 
    chen-zhang: 1
+    christopher-seidl: 7
+    cyril-rohr: 1
+    daniel-gubay: 1
+    daniel-nguyen: 1
+    david-spickett: 2
+    dawid-borycki: 30
+    diego-russo: 1
+    diego-russo-and-leandro-nunes: 1
+    elham-harirpoush: 2
+    florent-lebeau: 5
+    "fr\xE9d\xE9ric--lefred--descamps": 2
+    gabriel-peterson: 5
+    gayathri-narayana-yegna-narayanan: 1
+    graham-woodward: 1
+    iago-calvo-lista,-arm: 1
+    james-whitaker,-arm: 1
+    jason-andrews: 88
+    joe-stech: 1
+    johanna-skinnider: 2
+    jonathan-davies: 2
+    jose-emilio-munoz-lopez,-arm: 1
+    julie-gaskin: 4
+    julio-suarez: 5
+    kasper-mecklenburg: 1
+    koki-mitsunami: 1
+    konstantinos-margaritis: 7
+    kristof-beyls: 1
+    liliya-wu: 1
+    mathias-brossard: 1
+    michael-hall: 5
+    nikhil-gupta,-pareena-verma,-nobel-chowdary-mandepudi,-ravi-malhotra: 1
+    odin-shen: 1
+    owen-wu,-arm: 2
+    pareena-verma: 35
+    pareena-verma,-annie-tallund: 1
+    pareena-verma,-jason-andrews,-and-zach-lasiuk: 1
+    pareena-verma,-joe-stech,-adnan-alsinan: 1
+    pranay-bakre: 4
+    przemyslaw-wirkus: 1
+    rin-dobrescu: 1
+    roberto-lopez-mendez: 2
+    ronan-synnott: 45
+    thirdai: 1
+    tom-pilar: 1
+    uma-ramalingam: 1
+    varun-chari,-pareena-verma: 1
+    visualsilicon: 1
+    ying-yu: 1
+    ying-yu,-arm: 1
+    zach-lasiuk: 1
+    zhengjun-xing: 2
+  issues:
+    avg_close_time_hrs: 0
+    num_issues: 16
+    percent_closed_vs_total: 0.0
+- a_date: '2024-11-11'
+  content:
+    cross-platform: 25
+    embedded-systems: 19
+    install-guides: 89
+    laptops-and-desktops: 32
+    microcontrollers: 24
+    servers-and-cloud-computing: 88
+    smartphones-and-mobile: 24
+    total: 301
+  contributions:
+    external: 43
+    internal: 353
+  github_engagement:
+    num_forks: 30
+    num_prs: 13
+  individual_authors:
+    alaaeddine-chakroun: 1
+    alexandros-lamprineas: 1
+    annie-tallund: 1
+    arm: 3
+    arnaud-de-grandmaison: 1
+    basma-el-gaabouri: 1
+    bolt-liu: 2
+    brenda-strech: 1
+    chen-zhang: 1
+    christopher-seidl: 7
+    cyril-rohr: 1
+    daniel-gubay: 1
+    daniel-nguyen: 1
+    david-spickett: 2
+    dawid-borycki: 30
+    diego-russo: 1
+    diego-russo-and-leandro-nunes: 1
+    elham-harirpoush: 2
+    florent-lebeau: 5
+    "fr\xE9d\xE9ric--lefred--descamps": 2
+    gabriel-peterson: 5
+    gayathri-narayana-yegna-narayanan: 1
+    graham-woodward: 1
+    iago-calvo-lista,-arm: 1
+    james-whitaker,-arm: 1
+    jason-andrews: 89
+    joe-stech: 1
+    johanna-skinnider: 2
+    jonathan-davies: 2
+    jose-emilio-munoz-lopez,-arm: 1
+    julie-gaskin: 4
+    julio-suarez: 5
+    kasper-mecklenburg: 1
+    koki-mitsunami: 1
+    konstantinos-margaritis: 7
+    kristof-beyls: 1
+    liliya-wu: 1
+    mathias-brossard: 1
+    michael-hall: 5
+    nikhil-gupta,-pareena-verma,-nobel-chowdary-mandepudi,-ravi-malhotra: 1
+    odin-shen: 1
+    owen-wu,-arm: 2
+    pareena-verma: 35
+    pareena-verma,-annie-tallund: 1
+    pareena-verma,-jason-andrews,-and-zach-lasiuk: 1
+    pareena-verma,-joe-stech,-adnan-alsinan: 1
+    pranay-bakre: 4
+    przemyslaw-wirkus: 1
+    rin-dobrescu: 1
+    roberto-lopez-mendez: 2
+    ronan-synnott: 45
+    thirdai: 1
+    tianyu-li: 1
+    tom-pilar: 1
+    uma-ramalingam: 1
+    varun-chari,-pareena-verma: 1
+    visualsilicon: 1
+    ying-yu: 1
+    ying-yu,-arm: 1
+    zach-lasiuk: 1
+    zhengjun-xing: 2
+  issues:
+    avg_close_time_hrs: 0
+    num_issues: 17
+    percent_closed_vs_total: 0.0
diff --git a/hugo-server.sh b/hugo-server.sh
index 23355bbe9e..7c3198adef 100755
--- a/hugo-server.sh
+++ b/hugo-server.sh
@@ -3,7 +3,7 @@
 # =============================================================================
 # Build the hugo site static html pages.
 # -----------------------------------------------------------------------------
-hugo
+hugo --buildDrafts
 
 # =============================================================================
 # Enable the home page search box.
@@ -43,4 +43,4 @@ fi
 # =============================================================================
 # Serve our local tree for interactive development.
 # -----------------------------------------------------------------------------
-hugo server
+hugo server --buildDrafts
diff --git a/tools/stats_data_generate.py b/tools/stats_data_generate.py
index c5d45d8aee..72002a221c 100644
--- a/tools/stats_data_generate.py
+++ b/tools/stats_data_generate.py
@@ -158,7 +158,7 @@ def authorAdd(author_name,tracking_dic):
 
     ### Update 'contributions' area, internal vs external contributions
     # open the contributors CSV file
-    with open('../contributors.csv', mode ='r')as file:
+    with open('../assets/contributors.csv', mode ='r')as file:
         csvFile = csv.reader(file)
         for line in csvFile:
             company = line[1]