diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/1-prerequisites.md b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/1-prerequisites.md new file mode 100644 index 0000000000..31ef3bc3e8 --- /dev/null +++ b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/1-prerequisites.md @@ -0,0 +1,27 @@ +--- +title: Prerequisites +weight: 3 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Install software required for this Learning Path + +In this Learning Path, you will compile an Android application, so you first need to download and install the latest version of [Android Studio](https://developer.android.com/studio) on your computer. + +You also need the following tools: +- `cmake`, the software build system +- `git`, the version control system for cloning the Voice Assistant codebase +- `adb`, the Android Debug Bridge, a command-line tool used to communicate with a device and run commands on it + +Install these tools by running the command appropriate for your operating system: + +{{< tabpane code=true >}} + {{< tab header="Linux/Ubuntu" language="bash">}} +sudo apt install git adb cmake + {{< /tab >}} + {{< tab header="macOS" language="bash">}} +brew install git android-platform-tools cmake + {{< /tab >}} +{{< /tabpane >}} \ No newline at end of file diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/2-overview.md b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/2-overview.md new file mode 100644 index 0000000000..52250e8f09 --- /dev/null +++ b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/2-overview.md @@ -0,0 +1,40 @@ +--- +title: Overview +weight: 4 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +The Voice Assistant is an example application that demonstrates a complete voice interaction pipeline for Android. + +It generates intelligent responses using: +1. Speech-to-Text (STT) to transform the user's audio prompt into a text representation, +2. a Large Language Model (LLM) to answer the user's prompt in text form, +3. the Android Text-to-Speech (TTS) API to turn that answer into a voice response. + +![example image alt-text#center](overview.png "Figure 1: Overview") + +These three steps correspond to specific components used in the Voice Assistant application. A more detailed description of each one follows. + +## Speech to Text Library + +Speech-to-Text is also known as Automatic Speech Recognition. This part of the pipeline converts spoken language into written text. + +Speech recognition proceeds in the following stages: +- The device's microphone captures spoken language as an audio waveform, +- The audio waveform is broken into small time frames, and features are extracted to represent the sound, +- A neural network predicts the most likely transcription of the audio based on grammar and context, +- The final recognized text is passed to the next stage of the pipeline. + +## Large Language Models Library + +Large Language Models (LLMs) are designed for natural language understanding, and in this application, they are used for question-answering. + +The text transcription from the previous part of the pipeline is used as input to the neural model. During initialization, the application assigns a persona to the LLM to ensure a friendly and informative voice assistant experience.
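+
+The persona is essentially an instruction that is prepended to the conversation before the user's question. As a purely illustrative sketch of the idea (not the application's actual prompt), you could experiment with a persona using Llama.cpp's `llama-cli` tool on a workstation; the model file name, the persona wording, and the question below are placeholders:
+
+```bash
+# Hypothetical experiment: give the model a persona, then ask a question.
+# Replace model.gguf with a GGUF model you have downloaded locally.
+llama-cli -m model.gguf -n 128 \
+  -p "You are a friendly, concise voice assistant. Answer clearly. Question: What is the tallest mountain on Earth?"
+```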
By default, the application uses an asynchronous flow for this part of the pipeline, meaning that parts of the response are collected as they become available. The application UI is updated with each new token, and the tokens are also passed on to the final stage of the pipeline. + +## Text to Speech Component + +Currently, this part of the application pipeline uses the Android Text-to-Speech API with some extra functionality to ensure smooth and natural speech output. + +In synchronous mode, speech is only generated after the full response from the LLM is received. By default, the application operates in asynchronous mode, where speech synthesis starts as soon as a sufficient portion of the response (such as a half or full sentence) is available. Any additional response text is queued for processing by the Android Text-to-Speech engine. \ No newline at end of file diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/3-build.md b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/3-build.md new file mode 100644 index 0000000000..6ef5807de3 --- /dev/null +++ b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/3-build.md @@ -0,0 +1,29 @@ +--- +title: Build the Voice Assistant +weight: 5 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Download the Voice Assistant + +```BASH +git clone https://git.gitlab.arm.com/kleidi/kleidi-examples/real-time-voice-assistant.git voice-assistant.git +``` + +## Build the Voice Assistant + +Open Android Studio and load the project that you downloaded in the preceding step: + +![example image alt-text#center](open_project.png "Figure 2: Open the project in Android Studio.") + +Build the application with its default settings by clicking the little hammer +"Make Module 'VoiceAssistant.app'" button in the upper-right corner: + +![example image alt-text#center](build_project.png "Figure 3: Build the project.") + +Android Studio will start the build, which may take some time if it needs to +download some dependencies of the Voice Assistant app: + +![example image alt-text#center](build_success.png "Figure 4: Successful build!") \ No newline at end of file diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/4-run.md b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/4-run.md new file mode 100644 index 0000000000..9c2c42dfb0 --- /dev/null +++ b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/4-run.md @@ -0,0 +1,49 @@ +--- + +title: Run the Voice Assistant +weight: 6 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +In the previous section, you built the Voice Assistant app. Now you need to install it on your phone. The easiest way to do this is to put your Android phone in developer mode and upload the application over a USB cable. + +## Switch your phone to developer mode + +By default, developer mode is not active on Android phones. You will need to activate it by following [these instructions](https://developer.android.com/studio/debug/dev-options). + +## Upload the Voice Assistant to your phone + +Once your phone is in developer mode, connect it to your computer with a USB cable: it should appear as a running device in the top bar. Select it and then press the run button (circled in red in Figure 5 below). This will transfer the app to the phone and launch it. + +In the picture below, a Pixel 6a phone has been connected to the USB cable: +![example image alt-text#center](upload.png "Figure 5: Upload the Voice App")
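+
+If you prefer working from the command line, you can also install the application with `adb` instead of pressing the run button in Android Studio. The following commands are a sketch: the APK path assumes the default Gradle output location for a debug build of the app module, so adjust it to match your project's actual build output.
+
+```bash
+# List the devices visible to adb and check that your phone shows up
+adb devices
+
+# Install (or reinstall) the APK produced by the build.
+# The path below is an assumption based on the default Gradle output layout.
+adb install -r app/build/outputs/apk/debug/app-debug.apk
+```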
+ +## Run the Voice Assistant + +The Voice Assistant will welcome you with this screen: + +![example image alt-text#center](voice_assistant_view1.png "Figure 6: Welcome Screen") + +You can now tap the area at the bottom of the screen and speak your request! + +## Voice Assistant Controls + +### Performance Counters + +You can toggle the display of performance counters such as: +- Speech recognition time +- LLM encode tokens/s +- LLM decode tokens/s +- Speech generation time + +by clicking on the element circled in red in the upper-left corner: + +![example image alt-text#center](voice_assistant_view2.png "Figure 7: Performance Counters") + +### Reset the Voice Assistant's Context + +By clicking on the icon circled in red in the upper-right corner, you can reset the assistant's context. + +![example image alt-text#center](voice_assistant_view3.png "Figure 8: Reset the Voice Assistant's Context") diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/5-kleidiai.md b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/5-kleidiai.md new file mode 100644 index 0000000000..b3e35a2716 --- /dev/null +++ b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/5-kleidiai.md @@ -0,0 +1,37 @@ +--- + +title: KleidiAI + +weight: 7 + +### FIXED, DO NOT MODIFY + +layout: learningpathall + +--- + +The LLM part of the Voice Assistant uses [Llama.cpp](https://github.com/ggml-org/llama.cpp). LLM inference is a highly computation-intensive task and has been heavily optimized within Llama.cpp for various platforms, including Arm. + +Speech recognition is also a computation-intensive task and has likewise been optimized for Arm processors. + +## KleidiAI + +This application uses the [KleidiAI library](https://gitlab.arm.com/kleidi/kleidiai) by default for optimized performance on Arm processors. + +[KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) is an open-source library that provides optimized performance-critical routines, also known as micro-kernels, for artificial intelligence (AI) workloads tailored for Arm CPUs. + +These routines are tuned to exploit the capabilities of specific Arm hardware architectures, aiming to maximize performance. + +The KleidiAI library has been designed for easy adoption into C or C++ machine learning (ML) and AI frameworks. Developers looking to incorporate specific micro-kernels into their projects can simply include the corresponding `.c` and `.h` files associated with those micro-kernels and a common header file. + +### Compare the performance without KleidiAI + +By default, the Voice Assistant is built with KleidiAI support on Arm platforms, but this can be disabled if you want to compare its performance with a plain, non-KleidiAI build. + +You can disable KleidiAI support at build time in Android Studio by adding `-PkleidiAI=false` to the Gradle invocation (an example invocation is shown further down this page). You can also edit the top-level `gradle.properties` file and add `kleidiAI=false` at the end of it. + +### Why use KleidiAI? + +A significant benefit of using KleidiAI is that it lets the developer work at a relatively high level, leaving the KleidiAI library to select, at runtime, the implementation that performs the computation most efficiently on the current target. This is a great advantage because a substantial amount of work has gone into optimizing those micro-kernels.
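+
+To see the effect of those optimizations on your device, you can build both variants and compare the performance counters displayed by the application. The commands below are a sketch: the `-PkleidiAI=false` property comes from the project's build setup described above, but the use of the Gradle wrapper and the `assembleRelease` task name are assumptions, so substitute the task your project actually builds.
+
+```bash
+# Build the default variant, with KleidiAI enabled
+./gradlew assembleRelease
+
+# Build a comparison variant with KleidiAI disabled
+./gradlew assembleRelease -PkleidiAI=false
+```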
+ +KleidiAI becomes even more powerful as newer versions of the Arm architecture appear: a simple update of the KleidiAI library used by the Voice Assistant automatically gives it access to the new hardware features. One example of such a feature is SME2: in the near future, the Voice Assistant will benefit from improved performance on devices that implement SME2, with no further effort required from the developer. \ No newline at end of file diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/_index.md b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/_index.md new file mode 100644 index 0000000000..ba52fc8fec --- /dev/null +++ b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/_index.md @@ -0,0 +1,61 @@ +--- +title: Accelerate a Voice Assistant with KleidiAI and SME2 + +minutes_to_complete: 30 + +who_is_this_for: This Learning Path is an introductory topic on improving the performance of a voice assistant by using KleidiAI and SME2. + +learning_objectives: + - Compile an Android application + - Use KleidiAI and SME2 to improve the performance of the voice assistant + +prerequisites: + - An Android phone + - Android Studio + - CMake + - adb + - git + +author: Arnaud de Grandmaison + +test_images: + - ubuntu:latest +test_link: null +test_maintenance: true + +### Tags +skilllevels: Introductory +subjects: Performance and Architecture +armips: + - Cortex-A +tools_software_languages: + - Java + - Kotlin +operatingsystems: + - Linux + - macOS + - Windows + +further_reading: + + - resource: + title: Accelerate Generative AI workloads using KleidiAI + link: https://learn.arm.com/learning-paths/cross-platform/kleidiai-explainer + type: website + + - resource: + title: LLM inference on Android with KleidiAI, MediaPipe, and XNNPACK + link: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/kleidiai-on-android-with-mediapipe-and-xnnpack/ + type: website + + - resource: + title: Vision LLM inference on Android with KleidiAI and MNN + link: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/ + type: website + +### FIXED, DO NOT MODIFY +# ================================================================================ +weight: 1 # _index.md always has weight of 1 to order correctly +layout: "learningpathall" # All files under learning paths have this same wrapper +learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+--- diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/_next-steps.md b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/_next-steps.md new file mode 100644 index 0000000000..921f569dd7 --- /dev/null +++ b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/_next-steps.md @@ -0,0 +1,8 @@ +--- +# ================================================================================ +# FIXED, DO NOT MODIFY +# ================================================================================ +weight: 21 # set to always be larger than the content in this path, and one more than 'review' +title: "Next Steps" # Always the same +layout: "learningpathall" # All files under learning paths have this same wrapper +--- diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/_review.md b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/_review.md new file mode 100644 index 0000000000..3d31efd51d --- /dev/null +++ b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/_review.md @@ -0,0 +1,35 @@ +--- +review: + - questions: + question: > + What is KleidiAI? + answers: + - An anime about a little AI lost in a giant world. + - A software library. + correct_answer: 2 + explanation: > + KleidiAI is an open-source software library that provides optimized + performance-critical micro-kernels for artificial intelligence (AI) + workloads tailored for Arm processors. + + - questions: + question: > + How does KleidiAI optimize performance? + answers: + - Lots of magic, and let's be honest, a bit of hard work. + - It takes advantage of different available Arm processor architectural features. + correct_answer: 2 + explanation: > + Processor architectural features, when implemented, enable the software to use + specific instructions dedicated to performing certain tasks or computations efficiently. + For example, ``FEAT_DotProd`` adds the ``UDOT`` and ``SDOT`` 8-bit dot product + instructions, which are critical for improving the performance of dot product computations.
+ +# ================================================================================ +# FIXED, DO NOT MODIFY +# ================================================================================ +title: "Review" # Always the same title +weight: 20 # Set to always be larger than the content in this path +layout: "learningpathall" # All files under learning paths have this same wrapper +--- diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/build_project.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/build_project.png new file mode 100644 index 0000000000..9ba23467dd Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/build_project.png differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/build_success.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/build_success.png new file mode 100644 index 0000000000..5614c75cdd Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/build_success.png differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/open_project.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/open_project.png new file mode 100644 index 0000000000..f8fe7ec9d6 Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/open_project.png differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/overview.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/overview.png new file mode 100644 index 0000000000..71e236ed40 Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/overview.png differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/upload.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/upload.png new file mode 100644 index 0000000000..a37a7010fe Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/upload.png differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view1.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view1.png new file mode 100644 index 0000000000..71f65dbc88 Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view1.png differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view2.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view2.png new file mode 100644 index 0000000000..391acedff8 Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view2.png differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view3.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view3.png new file mode 100644 index 0000000000..7811642490 Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view3.png differ diff --git a/data/stats_current_test_info.yml b/data/stats_current_test_info.yml index ff89188386..ab43f5474e 100644 --- a/data/stats_current_test_info.yml +++ b/data/stats_current_test_info.yml @@ -177,7 +177,11 @@ sw_categories: tests_and_status: [] iot: {} laptops-and-desktops: {} - 
mobile-graphics-and-gaming: {} + mobile-graphics-and-gaming: + voice-assistant: + readable_title: Accelerate a Voice Assistant with KleidiAI and SME2 + tests_and_status: + - ubuntu:latest: passed servers-and-cloud-computing: clickhouse: readable_title: Measure performance of ClickHouse on Arm servers