
Commit 4a96c91

Merge pull request #2384 from jasonrandrews/review
Complete llama.cpp Streamline technical review
2 parents d9b191c + d2eb1a4 commit 4a96c91

5 files changed: +39 additions, -38 deletions

content/learning-paths/servers-and-cloud-computing/llama_cpp_streamline/1_overview.md

Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,7 @@ Frameworks such as [**llama.cpp**](https://github.com/ggml-org/llama.cpp), provi
 
 To analyze their execution and use profiling insights for optimization, you need both a basic understanding of transformer architectures and the right analysis tools.
 
-This Learning Path demonstrates how to use `llama-cli` application from llama.cpp together with Arm Streamline to analyze the efficiency of LLM inference on Arm CPUs.
+This Learning Path demonstrates how to use `llama-cli` from the command line together with Arm Streamline to analyze the efficiency of LLM inference on Arm CPUs.
 
 You will learn how to:
 - Profile token generation at the Prefill and Decode stages
@@ -23,4 +23,4 @@ You will learn how to:
 
 You will run the `Qwen1_5-0_5b-chat-q4_0.gguf` model using `llama-cli` on Arm Linux and use Streamline for analysis.
 
-The same method can also be applied to Android platforms.
+The same method can also be used on Android.

content/learning-paths/servers-and-cloud-computing/llama_cpp_streamline/2_llama.cpp_intro.md

Lines changed: 1 addition & 1 deletion
@@ -83,4 +83,4 @@ At the Decode stage, by utilizing the [KV cache](https://huggingface.co/blog/not
 
 In summary, Prefill is compute-bound, dominated by large GEMM operations, and Decode is memory-bound, dominated by KV cache access and GEMV operations.
 
-You will see this highlighted during the analysis with Streamline.
+You will see this highlighted during the Streamline performance analysis.
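The compute-bound versus memory-bound split summarized in this file can be illustrated with a rough arithmetic-intensity estimate. The sketch below is illustrative only and is not part of the Learning Path; the matrix dimensions are made up:

```python
# Rough arithmetic-intensity sketch: why Prefill (batched GEMM) is
# compute-bound while Decode (GEMV) is memory-bound. Dimensions are
# hypothetical, chosen only to illustrate the ratio.

def gemm_intensity(m, n, k, bytes_per_elem=4):
    """FLOPs per byte moved for an (m x k) @ (k x n) matrix multiply."""
    flops = 2 * m * n * k                                   # multiply-accumulates
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem  # A, B, C traffic
    return flops / bytes_moved

# Prefill: many prompt tokens processed in one batch -> large m.
prefill = gemm_intensity(m=512, n=4096, k=4096)
# Decode: one token generated at a time -> GEMV, m = 1.
decode = gemm_intensity(m=1, n=4096, k=4096)

print(f"Prefill (GEMM): ~{prefill:.1f} FLOPs/byte")  # ~204.8
print(f"Decode  (GEMV): ~{decode:.1f} FLOPs/byte")   # ~0.5
```

With only about 0.5 FLOPs per byte, Decode cannot keep the vector units busy and is limited by how fast the weights and KV cache stream from memory, which is what the Streamline capture makes visible.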

content/learning-paths/servers-and-cloud-computing/llama_cpp_streamline/3_llama.cpp_annotation.md

Lines changed: 23 additions & 24 deletions
@@ -20,36 +20,33 @@ You can either build natively on an Arm platform, or cross-compile on another ar
 
 ### Step 1: Build Streamline Annotation library
 
-Install [Arm DS](https://developer.arm.com/Tools%20and%20Software/Arm%20Development%20Studio) or [Arm Streamline](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer) on your development machine first.
+Download and install [Arm Performance Studio](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio#Downloads) on your development machine.
 
-Streamline Annotation support code is in the installation directory such as `Arm/Development Studio 2024.1/sw/streamline/gator/annotate`.
-
-For installation guidance, refer to the [Streamline installation guide](/install-guides/streamline/).
-
-Clone the gator repository that matches your Streamline version and build the `Annotation support library`.
+{{% notice Note %}}
+You can also download and install [Arm Development Studio](https://developer.arm.com/Tools%20and%20Software/Arm%20Development%20Studio#Downloads), as it also includes Streamline.
 
-The installation step depends on your development machine.
+{{% /notice %}}
 
-For Arm native build, you can use the following instructions to install the packages.
+Streamline Annotation support code is in the Arm Performance Studio installation directory, under the `streamline/gator/annotate` directory.
 
-For other machines, you need to set up the cross compiler environment by installing [Arm GNU toolchain](https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads).
+Clone the gator repository that matches your Streamline version and build the annotation support library. You can build it on your current machine using the native build instructions, or cross-compile it for another Arm computer using the cross-compile instructions.
 
-You can refer to the [GCC install guide](https://learn.arm.com/install-guides/gcc/cross/) for cross-compiler installation.
+If you need to set up a cross compiler, you can review the [GCC install guide](/install-guides/gcc/cross/).
 
 {{< tabpane code=true >}}
 {{< tab header="Arm Native Build" language="bash">}}
-apt-get update
-apt-get install ninja-build cmake gcc g++ g++-aarch64-linux-gnu curl zip unzip tar pkg-config git
+sudo apt-get update
+sudo apt-get install -y ninja-build cmake gcc g++ g++-aarch64-linux-gnu curl zip unzip tar pkg-config git
 cd ~
 git clone https://github.com/ARM-software/gator.git
 cd gator
 ./build-linux.sh
 cd annotate
 make
 {{< /tab >}}
-{{< tab header="Cross Compiler" language="bash">}}
-apt-get update
-apt-get install ninja-build cmake gcc g++ g++-aarch64-linux-gnu curl zip unzip tar pkg-config git
+{{< tab header="Cross Compile" language="bash">}}
+sudo apt-get update
+sudo apt-get install ninja-build cmake gcc g++ g++-aarch64-linux-gnu curl zip unzip tar pkg-config git
 cd ~
 git clone https://github.com/ARM-software/gator.git
 cd gator
@@ -79,29 +76,31 @@ mkdir streamline_annotation
 cp ~/gator/annotate/libstreamline_annotate.a ~/gator/annotate/streamline_annotate.h streamline_annotation
 ```
 
-To link the `libstreamline_annotate.a` library when building llama-cli, add the following lines at the end of `llama.cpp/tools/main/CMakeLists.txt`.
+To link the `libstreamline_annotate.a` library when building llama-cli, use an editor to add the following lines at the end of `llama.cpp/tools/main/CMakeLists.txt`.
 
 ```makefile
 set(STREAMLINE_LIB_PATH "${CMAKE_SOURCE_DIR}/streamline_annotation/libstreamline_annotate.a")
 target_include_directories(llama-cli PRIVATE "${CMAKE_SOURCE_DIR}/streamline_annotation")
 target_link_libraries(llama-cli PRIVATE "${STREAMLINE_LIB_PATH}")
 ```
 
-To add Annotation Markers to `llama-cli`, change the `llama-cli` code in `llama.cpp/tools/main/main.cpp` by adding the include file:
+To add Annotation Markers to `llama-cli`, edit the file `llama.cpp/tools/main/main.cpp` and make three modifications.
+
+First, add the include file at the top of `main.cpp` with the other include files:
 
 ```c
 #include "streamline_annotate.h"
 ```
 
-After the call to `common_init()`, add the setup macro:
+Next, find the `common_init()` call in the `main()` function and add the Streamline setup macro below it so that the code looks like:
 
 ```c
 common_init();
 //Add the Annotation setup code
 ANNOTATE_SETUP;
 ```
 
-Finally, add an annotation marker inside the main loop:
+Finally, add an annotation marker inside the main loop. Add the complete code in place of the annotation comments so it looks like:
 
 ```c
 for (int i = 0; i < (int) embd.size(); i += params.n_batch) {
@@ -150,8 +149,8 @@ Next, configure the project.
 -DBUILD_SHARED_LIBS=OFF \
 -DCMAKE_EXE_LINKER_FLAGS="-static -g" \
 -DGGML_OPENMP=OFF \
--DCMAKE_C_FLAGS="-march=armv8.2-a+dotprod+i8mm -g" \
--DCMAKE_CXX_FLAGS="-march=armv8.2-a+dotprod+i8mm -g" \
+-DCMAKE_C_FLAGS="-march=native -g" \
+-DCMAKE_CXX_FLAGS="-march=native -g" \
 -DGGML_CPU_KLEIDIAI=ON \
 -DLLAMA_BUILD_TESTS=OFF \
 -DLLAMA_BUILD_EXAMPLES=ON \
@@ -161,8 +160,8 @@ Next, configure the project.
 cmake .. \
 -DCMAKE_SYSTEM_NAME=Linux \
 -DCMAKE_SYSTEM_PROCESSOR=arm \
--DCMAKE_C_COMPILER=aarch64-none-linux-gnu-gcc \
--DCMAKE_CXX_COMPILER=aarch64-none-linux-gnu-g++ \
+-DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+-DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
 -DLLAMA_NATIVE=OFF \
 -DLLAMA_F16C=OFF \
 -DLLAMA_GEMM_ARM=ON \
@@ -190,7 +189,7 @@ Now you can build the project using `cmake`:
 
 ```bash
 cd ~/llama.cpp/build
-cmake --build ./ --config Release
+cmake --build ./ --config Release -j $(nproc)
 ```
 
 After the building process completes, you can find `llama-cli` in the `~/llama.cpp/build/bin/` directory.

content/learning-paths/servers-and-cloud-computing/llama_cpp_streamline/4_analyze_token_prefill_decode.md

Lines changed: 11 additions & 9 deletions
@@ -8,24 +8,25 @@ layout: learningpathall
 
 ## Run llama-cli and analyze the data with Streamline
 
-After successfully building llama-cli, the next step is to set up the runtime environment on your Arm platform.
+After successfully building llama-cli, the next step is to set up the runtime environment on your Arm platform. This can be your development machine or another Arm system.
 
-### Set up gatord
+### Set up the gator daemon
 
-The gator daemon (gatord) is the Streamline collection agent that runs on the target device. It captures performance data including CPU metrics, PMU events, and annotations, then sends this data to the Streamline analysis tool running on your host machine. The daemon needs to be running on your target device before you can capture performance data.
+The gator daemon, `gatord`, is the Streamline collection agent that runs on the target device. It captures performance data, including CPU metrics, PMU events, and annotations, then sends this data to the Streamline analysis tool running on your host machine. The daemon needs to be running on your target device before you can capture performance data.
 
 Depending on how you built llama.cpp:
 
 For the cross-compiled build flow:
 
 - Copy the `llama-cli` executable to your Arm target.
--Also copy the `gatord` binary from the Arm DS or Streamline installation:
-  - Linux: `Arm\Development Studio 2024.1\sw\streamline\bin\linux\arm64`
-  - Android: `Arm\Development Studio 2024.1\sw\streamline\bin\android\arm64`
+- Copy the `gatord` binary from the Arm Performance Studio release. If you are targeting Linux, take it from `streamline\bin\linux\arm64`, and if you are targeting Android, take it from `streamline\bin\android\arm64`.
+
+Put both of these programs in your home directory on the target system.
 
 For the native build flow:
+- Use the `llama-cli` from your local build in `llama.cpp/build/bin` and the `gatord` you compiled earlier at `~/gator/build-native-gcc-rel/gatord`.
 
-- Use the `llama-cli` from your local build and the `gatord` you compiled earlier (`~/gator/build-native-gcc-rel/gatord`).
+You now have `gatord` and `llama-cli` on the computer you want to run and profile.
 
 ### Download a lightweight model
 
@@ -49,8 +50,9 @@ Start the gator daemon on your Arm target:
 
 You should see messages similar to those shown below:
 
 ```bash
-Streamline Data Recorder v9.4.0 (Build 9b1e8f8)
-Copyright (c) 2010-2024 Arm Limited. All rights reserved.
+Streamline Data Recorder v9.6.0 (Build oss)
+Copyright (c) 2010-2025 Arm Limited. All rights reserved.
+
 Gator ready
 ```
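Before profiling, a back-of-envelope bound helps set expectations for the Decode stage analyzed in this file: because Decode is memory-bound, tokens per second cannot exceed memory bandwidth divided by the bytes read per generated token (roughly the whole quantized model for a small model). All numbers below are assumptions for illustration, not measurements from this Learning Path:

```python
# Back-of-envelope decode-rate bound (all numbers are assumptions).
# Decode reads roughly the entire set of quantized weights per token,
# so DRAM bandwidth puts an upper bound on tokens per second.

def decode_tokens_per_sec_bound(model_bytes, bandwidth_bytes_per_sec):
    """Upper bound on decode rate for a memory-bound workload."""
    return bandwidth_bytes_per_sec / model_bytes

# Assumed: ~0.4 GB for a Q4_0 0.5B-parameter model, 50 GB/s DRAM bandwidth.
bound = decode_tokens_per_sec_bound(0.4e9, 50e9)
print(f"Decode upper bound: ~{bound:.0f} tokens/sec")  # ~125 tokens/sec
```

If the decode rate reported by `llama-cli` is far below such a bound for your hardware, the Streamline capture is the place to look for the reason.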

content/learning-paths/servers-and-cloud-computing/llama_cpp_streamline/_index.md

Lines changed: 2 additions & 2 deletions
@@ -19,8 +19,8 @@ learning_objectives:
 prerequisites:
 - Basic understanding of llama.cpp
 - Understanding of transformer models
-- Knowledge of Streamline usage
-- An Arm Neoverse or Cortex-A hardware platform running Linux or Android to test the application
+- Knowledge of Arm Streamline usage
+- An Arm Neoverse or Cortex-A hardware platform running Linux or Android
 
 author:
 - Zenon Zhilong Xiu
