Commit bbb4355

Tech review of Streamline kernel module LP
1 parent c7cf14a commit bbb4355

9 files changed

+141
-93
lines changed

content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/2_build_kernel_image.md

Lines changed: 76 additions & 49 deletions
Original file line numberDiff line numberDiff line change
@@ -6,66 +6,93 @@ weight: 3
66
layout: learningpathall
77
---
88

9+
## Install packages
10+
11+
```
12+
sudo apt update
13+
sudo apt install -y which sed make binutils build-essential diffutils gcc g++ bash patch gzip \
14+
bzip2 perl tar cpio unzip rsync file bc findutils gawk libncurses-dev python-is-python3 \
15+
gcc-arm-none-eabi
16+
```
17+
918
## Build a debuggable kernel image
1019

11-
For this learning path we will be using [Buildroot](https://github.com/buildroot/buildroot) to build a Linux image for Raspberry Pi 3B+ with a debuggable Linux kernel. We will profile Linux kernel modules built out-of-tree and Linux device drivers built in the Linux source code tree.
20+
For this learning path you will be using [Buildroot](https://github.com/buildroot/buildroot) to build a Linux image for Raspberry Pi 3B+ with a debuggable Linux kernel. You will profile Linux kernel modules built out-of-tree and Linux device drivers built in the Linux source code tree.
1221

1322
1. Clone the Buildroot Repository and initialize the build system with the default configurations.
1423

15-
```bash
16-
git clone https://github.com/buildroot/buildroot.git
17-
cd buildroot
18-
make raspberrypi3_64_defconfig
19-
make menuconfig
20-
make -j$(nproc)
21-
```
22-
23-
2. Change Buildroot configurations to enable debugging symbols and SSH access.
24-
25-
```plaintext
26-
Build options --->
27-
[*] build packages with debugging symbols
28-
gcc debug level (debug level 3)
29-
[*] build packages with runtime debugging info
30-
gcc optimization level (optimize for debugging) --->
31-
32-
System configuration --->
33-
[*] Enable root login with password
34-
(****) Root password # Choose root password here
35-
36-
Kernel --->
37-
Linux Kernel Tools --->
38-
[*] perf
24+
```bash
25+
git clone https://github.com/buildroot/buildroot.git
26+
cd buildroot
27+
export BUILDROOT_HOME=$(pwd)
28+
make raspberrypi3_64_defconfig
29+
```
30+
{{% notice Using a different board %}}
31+
If you're not using a Raspberry Pi 3 for this Learning Path, change `raspberrypi3_64_defconfig` to the option that matches your hardware in `$BUILDROOT_HOME/configs`.
32+
{{% /notice %}}
33+
34+
2. You will use `menuconfig` to configure the setup. Invoke it with the following command:
35+
36+
```
37+
make menuconfig
38+
```
39+
40+
![Menuconfig UI for Buildroot configuration](./images/menuconfig.png)
41+
42+
Change Buildroot configurations to enable debugging symbols and SSH access.
43+
44+
```plaintext
45+
Build options --->
46+
[*] build packages with debugging symbols
47+
gcc debug level (debug level 3)
48+
[*] build packages with runtime debugging info
49+
gcc optimization level (optimize for debugging) --->
50+
51+
System configuration --->
52+
[*] Enable root login with password
53+
(****) Root password # Choose root password here
3954
40-
Target packages --->
41-
Networking applications --->
42-
[*] openssh
43-
[*] server
44-
[*] key utilities
45-
```
55+
Kernel --->
56+
Linux Kernel Tools --->
57+
[*] perf
58+
59+
Target packages --->
60+
Networking applications --->
61+
[*] openssh
62+
[*] server
63+
[*] key utilities
64+
```
65+
66+
You might also need to change your default `sshd_config` file according to your network settings. To do that, you need to modify System configuration→ Root filesystem overlay directories to add a directory that contains your modified `sshd_config` file.
4667

47-
You might also need to change your default `sshd_config` file according to your network settings. To do that, you need to modify System configuration→ Root filesystem overlay directories to add a directory that contains your modified `sshd_config` file.
68+
3. By default the Linux kernel images are stripped. You will need to make the image debuggable as you'll be using it later.
4869

49-
3. By default the Linux kernel images are stripped so we will need to make the image debuggable as we'll be using it later.
70+
Invoke `linux-menuconfig` and uncheck the option as shown.
5071

51-
```bash
52-
make linux-menuconfig
53-
```
72+
```bash
73+
make linux-menuconfig
74+
```
5475

55-
```plaintext
56-
Kernel hacking --->
57-
-*- Kernel debugging
58-
Compile-time checks and compiler options --->
59-
Debug information (Rely on the toolchain's implicit default DWARF version)
60-
[ ] Reduce debugging information #un-check
61-
```
76+
```plaintext
77+
Kernel hacking --->
78+
-*- Kernel debugging
79+
Compile-time checks and compiler options --->
80+
Debug information (Rely on the toolchain's implicit default DWARF version)
81+
[ ] Reduce debugging information # un-check
82+
```
6283

63-
4. Now we can build the Linux image and flash it to the the SD card to run it on the Raspberry Pi.
84+
4. Now you can build the Linux image and flash it to the SD card to run it on the Raspberry Pi.
6485

65-
```bash
66-
make -j$(nproc)
67-
```
86+
```bash
87+
make -j$(nproc)
88+
```
89+
90+
It will take some time to build the Linux image. When it completes, the output will be in `$BUILDROOT_HOME/output/images/sdcard.img`:
91+
92+
```bash
93+
ls $BUILDROOT_HOME/output/images/ | grep sdcard.img
94+
```
6895

69-
It will take some time to build the Linux image. When it completes, the output will be in `<buildroot dir>/output/images/sdcard.img`
7096
For details on flashing the SD card image, see [this helpful article](https://www.ev3dev.org/docs/tutorials/writing-sd-card-image-ubuntu-disk-image-writer/).
71-
Now that we have a target running Linux with a debuggable kernel image, we can start writing our kernel module that we want to profile.
97+
98+
Now that you have a target running Linux with a debuggable kernel image, you can start writing your kernel module that you want to profile.
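
Before moving on, it can be useful to confirm that the kernel image really kept its debug information. The following is a minimal sketch, not part of the original steps. It assumes `readelf` from binutils is available on the host (binutils is in the package list above) and that the kernel build tree is at `output/build/linux-custom`, the same path the module Makefile in the next section uses. The helper name `has_debug_info` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the ELF file passed as $1 has a .debug_info section.
has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

# Assumption: BUILDROOT_HOME was exported earlier; fall back to the current directory.
BUILDROOT_HOME="${BUILDROOT_HOME:-$PWD}"
# Assumption: unstripped kernel ELF sits at the top of the kernel build tree.
VMLINUX="$BUILDROOT_HOME/output/build/linux-custom/vmlinux"

if has_debug_info "$VMLINUX"; then
    echo "vmlinux contains debug info"
else
    echo "vmlinux is missing or has no .debug_info section"
fi
```

If the check reports no debug section, revisit the `linux-menuconfig` step and rebuild.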

content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/3_OOT_module.md

Lines changed: 20 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -8,15 +8,15 @@ layout: learningpathall
88

99
## Creating the Linux Kernel Module
1010

11-
We will now learn how to create an example Linux kernel module (Character device) that demonstrates a cache miss issue caused by traversing a 2D array in column-major order. This access pattern is not cache-friendly, as it skips over most of the neighboring elements in memory during each iteration.
11+
You will now create an example Linux kernel module (Character device) that demonstrates a cache miss issue caused by traversing a 2D array in column-major order. This access pattern is not cache-friendly, as it skips over most of the neighboring elements in memory during each iteration.
1212

13-
To build the Linux kernel module, start by creating a new directory—We will call it **example_module**—in any location of your choice. Inside this directory, add two files: `mychardrv.c` and `Makefile`.
13+
To build the Linux kernel module, start by creating a new directory, for example `example_module`. Inside this directory, add two files: `mychardrv.c` and `Makefile`.
1414

1515
**Makefile**
1616

1717
```makefile
1818
obj-m += mychardrv.o
19-
BUILDROOT_OUT := /opt/rpi-linux/buildroot/output # Change this to your buildroot output directory
19+
BUILDROOT_OUT := $(BUILDROOT_HOME)/output # Change this to your buildroot output directory
2020
KDIR := $(BUILDROOT_OUT)/build/linux-custom
2121
CROSS_COMPILE := $(BUILDROOT_OUT)/host/bin/aarch64-buildroot-linux-gnu-
2222
ARCH := arm64
@@ -29,7 +29,7 @@ clean:
2929
```
3030

3131
{{% notice Note %}}
32-
Change **BUILDROOT_OUT** to the correct buildroot output directory on your host machine
32+
Change **BUILDROOT_OUT** to the correct buildroot output directory on your host machine.
3333
{{% /notice %}}
3434

3535
**mychardrv.c**
@@ -201,40 +201,45 @@ MODULE_AUTHOR("Yahya Abouelseoud");
201201
MODULE_DESCRIPTION("A simple char driver with cache misses issue");
202202
```
203203
204-
The module above receives the size of a 2D array as a string through the `char_dev_write()` function, converts it to an integer, and passes it to the `char_dev_cache_traverse()` function. This function then creates the 2D array, initializes it with simple data, traverses it in a column-major (cache-unfriendly) order, computes the sum of its elements, and prints the result to the kernel log.
204+
The module above receives the size of a 2D array as a string through the `char_dev_write()` function, converts it to an integer, and passes it to the `char_dev_cache_traverse()` function. This function then creates the 2D array, initializes it with simple data, traverses it in a column-major (cache-unfriendly) order, computes the sum of its elements, and prints the result to the kernel log. The cache-unfriendly access pattern lets you inspect a bottleneck using Streamline in the next section.
205205
206206
## Building and Running the Kernel Module
207207
208208
1. To compile the kernel module, run make inside the example_module directory. This will generate the output file `mychardrv.ko`.
209209
210-
2. Transfer the .ko file to the target using scp command and then insert it using insmod command. After inserting the module, we create a character device node using mknod command. Finally, we can test the module by writing a size value (e.g., 10000) to the device file and measuring the time taken for the operation using the `time` command.
210+
2. Transfer the `.ko` file to the target using the `scp` command, then insert it using the `insmod` command. After inserting the module, create a character device node using the `mknod` command. Finally, test the module by writing a size value (for example, 10000) to the device file and measuring the time taken for the operation using the `time` command.
211211
212212
```bash
213213
scp mychardrv.ko root@<target-ip>:/root/
214214
```
215215
216216
{{% notice Note %}}
217-
Replace \<target-ip> with your own target IP address
217+
Replace \<target-ip> with your target's IP address
218218
{{% /notice %}}
219219
220-
3. To run the module on the target, we need to run the following commands on the target:
220+
3. SSH onto your target device:
221221
222222
```bash
223223
ssh root@<your-target-ip>
224-
225-
#The following commands should be running on target device
226-
224+
```
225+
226+
4. Execute the following commands on the target to run the module:
227+
```bash
227228
insmod /root/mychardrv.ko
228229
mknod /dev/mychardrv c 42 0
229230
```
230231
231232
{{% notice Note %}}
232-
42 and 0 are the major and minor number we chose in our module code above
233+
42 and 0 are the major and minor numbers specified in the module code above.
233234
{{% /notice %}}
234235
235-
4. Now if you run dmesg you should see something like:
236+
4. To verify that the module is active, run `dmesg`. The output should be similar to the following:
237+
238+
```bash
239+
dmesg
240+
```
236241
237-
```log
242+
```output
238243
[12381.654983] mychardrv is open - Major(42) Minor(0)
239244
```
240245
@@ -249,4 +254,4 @@ The module above receives the size of a 2D array as a string through the `char_d
249254
250255
The command above passes 10000 to the module, which specifies the size of the 2D array to be created and traversed. The **echo** command takes a long time to complete (around 38 seconds) due to the cache-unfriendly traversal implemented in the `char_dev_cache_traverse()` function.
251256
252-
With the kernel module built, the next step is to profile it using Arm Streamline. We will use it to capture runtime behavior, highlight performance bottlenecks, and help identifying issues such as the cache-unfriendly traversal in our module.
257+
With the kernel module built, the next step is to profile it using Arm Streamline. You will use it to capture runtime behavior, highlight performance bottlenecks, and help identify issues such as the cache-unfriendly traversal in your module.
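
The load and test steps above can be collected into one script for repeated runs on the target. This is a sketch under stated assumptions, not part of the original steps: the `run` and `module_cycle` helpers and the `DRY_RUN` and `SIZE` variables are hypothetical names, and the cleanup with `rm` and `rmmod` is an addition that assumes the module name matches the `.ko` file name. `DRY_RUN` defaults to `1`, so the script only prints the commands unless you set `DRY_RUN=0` and run it as root on the target.

```bash
#!/bin/sh
# Sketch: one load/test/unload cycle for the example module.
DRY_RUN="${DRY_RUN:-1}"      # 1 = print commands only, 0 = execute them
MODULE=/root/mychardrv.ko    # destination used by the scp step
DEV=/dev/mychardrv
MAJOR=42                     # major and minor numbers from the module code
MINOR=0
SIZE="${SIZE:-10000}"        # array size written to the device

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

module_cycle() {
    run insmod "$MODULE"
    run mknod "$DEV" c "$MAJOR" "$MINOR"
    run sh -c "echo $SIZE > $DEV"
    run rm -f "$DEV"         # assumed cleanup, not in the original steps
    run rmmod mychardrv      # assumed module name
}

module_cycle
```

Defaulting to a dry run keeps the script harmless on a host machine, where `insmod` would either fail or load an unintended module.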

content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/4_sl_profile_OOT.md

Lines changed: 22 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -10,25 +10,33 @@ layout: learningpathall
1010

1111
Arm Streamline is a tool that uses sampling to measure system performance. Instead of recording every single event (like instrumentation does, which can slow things down), it takes snapshots of hardware counters and system registers at regular intervals. This gives a statistical view of how the system runs, while keeping the overhead small.
1212

13-
Streamline tracks many performance metrics such as CPU usage, execution cycles, memory access, cache hits and misses, and GPU activity. By putting this information together, it helps developers see how their code is using the hardware. Captured data is presented on a timeline, so you can see how performance changes as your program runs. This makes it easier to notice patterns, find bottlenecks, and link performance issues to specific parts of your application.
13+
Streamline tracks performance metrics such as CPU usage, execution cycles, memory access, cache hits and misses, and GPU activity. By putting this information together, it helps developers see how their code is using the hardware. Captured data is presented on a timeline, so you can see how performance changes as your program runs. This makes it easier to notice patterns, find bottlenecks, and link performance issues to specific parts of your application.
1414

1515
For more details about Streamline and its features, refer to the [Streamline user guide](https://developer.arm.com/documentation/101816/latest/Getting-started-with-Streamline/Introduction-to-Streamline).
1616

17-
Streamline is included with Arm Performance Studio, which you can download and use for free from [Arm Performance Studio downloads](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio#Downloads).
17+
### Download Streamline
18+
19+
Streamline is included with Arm Performance Studio, which you can download and use for free. Download it by following the link below.
20+
21+
[Arm Performance Studio downloads](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio#Downloads).
1822

1923
For step-by-step guidance on setting up Streamline on your host machine, follow the installation instructions provided in [Streamline installation guide](https://developer.arm.com/documentation/101816/latest/Getting-started-with-Streamline/Install-Streamline).
2024

2125
### Pushing Gator to the Target and Making a Capture
2226

23-
Once Streamline is installed on the host machine, you can capture trace data of our Linux kernel module.
27+
Once Streamline is installed on the host machine, you can capture trace data of your Linux kernel module. On Linux, the binaries are installed where you extracted the package.
2428

2529
1. To communicate with the target, Streamline requires a daemon, called **gatord**, to be installed and running on the target. gatord must be running before you can capture trace data. There are two pre-built gatord binaries available in Streamline's install directory, one for *Armv7 (AArch32)* and one for *Armv8 or later (AArch64)*. Push **gatord** to the target device using **scp**.
2630

2731
```bash
2832
scp <install_directory>/streamline/bin/linux/arm64/gatord root@<target-ip>:/root/gatord
29-
# use arm instead of arm64, if your are using an AArch32 target
3033
```
3134

35+
{{% notice Note %}}
36+
If you are using an AArch32 target, use `arm` instead of `arm64`.
37+
{{% /notice %}}
38+
39+
3240
2. Run gator on the target to start system-wide capture mode.
3341

3442
```bash
@@ -42,25 +50,27 @@ Once Streamline is installed on the host machine, you can capture trace data of
4250
4. Enter your target hostname or IP address.
4351
![Streamline TCP settings#center](./images/img02_streamline_tcp.png)
4452

45-
5. Click on *Select counters* to open the counter configuration dialogue, to learn more about counters and how to configure them please refer to [counter configuration guide](https://developer.arm.com/documentation/101816/latest/Capture-a-Streamline-profile/Counter-Configuration)
53+
5. Click on *Select counters* to open the counter configuration dialogue.
4654

4755
6. Add `L1 data Cache: Refill` and `L1 Data Cache: Access` and enable Event-Based Sampling (EBS) for both of them as shown in the screenshot and click *Save*.
4856

49-
{{% notice %}}
57+
{{% notice Further reading %}}
58+
To learn more about counters and how to configure them, refer to the [counter configuration guide](https://developer.arm.com/documentation/101816/latest/Capture-a-Streamline-profile/Counter-Configuration).
59+
5060
To learn more about EBS, please refer to [Streamline user guide](https://developer.arm.com/documentation/101816/9-7/Capture-a-Streamline-profile/Counter-Configuration/Setting-up-event-based-sampling)
5161
{{% /notice %}}
5262

5363
![Counter configuration#center](./images/img03_counter_config.png)
5464

55-
7. In the Command section, we will add the same shell command we used earlier to test our Linux module.
65+
7. In the Command section, add the same shell command you used earlier to test your Linux module.
5666

5767
```bash
5868
sh -c "echo 10000 > /dev/mychardrv"
5969
```
6070

6171
![Streamline command#center](./images/img04_streamline_cmd.png)
6272

63-
8. In the Capture settings dialog, select Add image, add your kernel module file `mychardrv.ko` and click Save.
73+
8. In the Capture settings dialog, select Add image, add the absolute path of your kernel module file `mychardrv.ko` and click Save.
6474
![Capture settings#center](./images/img05_capture_settings.png)
6575

6676
9. Start the capture and enter a name and location for the capture file. Streamline will start collecting data and the charts will show activity being captured from the target.
@@ -70,21 +80,21 @@ Once Streamline is installed on the host machine, you can capture trace data of
7080

7181
Once the capture is stopped, Streamline automatically analyzes the collected data and provides insights to help identify performance issues and bottlenecks. This section describes how to view these insights, starting with locating the functions related to our kernel module and narrowing down to the exact lines of code that may be responsible for the performance problems.
7282

73-
1. Open the *Functions tab*. In the counters list, select one of the counters we selected earlier in the counter configuration dialog, as shown:
83+
1. Open the *Functions tab*. In the counters list, select one of the counters you selected earlier in the counter configuration dialog, as shown:
7484

7585
![Counter selection#center](./images/img07_select_datasource.png)
7686

77-
2. In the Functions tab, observe that the function `char_dev_cache_traverse()` has the highest L1 Cache refill rate, which we already expected.
87+
2. In the Functions tab, observe that the function `char_dev_cache_traverse()` has the highest L1 Cache refill rate, which is expected.
7888
Also notice the Image name on the right, which is our module file name `mychardrv.ko`:
7989

8090
![Functions tab#center](./images/img08_Functions_Tab.png)
8191

8292
3. To view the call path of this function, right click on the function name and choose *Select in Call Paths*.
8393

84-
4. You can now see the exact function that called `char_dev_cache_traverse()`. In the Locations column, notice that the function calls started in the userspace (echo command) and terminated in the kernel space module `mychardrv.ko`:
94+
4. You can now see the exact function that called `char_dev_cache_traverse()`. In the Locations column, notice that the function calls started in the userspace (`echo` command) and terminated in the kernel space module `mychardrv.ko`:
8595
![Call paths tab#center](./images/img09_callpaths_tab.png)
8696
87-
5. Since we compiled our kernel module with debug info, we will be able to see the exact code lines that are causing these cache misses.
97+
5. Since you compiled the kernel module with debug info, you will be able to see the exact code lines that are causing these cache misses.
8898
To do so, double-click the function name to open the *Code tab*. This view shows how much each code line contributed to the cache misses. In the bottom half of the code view, you can also see the disassembly of these lines with the counter values of each assembly instruction:
8999
![Code tab#center](./images/img10_code_tab.png)
90100
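
Returning to the gatord push step: if you are unsure which of the two pre-built binaries matches your target, the output of `uname -m` on the target is a reasonable hint. The sketch below maps that output to the `arm64` or `arm` directory name mentioned earlier. The helper name `gatord_arch_dir` is hypothetical, and the mapping is an assumption that does not cover a 64-bit kernel running a 32-bit userspace.

```bash
#!/bin/sh
# Hypothetical helper: map `uname -m` output from the target to the
# gatord directory name (arm64 or arm) in the Streamline install tree.
gatord_arch_dir() {
    case "$1" in
        aarch64|arm64) echo arm64 ;;   # Armv8 or later, AArch64
        arm*)          echo arm ;;     # AArch32 targets, for example armv7l
        *)
            echo "unsupported architecture: $1" >&2
            return 1
            ;;
    esac
}

# Example with a fixed value. On a real setup, pass in the output of
# `uname -m` obtained from the target over SSH.
gatord_arch_dir aarch64
```

The `arm64` pattern is listed before `arm*` so that a target reporting `arm64` is not caught by the 32-bit branch.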
