
Commit 2986615

Merge branch 'ArmDeveloperEcosystem:main' into review-visualizing-ethos-u-performance
2 parents: 720ccc8 + cb69cdc

12 files changed (+192 additions, -57 deletions)


.wordlist.txt

Lines changed: 114 additions & 1 deletion
@@ -4474,4 +4474,117 @@ AssetLib
 PerformanceStudio
 VkThread
 precompiled
-rollouts
+rollouts
+Bhusari
+DLLAMA
+FlameGraph
+FlameGraphs
+JSP
+KBC
+MMIO
+Paravirtualized
+PreserveFramePointer
+Servlet
+TDISP
+VirtIO
+WebSocket
+agentpath
+alarmtimer
+aoss
+apb
+ata
+bpf
+brendangregg
+chipidea
+clk
+cma
+counterintuitive
+cpuhp
+cros
+csd
+devfreq
+devlink
+dma
+dpaa
+dwc
+ecurity
+edma
+evice
+filelock
+filemap
+flamegraphs
+fsl
+glink
+gpu
+hcd
+hns
+hw
+hwmon
+icmp
+initcall
+iomap
+iommu
+ipi
+irq
+jbd
+jvmti
+kmem
+ksm
+kvm
+kyber
+libata
+libperf
+lockd
+mdio
+memcg
+mmc
+mtu
+musb
+napi
+ncryption
+netfs
+netlink
+nfs
+ntegrity
+nterface
+oom
+optee
+pagemap
+paravirtualized
+percpu
+printk
+pwm
+qcom
+qdisc
+ras
+rcu
+regmap
+rgerganov’s
+rotocol
+rpcgss
+rpmh
+rseq
+rtc
+sched
+scmi
+scsi
+skb
+smbus
+smp
+spi
+spmi
+sunrpc
+swiotlb
+tegra
+thp
+tlb
+udp
+ufs
+untrusted
+uring
+virtio
+vmalloc
+vmscan
+workqueue
+xdp
+xhci

content/learning-paths/servers-and-cloud-computing/_index.md

Lines changed: 15 additions & 9 deletions
@@ -8,8 +8,8 @@ key_ip:
 maintopic: true
 operatingsystems_filter:
 - Android: 2
-- Linux: 154
-- macOS: 10
+- Linux: 157
+- macOS: 11
 - Windows: 14
 pinned_modules:
 - module:
@@ -22,8 +22,8 @@ subjects_filter:
 - Containers and Virtualization: 29
 - Databases: 15
 - Libraries: 9
-- ML: 28
-- Performance and Architecture: 60
+- ML: 29
+- Performance and Architecture: 62
 - Storage: 1
 - Web: 10
 subtitle: Optimize cloud native apps on Arm for performance and cost
@@ -47,6 +47,8 @@ tools_software_languages_filter:
 - ASP.NET Core: 2
 - Assembly: 4
 - assembly: 1
+- Async-profiler: 1
+- AWS: 1
 - AWS CDK: 2
 - AWS CodeBuild: 1
 - AWS EC2: 2
@@ -65,7 +67,7 @@ tools_software_languages_filter:
 - C++: 8
 - C/C++: 2
 - Capstone: 1
-- CCA: 6
+- CCA: 7
 - Clair: 1
 - Clang: 10
 - ClickBench: 1
@@ -77,18 +79,19 @@ tools_software_languages_filter:
 - Daytona: 1
 - Demo: 3
 - Django: 1
-- Docker: 17
+- Docker: 18
 - Envoy: 2
 - ExecuTorch: 1
 - FAISS: 1
+- FlameGraph: 1
 - Flink: 1
 - Fortran: 1
 - FunASR: 1
 - FVP: 4
 - GCC: 22
 - gdb: 1
 - Geekbench: 1
-- GenAI: 11
+- GenAI: 12
 - GitHub: 6
 - GitLab: 1
 - Glibc: 1
@@ -114,7 +117,7 @@ tools_software_languages_filter:
 - Linaro Forge: 1
 - Litmus7: 1
 - Llama.cpp: 1
-- LLM: 9
+- LLM: 10
 - llvm-mca: 1
 - LSE: 1
 - MariaDB: 1
@@ -132,6 +135,7 @@ tools_software_languages_filter:
 - Ollama: 1
 - ONNX Runtime: 1
 - OpenBLAS: 1
+- OpenJDK-21: 1
 - OpenShift: 1
 - OrchardCore: 1
 - PAPI: 1
@@ -144,7 +148,7 @@ tools_software_languages_filter:
 - RAG: 1
 - Redis: 3
 - Remote.It: 2
-- RME: 6
+- RME: 7
 - Runbook: 71
 - Rust: 2
 - snappy: 1
@@ -161,6 +165,7 @@ tools_software_languages_filter:
 - TensorFlow: 2
 - Terraform: 11
 - ThirdAI: 1
+- Tomcat: 1
 - Trusted Firmware: 1
 - TSan: 1
 - TypeScript: 1
@@ -173,6 +178,7 @@ tools_software_languages_filter:
 - Whisper: 1
 - WindowsPerf: 1
 - WordPress: 3
+- wrk2: 1
 - x265: 1
 - zlib: 1
 - Zookeeper: 1

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/02_setting_up_the_instance.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ weight: 4
 layout: learningpathall
 ---

-In this step, you'll set up the Graviton4 instance with the tools and dependencies required to build and run the Arcee Foundation Model. This includes installing system packages and a Python environment.
+In this step, you'll set up the Graviton4 instance with the tools and dependencies required to build and run the AFM-4.5B model. This includes installing system packages and a Python environment.

 ## Update the package list

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/03_building_llama_cpp.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ layout: learningpathall
 ---
 ## Build the Llama.cpp inference engine

-In this step, you'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model, optimized for inference on a range of hardware platforms,including Arm-based processors like AWS Graviton4.
+In this step, you'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model, optimized for inference on a range of hardware platforms, including Arm-based processors like AWS Graviton4.

 Even though AFM-4.5B uses a custom model architecture, you can still use the standard Llama.cpp repository - Arcee AI has contributed the necessary modeling code upstream.

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/04_install_python_dependencies_for_llama_cpp.md

Lines changed: 3 additions & 2 deletions
@@ -32,7 +32,7 @@ This command does the following:

 - Runs the activation script, which modifies your shell environment
 - Updates your shell prompt to show `env-llama-cpp`, indicating the environment is active
-- Updates `PATH` to use so the environment’s Python interpreter
+- Updates `PATH` to use the environment’s Python interpreter
 - Ensures all `pip` commands install packages into the isolated environment

 ## Upgrade pip to the latest version
@@ -72,7 +72,8 @@ After the installation completes, your virtual environment includes:
 - **NumPy**: for numerical computations and array operations
 - **Requests**: for HTTP operations and API calls
 - **Other dependencies**: additional packages required by llama.cpp's Python bindings and utilities
-Your environment is now ready to run Python scripts that integrate with the compiled Llama.cpp binaries
+
+Your environment is now ready to run Python scripts that integrate with the compiled Llama.cpp binaries.

 {{< notice Tip >}}
 Before running any Python commands, make sure your virtual environment is activated. {{< /notice >}}
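The activation check recommended in this file's tip can also be done from Python itself. A small illustrative sketch (not part of the commit): inside an activated virtual environment, `sys.prefix` diverges from `sys.base_prefix`.

```python
import sys

def in_virtualenv() -> bool:
    # Inside an activated venv, sys.prefix points at the environment
    # (e.g. env-llama-cpp) while sys.base_prefix is the base interpreter.
    return sys.prefix != sys.base_prefix

print("virtualenv active:", in_virtualenv())
```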

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/05_downloading_and_optimizing_afm45b.md

Lines changed: 6 additions & 5 deletions
@@ -8,7 +8,8 @@ layout: learningpathall

 In this step, you’ll download the [AFM-4.5B](https://huggingface.co/arcee-ai/AFM-4.5B) model from Hugging Face, convert it to the GGUF format for compatibility with `llama.cpp`, and generate quantized versions to optimize memory usage and improve inference speed.

-**Note: if you want to skip the model optimization process, [GGUF](https://huggingface.co/arcee-ai/AFM-4.5B-GGUF) versions are available.**
+{{% notice Note %}}
+If you want to skip the model optimization process, [GGUF](https://huggingface.co/arcee-ai/AFM-4.5B-GGUF) versions are available. {{% /notice %}}

 Make sure to activate your virtual environment before running any commands. The instructions below walk you through downloading and preparing the model for efficient use on AWS Graviton4.

@@ -28,11 +29,11 @@ pip install huggingface_hub hf_xet
 This command installs:

 - `huggingface_hub`: Python client for downloading models and datasets
-- `hf_xet`: Git extension for fetching large model files stored on Hugging Face
+- `hf_xet`: Git extension for fetching large model files hosted on Hugging Face

 These tools include the `hf` command-line interface you'll use next.

-## Login to the Hugging Face Hub
+## Log in to the Hugging Face Hub

 ```bash
 hf auth login
@@ -86,7 +87,7 @@ This command creates a 4-bit quantized version of the model:
 - `llama-quantize` is the quantization tool from Llama.cpp.
 - `afm-4-5B-F16.gguf` is the input GGUF model file in 16-bit precision.
 - `Q4_0` applies zero-point 4-bit quantization.
-- This reduces the model size by approximately 45% (from ~15GB to ~8GB).
+- This reduces the model size by approximately 70% (from ~15GB to ~4.4GB).
 - The quantized model will use less memory and run faster, though with a small reduction in accuracy.
 - The output file will be `afm-4-5B-Q4_0.gguf`.

@@ -104,7 +105,7 @@ bin/llama-quantize models/afm-4-5b/afm-4-5B-F16.gguf models/afm-4-5b/afm-4-5B-Q8

 This command creates an 8-bit quantized version of the model:
 - `Q8_0` specifies 8-bit quantization with zero-point compression.
-- This reduces the model size by approximately 70% (from ~15GB to ~4.4GB).
+- This reduces the model size by approximately 45% (from ~15GB to ~8GB).
 - The 8-bit version provides a better balance between memory usage and accuracy than 4-bit quantization.
 - The output file is named `afm-4-5B-Q8_0.gguf`.
 - Commonly used in production scenarios where memory resources are available.
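The corrected size figures in these hunks line up with simple bits-per-weight arithmetic. The sketch below is illustrative only: the ~15 GB F16 baseline comes from the text, while `BLOCK` and the per-block F16 scale reflect the common GGUF Q4_0/Q8_0 layout, and real files also carry metadata, so exact sizes differ slightly.

```python
# Estimate quantized GGUF sizes from an F16 baseline.
# Assumption: Q4_0 / Q8_0 store one F16 scale per 32-weight block,
# so effective bits per weight = payload bits + 16/32.
F16_SIZE_GB = 15.0   # baseline size quoted in the learning path
BLOCK = 32           # weights per quantization block

def quantized_size_gb(bits_per_weight: int) -> float:
    effective_bits = bits_per_weight + 16 / BLOCK
    return F16_SIZE_GB * effective_bits / 16

for name, bits in [("Q4_0", 4), ("Q8_0", 8)]:
    size = quantized_size_gb(bits)
    cut = 100 * (1 - size / F16_SIZE_GB)
    print(f"{name}: ~{size:.1f} GB ({cut:.0f}% smaller than F16)")
```

This supports the swap the commit makes: roughly 70% savings for Q4_0 (the ~4 GB range) and roughly 45-47% for Q8_0 (~8 GB), not the other way around.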

content/learning-paths/servers-and-cloud-computing/distributed-inference-with-llama-cpp/how-to-1.md

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ If everything was built correctly, you should see a list of all the available fl

 Communication between the master node and the worker nodes occurs through a socket created on each worker. This socket listens for incoming data from the master—such as model parameters, tokens, hidden states, and other inference-related information.
 {{% notice Note %}}The RPC feature in llama.cpp is not secure by default, so you should never expose it to the open internet. To mitigate this risk, ensure that the security groups for all your EC2 instances are properly configured—restricting access to only trusted IPs or internal VPC traffic. This helps prevent unauthorized access to the RPC endpoints.{{% /notice %}}
-Use the following command to start the listeneing on the worker nodes:
+Use the following command to start listening on the worker nodes:
 ```bash
 bin/rpc-server -p 50052 -H 0.0.0.0 -t 64
 ```
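The worker-listens-for-the-master pattern described in this hunk can be sketched with plain sockets. This is a toy illustration, not llama.cpp's RPC protocol: a hypothetical worker binds, accepts one connection, and acknowledges a payload. It binds 127.0.0.1 and an ephemeral port in the spirit of the security note, whereas the real `rpc-server` uses `-H 0.0.0.0 -p 50052`.

```python
import socket
import threading

# "Worker" side: bind and listen, the same shape as `rpc-server -H <host> -p <port>`.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))   # learning path uses port 50052; 0 picks a free port
srv.listen(1)
port = srv.getsockname()[1]

def serve_one() -> None:
    conn, _ = srv.accept()
    with conn:
        data = conn.recv(1024)        # e.g. tokens / hidden states from the master
        conn.sendall(b"ack:" + data)  # reply back to the master

worker = threading.Thread(target=serve_one, daemon=True)
worker.start()

# "Master" side: connect to the worker and send a payload.
with socket.create_connection(("127.0.0.1", port), timeout=5) as master:
    master.sendall(b"hidden-state")
    reply = master.recv(1024)
worker.join(timeout=5)
srv.close()
print(reply)  # b'ack:hidden-state'
```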

content/learning-paths/servers-and-cloud-computing/distributed-inference-with-llama-cpp/how-to-2.md

Lines changed: 3 additions & 3 deletions
@@ -190,7 +190,7 @@ llama_perf_context_print: eval time = 77429.95 ms / 127 runs ( 609
 llama_perf_context_print: total time = 79394.06 ms / 132 tokens
 llama_perf_context_print: graphs reused = 0
 ```
-That's it! You have sucessfully run the llama-3.1-8B model on CPUs with the power of llama.cpp RPC functionality. The following table provides brief description of the metrics from `llama_perf`: <br><br>
+That's it! You have successfully run the llama-3.1-8B model on CPUs with the power of llama.cpp RPC functionality. The following table provides a brief description of the metrics from `llama_perf`: <br><br>

 | Log Line | Description |
 |-------------------|-----------------------------------------------------------------------------|
@@ -200,11 +200,11 @@ That's it! You have sucessfully run the llama-3.1-8B model on CPUs with the powe
 | eval time | Time to generate output tokens by forward-passing through the model. |
 | total time | Total time for both prompt processing and token generation (excludes model load). |

-Lastly to set up OpenAI compatible API, you can use the `llama-server` functionality. The process of implementing this is described [here](/learning-paths/servers-and-cloud-computing/llama-cpu) under the "Access the chatbot using the OpenAI-compatible API" section. Here is a snippet, for how to set up llama-server for disributed inference:
+Lastly, to set up an OpenAI-compatible API, you can use the `llama-server` functionality. The process of implementing this is described [here](/learning-paths/servers-and-cloud-computing/llama-cpu) under the "Access the chatbot using the OpenAI-compatible API" section. Here is a snippet showing how to set up llama-server for distributed inference:
 ```bash
 bin/llama-server -m /home/ubuntu/model.gguf --port 8080 --rpc "$worker_ips" -ngl 99
 ```
-At the very end of the output to the above command, you will see somethin like the following:
+At the very end of the output of the above command, you will see something like the following:
 ```output
 main: server is listening on http://127.0.0.1:8080 - starting the main loop
 srv update_slots: all slots are idle
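For the `llama-server` step in this file, requests follow the OpenAI chat-completions shape. The helper below only builds the JSON body (sending it requires the server from the snippet to be running); the model name and prompt are placeholders, not values from the commit.

```python
import json

def chat_request(prompt: str, max_tokens: int = 128) -> dict:
    # Minimal OpenAI-style chat-completions payload; POST it as JSON to
    # http://127.0.0.1:8080/v1/chat/completions on the server started above.
    return {
        "model": "placeholder-model",  # placeholder; the server serves its loaded GGUF
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = chat_request("Why is the sky blue?")
body = json.dumps(payload)
print(body)
```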

content/learning-paths/servers-and-cloud-computing/java-perf-flamegraph/1_setup.md

Lines changed: 15 additions & 10 deletions
@@ -7,17 +7,17 @@ layout: learningpathall
 ---


-## Before You Begin
-- There are numerous performance analysis methods and tools for Java applications, among which the call stack flame graph method is regarded as a conventional entry-level approach. Therefore, generating flame graphs is considered a basic operation.
-- Various methods and tools are available for generating Java flame graphs, including `async-profiler`, `Java Agent`, `jstack`, `JFR` (Java Flight Recorder), etc.
-- This Learning Path focuses on introducing two simple and easy-to-use methods: `async-profiler` and `Java Agent`.
+## Overview
+There are numerous performance analysis methods and tools for Java applications, among which the call stack flame graph method is regarded as a conventional entry-level approach. Therefore, generating flame graphs is considered a basic operation.
+Various methods and tools are available for generating Java flame graphs, including `async-profiler`, `Java Agent`, `jstack`, and `JFR` (Java Flight Recorder).
+This Learning Path focuses on introducing two simple and easy-to-use methods: `async-profiler` and `Java Agent`.


 ## Setup Benchmark Server - Tomcat
 - [Apache Tomcat](https://tomcat.apache.org/) is an open-source Java Servlet container that enables running Java web applications, handling HTTP requests and serving dynamic content.
 - As a core component in Java web development, Apache Tomcat supports Servlet, JSP, and WebSocket technologies, providing a lightweight runtime environment for web apps.

-1. Start by installing Java Development Kit (JDK) on your Arm-based server:
+1. Start by installing the Java Development Kit (JDK) on your Arm-based server running Ubuntu:
 ```bash
 sudo apt update
 sudo apt install -y openjdk-21-jdk
@@ -31,13 +31,13 @@ tar xzf apache-tomcat-11.0.9.tar.gz

 3. If you intend to access the built-in examples of Tomcat via an intranet IP or even an external IP, you need to modify a configuration file as shown:
 ```bash
-vim apache-tomcat-11.0.9/webapps/examples/META-INF/context.xml
+vi apache-tomcat-11.0.9/webapps/examples/META-INF/context.xml
 ```
-Then change the values:
-```console
+Then change the allow value as shown and save the changes:
+```output
 # change <Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="127\.\d+\.\d+\.\d+|::1|0:0:0:0:0:0:0:1" />
 # to
-# <Valve className="org.apache.catalina.valves.RemoteAddrValve" allow=".*" />
+<Valve className="org.apache.catalina.valves.RemoteAddrValve" allow=".*" />
 ```
 Now you can start Tomcat Server:
 ```bash
@@ -62,9 +62,14 @@ Tomcat started.

 ![example image alt-text#center](./_images/lp-tomcat-examples.png "Tomcat-Examples")

+Make sure port 8080 is open in the security group of the IP address for your Arm-based Linux machine.
+
 ## Setup Benchmark Client - [wrk2](https://github.com/giltene/wrk2)
 `wrk2` is a high-performance HTTP benchmarking tool specialized in generating constant throughput loads and measuring latency percentiles for web services. `wrk2` is an enhanced version of `wrk` that provides accurate latency statistics under controlled request rates, ideal for performance testing of HTTP servers.

+Currently `wrk2` is only supported on x86 machines. You will run the Benchmark Client steps shown below on an x86_64 server running Ubuntu.
+
+
 1. To use `wrk2`, you will need to install some essential tools before you can build it:
 ```bash
 sudo apt-get update
@@ -82,7 +87,7 @@ Move the executable to somewhere in your PATH:
 sudo cp wrk /usr/local/bin
 ```

-3. Finally, you can run the benchamrk of Tomcat through wrk2.
+3. Finally, you can run the benchmark of Tomcat through wrk2:
 ```bash
 wrk -c32 -t16 -R50000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
 ```
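As a sanity check on the wrk2 flags in the final step (`-c32 -t16 -R50000 -d60`), here is a small, hypothetical helper showing how they decompose. It assumes wrk2's documented behavior: connections are divided across threads, and `-R` is the aggregate constant request rate, not a per-connection rate.

```python
def offered_load(connections: int, threads: int, rate_rps: int, duration_s: int) -> dict:
    # wrk2 divides connections across threads; -R is the total request rate,
    # so the nominal request count for the run is rate * duration.
    return {
        "connections_per_thread": connections // threads,
        "total_requests": rate_rps * duration_s,
    }

# Mirrors: wrk -c32 -t16 -R50000 -d60 ...
load = offered_load(32, 16, 50_000, 60)
print(load)  # {'connections_per_thread': 2, 'total_requests': 3000000}
```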
