
Commit 2986615

Merge branch 'ArmDeveloperEcosystem:main' into review-visualizing-ethos-u-performance
2 parents: 720ccc8 + cb69cdc

12 files changed (+192 additions, -57 deletions)


.wordlist.txt

Lines changed: 114 additions & 1 deletion
@@ -4474,4 +4474,117 @@ AssetLib
 PerformanceStudio
 VkThread
 precompiled
-rollouts
+rollouts
+Bhusari
+DLLAMA
+FlameGraph
+FlameGraphs
+JSP
+KBC
+MMIO
+Paravirtualized
+PreserveFramePointer
+Servlet
+TDISP
+VirtIO
+WebSocket
+agentpath
+alarmtimer
+aoss
+apb
+ata
+bpf
+brendangregg
+chipidea
+clk
+cma
+counterintuitive
+cpuhp
+cros
+csd
+devfreq
+devlink
+dma
+dpaa
+dwc
+ecurity
+edma
+evice
+filelock
+filemap
+flamegraphs
+fsl
+glink
+gpu
+hcd
+hns
+hw
+hwmon
+icmp
+initcall
+iomap
+iommu
+ipi
+irq
+jbd
+jvmti
+kmem
+ksm
+kvm
+kyber
+libata
+libperf
+lockd
+mdio
+memcg
+mmc
+mtu
+musb
+napi
+ncryption
+netfs
+netlink
+nfs
+ntegrity
+nterface
+oom
+optee
+pagemap
+paravirtualized
+percpu
+printk
+pwm
+qcom
+qdisc
+ras
+rcu
+regmap
+rgerganov’s
+rotocol
+rpcgss
+rpmh
+rseq
+rtc
+sched
+scmi
+scsi
+skb
+smbus
+smp
+spi
+spmi
+sunrpc
+swiotlb
+tegra
+thp
+tlb
+udp
+ufs
+untrusted
+uring
+virtio
+vmalloc
+vmscan
+workqueue
+xdp
+xhci

content/learning-paths/servers-and-cloud-computing/_index.md

Lines changed: 15 additions & 9 deletions
@@ -8,8 +8,8 @@ key_ip:
 maintopic: true
 operatingsystems_filter:
 - Android: 2
-- Linux: 154
-- macOS: 10
+- Linux: 157
+- macOS: 11
 - Windows: 14
 pinned_modules:
 - module:
@@ -22,8 +22,8 @@ subjects_filter:
 - Containers and Virtualization: 29
 - Databases: 15
 - Libraries: 9
-- ML: 28
-- Performance and Architecture: 60
+- ML: 29
+- Performance and Architecture: 62
 - Storage: 1
 - Web: 10
 subtitle: Optimize cloud native apps on Arm for performance and cost
@@ -47,6 +47,8 @@ tools_software_languages_filter:
 - ASP.NET Core: 2
 - Assembly: 4
 - assembly: 1
+- Async-profiler: 1
+- AWS: 1
 - AWS CDK: 2
 - AWS CodeBuild: 1
 - AWS EC2: 2
@@ -65,7 +67,7 @@ tools_software_languages_filter:
 - C++: 8
 - C/C++: 2
 - Capstone: 1
-- CCA: 6
+- CCA: 7
 - Clair: 1
 - Clang: 10
 - ClickBench: 1
@@ -77,18 +79,19 @@ tools_software_languages_filter:
 - Daytona: 1
 - Demo: 3
 - Django: 1
-- Docker: 17
+- Docker: 18
 - Envoy: 2
 - ExecuTorch: 1
 - FAISS: 1
+- FlameGraph: 1
 - Flink: 1
 - Fortran: 1
 - FunASR: 1
 - FVP: 4
 - GCC: 22
 - gdb: 1
 - Geekbench: 1
-- GenAI: 11
+- GenAI: 12
 - GitHub: 6
 - GitLab: 1
 - Glibc: 1
@@ -114,7 +117,7 @@ tools_software_languages_filter:
 - Linaro Forge: 1
 - Litmus7: 1
 - Llama.cpp: 1
-- LLM: 9
+- LLM: 10
 - llvm-mca: 1
 - LSE: 1
 - MariaDB: 1
@@ -132,6 +135,7 @@ tools_software_languages_filter:
 - Ollama: 1
 - ONNX Runtime: 1
 - OpenBLAS: 1
+- OpenJDK-21: 1
 - OpenShift: 1
 - OrchardCore: 1
 - PAPI: 1
@@ -144,7 +148,7 @@ tools_software_languages_filter:
 - RAG: 1
 - Redis: 3
 - Remote.It: 2
-- RME: 6
+- RME: 7
 - Runbook: 71
 - Rust: 2
 - snappy: 1
@@ -161,6 +165,7 @@ tools_software_languages_filter:
 - TensorFlow: 2
 - Terraform: 11
 - ThirdAI: 1
+- Tomcat: 1
 - Trusted Firmware: 1
 - TSan: 1
 - TypeScript: 1
@@ -173,6 +178,7 @@ tools_software_languages_filter:
 - Whisper: 1
 - WindowsPerf: 1
 - WordPress: 3
+- wrk2: 1
 - x265: 1
 - zlib: 1
 - Zookeeper: 1

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/02_setting_up_the_instance.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ weight: 4
 layout: learningpathall
 ---

-In this step, you'll set up the Graviton4 instance with the tools and dependencies required to build and run the Arcee Foundation Model. This includes installing system packages and a Python environment.
+In this step, you'll set up the Graviton4 instance with the tools and dependencies required to build and run the AFM-4.5B model. This includes installing system packages and a Python environment.

 ## Update the package list

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/03_building_llama_cpp.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ layout: learningpathall
 ---
 ## Build the Llama.cpp inference engine

-In this step, you'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model, optimized for inference on a range of hardware platforms,including Arm-based processors like AWS Graviton4.
+In this step, you'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model, optimized for inference on a range of hardware platforms, including Arm-based processors like AWS Graviton4.

 Even though AFM-4.5B uses a custom model architecture, you can still use the standard Llama.cpp repository - Arcee AI has contributed the necessary modeling code upstream.

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/04_install_python_dependencies_for_llama_cpp.md

Lines changed: 3 additions & 2 deletions
@@ -32,7 +32,7 @@ This command does the following:

 - Runs the activation script, which modifies your shell environment
 - Updates your shell prompt to show `env-llama-cpp`, indicating the environment is active
-- Updates `PATH` to use so the environment’s Python interpreter
+- Updates `PATH` to use the environment’s Python interpreter
 - Ensures all `pip` commands install packages into the isolated environment

 ## Upgrade pip to the latest version
@@ -72,7 +72,8 @@ After the installation completes, your virtual environment includes:
 - **NumPy**: for numerical computations and array operations
 - **Requests**: for HTTP operations and API calls
 - **Other dependencies**: additional packages required by llama.cpp's Python bindings and utilities
-Your environment is now ready to run Python scripts that integrate with the compiled Llama.cpp binaries
+
+Your environment is now ready to run Python scripts that integrate with the compiled Llama.cpp binaries.

 {{< notice Tip >}}
 Before running any Python commands, make sure your virtual environment is activated. {{< /notice >}}
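The activation check recommended in this file's tip can also be done from Python itself. A small illustrative sketch (not part of the commit): inside an activated virtual environment, `sys.prefix` diverges from `sys.base_prefix`.

```python
import sys

def in_virtualenv() -> bool:
    # Inside an activated venv, sys.prefix points at the environment
    # (e.g. env-llama-cpp) while sys.base_prefix is the base interpreter.
    return sys.prefix != sys.base_prefix

print("virtualenv active:", in_virtualenv())
```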

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/05_downloading_and_optimizing_afm45b.md

Lines changed: 6 additions & 5 deletions
@@ -8,7 +8,8 @@ layout: learningpathall

 In this step, you’ll download the [AFM-4.5B](https://huggingface.co/arcee-ai/AFM-4.5B) model from Hugging Face, convert it to the GGUF format for compatibility with `llama.cpp`, and generate quantized versions to optimize memory usage and improve inference speed.

-**Note: if you want to skip the model optimization process, [GGUF](https://huggingface.co/arcee-ai/AFM-4.5B-GGUF) versions are available.**
+{{% notice Note %}}
+If you want to skip the model optimization process, [GGUF](https://huggingface.co/arcee-ai/AFM-4.5B-GGUF) versions are available. {{% /notice %}}

 Make sure to activate your virtual environment before running any commands. The instructions below walk you through downloading and preparing the model for efficient use on AWS Graviton4.

@@ -28,11 +29,11 @@ pip install huggingface_hub hf_xet
 This command installs:

 - `huggingface_hub`: Python client for downloading models and datasets
-- `hf_xet`: Git extension for fetching large model files stored on Hugging Face
+- `hf_xet`: Git extension for fetching large model files hosted on Hugging Face

 These tools include the `hf` command-line interface you'll use next.

-## Login to the Hugging Face Hub
+## Log in to the Hugging Face Hub

 ```bash
 hf auth login
@@ -86,7 +87,7 @@ This command creates a 4-bit quantized version of the model:
 - `llama-quantize` is the quantization tool from Llama.cpp.
 - `afm-4-5B-F16.gguf` is the input GGUF model file in 16-bit precision.
 - `Q4_0` applies zero-point 4-bit quantization.
-- This reduces the model size by approximately 45% (from ~15GB to ~8GB).
+- This reduces the model size by approximately 70% (from ~15GB to ~4.4GB).
 - The quantized model will use less memory and run faster, though with a small reduction in accuracy.
 - The output file will be `afm-4-5B-Q4_0.gguf`.

@@ -104,7 +105,7 @@ bin/llama-quantize models/afm-4-5b/afm-4-5B-F16.gguf models/afm-4-5b/afm-4-5B-Q8

 This command creates an 8-bit quantized version of the model:
 - `Q8_0` specifies 8-bit quantization with zero-point compression.
-- This reduces the model size by approximately 70% (from ~15GB to ~4.4GB).
+- This reduces the model size by approximately 45% (from ~15GB to ~8GB).
 - The 8-bit version provides a better balance between memory usage and accuracy than 4-bit quantization.
 - The output file is named `afm-4-5B-Q8_0.gguf`.
 - Commonly used in production scenarios where memory resources are available.
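The corrected size figures in these hunks line up with simple bits-per-weight arithmetic. The sketch below is illustrative only: the ~15 GB F16 baseline comes from the text, while `BLOCK` and the per-block F16 scale reflect the common GGUF Q4_0/Q8_0 layout, and real files also carry metadata, so exact sizes differ slightly.

```python
# Estimate quantized GGUF sizes from an F16 baseline.
# Assumption: Q4_0 / Q8_0 store one F16 scale per 32-weight block,
# so effective bits per weight = payload bits + 16/32.
F16_SIZE_GB = 15.0   # baseline size quoted in the learning path
BLOCK = 32           # weights per quantization block

def quantized_size_gb(bits_per_weight: int) -> float:
    effective_bits = bits_per_weight + 16 / BLOCK
    return F16_SIZE_GB * effective_bits / 16

for name, bits in [("Q4_0", 4), ("Q8_0", 8)]:
    size = quantized_size_gb(bits)
    cut = 100 * (1 - size / F16_SIZE_GB)
    print(f"{name}: ~{size:.1f} GB ({cut:.0f}% smaller than F16)")
```

This supports the swap the commit makes: roughly 70% savings for Q4_0 (the ~4 GB range) and roughly 45-47% for Q8_0 (~8 GB), not the other way around.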

content/learning-paths/servers-and-cloud-computing/distributed-inference-with-llama-cpp/how-to-1.md

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ If everything was built correctly, you should see a list of all the available fl

 Communication between the master node and the worker nodes occurs through a socket created on each worker. This socket listens for incoming data from the master—such as model parameters, tokens, hidden states, and other inference-related information.
 {{% notice Note %}}The RPC feature in llama.cpp is not secure by default, so you should never expose it to the open internet. To mitigate this risk, ensure that the security groups for all your EC2 instances are properly configured—restricting access to only trusted IPs or internal VPC traffic. This helps prevent unauthorized access to the RPC endpoints.{{% /notice %}}
-Use the following command to start the listeneing on the worker nodes:
+Use the following command to start listening on the worker nodes:
 ```bash
 bin/rpc-server -p 50052 -H 0.0.0.0 -t 64
 ```
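The worker-listens-for-the-master pattern described in this hunk can be sketched with plain sockets. This is a toy illustration, not llama.cpp's RPC protocol: a hypothetical worker binds, accepts one connection, and acknowledges a payload. It binds 127.0.0.1 and an ephemeral port in the spirit of the security note, whereas the real `rpc-server` uses `-H 0.0.0.0 -p 50052`.

```python
import socket
import threading

# "Worker" side: bind and listen, the same shape as `rpc-server -H <host> -p <port>`.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))   # learning path uses port 50052; 0 picks a free port
srv.listen(1)
port = srv.getsockname()[1]

def serve_one() -> None:
    conn, _ = srv.accept()
    with conn:
        data = conn.recv(1024)        # e.g. tokens / hidden states from the master
        conn.sendall(b"ack:" + data)  # reply back to the master

worker = threading.Thread(target=serve_one, daemon=True)
worker.start()

# "Master" side: connect to the worker and send a payload.
with socket.create_connection(("127.0.0.1", port), timeout=5) as master:
    master.sendall(b"hidden-state")
    reply = master.recv(1024)
worker.join(timeout=5)
srv.close()
print(reply)  # b'ack:hidden-state'
```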

content/learning-paths/servers-and-cloud-computing/distributed-inference-with-llama-cpp/how-to-2.md

Lines changed: 3 additions & 3 deletions
@@ -190,7 +190,7 @@ llama_perf_context_print: eval time = 77429.95 ms / 127 runs ( 609
 llama_perf_context_print: total time = 79394.06 ms / 132 tokens
 llama_perf_context_print: graphs reused = 0
 ```
-That's it! You have sucessfully run the llama-3.1-8B model on CPUs with the power of llama.cpp RPC functionality. The following table provides brief description of the metrics from `llama_perf`: <br><br>
+That's it! You have successfully run the llama-3.1-8B model on CPUs with the power of llama.cpp RPC functionality. The following table provides a brief description of the metrics from `llama_perf`: <br><br>

 | Log Line | Description |
 |-------------------|-----------------------------------------------------------------------------|
@@ -200,11 +200,11 @@ That's it! You have sucessfully run the llama-3.1-8B model on CPUs with the powe
 | eval time | Time to generate output tokens by forward-passing through the model. |
 | total time | Total time for both prompt processing and token generation (excludes model load). |

-Lastly to set up OpenAI compatible API, you can use the `llama-server` functionality. The process of implementing this is described [here](/learning-paths/servers-and-cloud-computing/llama-cpu) under the "Access the chatbot using the OpenAI-compatible API" section. Here is a snippet, for how to set up llama-server for disributed inference:
+Lastly, to set up an OpenAI-compatible API, you can use the `llama-server` functionality. The process of implementing this is described [here](/learning-paths/servers-and-cloud-computing/llama-cpu) under the "Access the chatbot using the OpenAI-compatible API" section. Here is a snippet showing how to set up llama-server for distributed inference:
 ```bash
 bin/llama-server -m /home/ubuntu/model.gguf --port 8080 --rpc "$worker_ips" -ngl 99
 ```
-At the very end of the output to the above command, you will see somethin like the following:
+At the very end of the output of the above command, you will see something like the following:
 ```output
 main: server is listening on http://127.0.0.1:8080 - starting the main loop
 srv update_slots: all slots are idle
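For the `llama-server` step in this file, requests follow the OpenAI chat-completions shape. The helper below only builds the JSON body (sending it requires the server from the snippet to be running); the model name and prompt are placeholders, not values from the commit.

```python
import json

def chat_request(prompt: str, max_tokens: int = 128) -> dict:
    # Minimal OpenAI-style chat-completions payload; POST it as JSON to
    # http://127.0.0.1:8080/v1/chat/completions on the server started above.
    return {
        "model": "placeholder-model",  # placeholder; the server serves its loaded GGUF
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = chat_request("Why is the sky blue?")
body = json.dumps(payload)
print(body)
```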

content/learning-paths/servers-and-cloud-computing/java-perf-flamegraph/1_setup.md

Lines changed: 15 additions & 10 deletions
@@ -7,17 +7,17 @@ layout: learningpathall
 ---


-## Before You Begin
-- There are numerous performance analysis methods and tools for Java applications, among which the call stack flame graph method is regarded as a conventional entry-level approach. Therefore, generating flame graphs is considered a basic operation.
-- Various methods and tools are available for generating Java flame graphs, including `async-profiler`, `Java Agent`, `jstack`, `JFR` (Java Flight Recorder), etc.
-- This Learning Path focuses on introducing two simple and easy-to-use methods: `async-profiler` and `Java Agent`.
+## Overview
+There are numerous performance analysis methods and tools for Java applications, among which the call stack flame graph method is regarded as a conventional entry-level approach. Therefore, generating flame graphs is considered a basic operation.
+Various methods and tools are available for generating Java flame graphs, including `async-profiler`, `Java Agent`, `jstack`, and `JFR` (Java Flight Recorder).
+This Learning Path focuses on introducing two simple and easy-to-use methods: `async-profiler` and `Java Agent`.


 ## Setup Benchmark Server - Tomcat
 - [Apache Tomcat](https://tomcat.apache.org/) is an open-source Java Servlet container that enables running Java web applications, handling HTTP requests and serving dynamic content.
 - As a core component in Java web development, Apache Tomcat supports Servlet, JSP, and WebSocket technologies, providing a lightweight runtime environment for web apps.

-1. Start by installing Java Development Kit (JDK) on your Arm-based server:
+1. Start by installing the Java Development Kit (JDK) on your Arm-based server running Ubuntu:
 ```bash
 sudo apt update
 sudo apt install -y openjdk-21-jdk
@@ -31,13 +31,13 @@ tar xzf apache-tomcat-11.0.9.tar.gz

 3. If you intend to access the built-in examples of Tomcat via an intranet IP or even an external IP, you need to modify a configuration file as shown:
 ```bash
-vim apache-tomcat-11.0.9/webapps/examples/META-INF/context.xml
+vi apache-tomcat-11.0.9/webapps/examples/META-INF/context.xml
 ```
-Then change the values:
-```console
+Then change the allow value as shown and save the changes:
+```output
 # change <Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="127\.\d+\.\d+\.\d+|::1|0:0:0:0:0:0:0:1" />
 # to
-# <Valve className="org.apache.catalina.valves.RemoteAddrValve" allow=".*" />
+<Valve className="org.apache.catalina.valves.RemoteAddrValve" allow=".*" />
 ```
 Now you can start Tomcat Server:
 ```bash
@@ -62,9 +62,14 @@ Tomcat started.

 ![example image alt-text#center](./_images/lp-tomcat-examples.png "Tomcat-Examples")

+Make sure port 8080 is open in the security group of the IP address for your Arm-based Linux machine.
+
 ## Setup Benchmark Client - [wrk2](https://github.com/giltene/wrk2)
 `wrk2` is a high-performance HTTP benchmarking tool specialized in generating constant throughput loads and measuring latency percentiles for web services. `wrk2` is an enhanced version of `wrk` that provides accurate latency statistics under controlled request rates, ideal for performance testing of HTTP servers.

+Currently `wrk2` is only supported on x86 machines. You will run the Benchmark Client steps shown below on an x86_64 server running Ubuntu.
+
+
 1. To use `wrk2`, you will need to install some essential tools before you can build it:
 ```bash
 sudo apt-get update
@@ -82,7 +87,7 @@ Move the executable to somewhere in your PATH:
 sudo cp wrk /usr/local/bin
 ```

-3. Finally, you can run the benchamrk of Tomcat through wrk2.
+3. Finally, you can run the benchmark of Tomcat through wrk2:
 ```bash
 wrk -c32 -t16 -R50000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
 ```
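As a sanity check on the wrk2 flags in the final step (`-c32 -t16 -R50000 -d60`), here is a small, hypothetical helper showing how they decompose. It assumes wrk2's documented behavior: connections are divided across threads, and `-R` is the aggregate constant request rate, not a per-connection rate.

```python
def offered_load(connections: int, threads: int, rate_rps: int, duration_s: int) -> dict:
    # wrk2 divides connections across threads; -R is the total request rate,
    # so the nominal request count for the run is rate * duration.
    return {
        "connections_per_thread": connections // threads,
        "total_requests": rate_rps * duration_s,
    }

# Mirrors: wrk -c32 -t16 -R50000 -d60 ...
load = offered_load(32, 16, 50_000, 60)
print(load)  # {'connections_per_thread': 2, 'total_requests': 3000000}
```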
