diff --git a/.wordlist.txt b/.wordlist.txt index 22e016cbb0..306ba0bdeb 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -4666,5 +4666,60 @@ crosh Sommelier chromeos linuxcontainers - +XPS +NIC's +offlines +passthrough +SLOs +Ker +Rui +SmartNICs +selectedalt +UIalt +lpprojectubuntuarm +RDV +chiplet +BMC +upstreams +rdv +Initrd +handoff +ACPI +PCRs +MHU +Handoff +userland +CXL +DDR +PHYs +UCIe +handoffs +CCG +CML +Codespaces +Cheng +GDM +LPI +nsec +shortcode +BSON +joedog +Seige +Antonov +jwt +kbs +Nfpl +ZjnAMjLk +hCpeYsarnnGv +kbs +rvps +xcbTMTBX +CDH +RVPS +Attester +attester +ATtestation +CoCo +procedureS +NIC’s diff --git a/content/learning-paths/cross-platform/_example-learning-path/appendix-1-formatting.md b/content/learning-paths/cross-platform/_example-learning-path/appendix-1-formatting.md index 589c59fbe6..f5a43c2979 100644 --- a/content/learning-paths/cross-platform/_example-learning-path/appendix-1-formatting.md +++ b/content/learning-paths/cross-platform/_example-learning-path/appendix-1-formatting.md @@ -83,12 +83,26 @@ Specify that line_numbers are true in the following way: \`\`\`bash { line_numbers = "true" } \ echo 'hello world' \ echo ‘I am line two’ \ -\`\`\` +\`\`\` + +```bash { line_numbers = "true" } +echo ‘hello world’ +echo ‘I am line two’ +``` + +In some cases, the line numbering should not start from one but from another +value, e.g. if the code excerpt is extracted from a larger file. 
Use the
+`line_start` attribute to achieve this:
+
-```bash { line_numbers = "true" }
-echo ‘hello world’
-echo ‘I am line two’
-```
+\`\`\`bash { line_numbers = "true" line_start = "10" } \
+echo 'hello world' \
+echo ‘I am line eleven’ \
+\`\`\`
+
+```bash { line_numbers = "true" line_start = "10" }
+echo ‘hello world’
+echo ‘I am line eleven’
+```

### Output Lines

@@ -100,7 +114,7 @@ There are three ways you can specify command outputs in code:

{{% notice Note %}}
In each of the three situations, code marked as 'output' will:
- not be copied when clicking the 'copy' button
-- not be highlightable by a cursor
+- not be highlighted by a cursor
- appear slightly darker
{{% /notice %}}

diff --git a/content/learning-paths/cross-platform/floating-point-rounding-errors/_index.md b/content/learning-paths/cross-platform/floating-point-rounding-errors/_index.md
index 9c5431b21f..923b028e69 100644
--- a/content/learning-paths/cross-platform/floating-point-rounding-errors/_index.md
+++ b/content/learning-paths/cross-platform/floating-point-rounding-errors/_index.md
@@ -1,6 +1,10 @@
---
title: Explore floating-point differences between x86 and Arm
+draft: true
+cascade:
+  draft: true
+
minutes_to_complete: 30

who_is_this_for: This is an introductory topic for developers who are porting applications from x86 to Arm and want to understand how floating-point behavior differs between these architectures - particularly in the context of numerical consistency, performance, and debugging subtle bugs.
diff --git a/content/learning-paths/embedded-and-microcontrollers/training-inference-pytorch/_index.md b/content/learning-paths/embedded-and-microcontrollers/training-inference-pytorch/_index.md index 1dea9ea1e0..967482a761 100644 --- a/content/learning-paths/embedded-and-microcontrollers/training-inference-pytorch/_index.md +++ b/content/learning-paths/embedded-and-microcontrollers/training-inference-pytorch/_index.md @@ -7,41 +7,41 @@ cascade: minutes_to_complete: 90 -who_is_this_for: This topic is for machine learning engineers, embedded AI developers, and researchers interested in deploying TinyML models for NLP on Arm-based edge devices using PyTorch and ExecuTorch. +who_is_this_for: This topic is for machine learning engineers, embedded AI developers, and researchers interested in deploying TinyML models for NLP on Arm-based edge devices using PyTorch and ExecuTorch. -learning_objectives: +learning_objectives: - Train a custom CNN-based sentiment classification model implemented in PyTorch. - Optimize and convert the model using ExecuTorch for Arm-based edge devices. - Deploy and run inference on the Corstone-320 FVP. prerequisites: - - Basic knowledge of machine learning concepts. - - It is advised to complete The Learning Path, [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm) before starting this learning path. + - Basic knowledge of machine learning concepts. + - It is advised to complete The Learning Path, [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm) before starting this learning path. - Familiarity with Python and PyTorch. - A Linux host machine or VM running Ubuntu 22.04 or higher. - - An Arm license to run the examples on the Corstone-320 Fixed Virtual Platform (FVP), for hands-on deployment. 
+ - An Arm license to run the examples on the Corstone-320 Fixed Virtual Platform (FVP), for hands-on deployment. author: Dominica Abena O. Amanfo ### Tags -skilllevels: Intermediate +skilllevels: Introductory subjects: ML armips: - Cortex-A tools_software_languages: - - tinyML - - CNN + - tinyML + - CNN - PyTorch - ExecuTorch - + operatingsystems: - Linux further_reading: - resource: - title: Run Llama 3 on a Raspberry Pi 5 using ExecuTorch + title: Run Llama 3 on a Raspberry Pi 5 using ExecuTorch link: /learning-paths/embedded-and-microcontrollers/rpi-llama3 type: website - resource: diff --git a/content/learning-paths/embedded-and-microcontrollers/uvprojx-conversion/_index.md b/content/learning-paths/embedded-and-microcontrollers/uvprojx-conversion/_index.md index 7ec6b4f9be..71f1f19ec4 100644 --- a/content/learning-paths/embedded-and-microcontrollers/uvprojx-conversion/_index.md +++ b/content/learning-paths/embedded-and-microcontrollers/uvprojx-conversion/_index.md @@ -5,7 +5,7 @@ minutes_to_complete: 10 who_is_this_for: This is a topic for users of µVision who want to migrate to the new project format (csolution) required by CMSIS-Toolbox. -learning_objectives: +learning_objectives: - Import, convert, and build uvprojx-based projects in Keil Studio. - Convert uvprojx-based projects in µVision. - Convert and build uvprojx-based projects on the command line. 
@@ -19,7 +19,7 @@ prerequisites: author: Christopher Seidl ### Tags -skilllevels: Intermediate +skilllevels: Advanced subjects: Performance and Architecture armips: - Cortex-M @@ -43,7 +43,7 @@ further_reading: link: https://community.arm.com/arm-community-blogs/b/internet-of-things-blog/posts/keil-mdk-version-6 type: blog - resource: - title: keil.arm.com + title: keil.arm.com link: https://keil.arm.com type: website diff --git a/content/learning-paths/laptops-and-desktops/self_hosted_cicd_github/_index.md b/content/learning-paths/laptops-and-desktops/self_hosted_cicd_github/_index.md index 25bf578c96..e2763e261d 100644 --- a/content/learning-paths/laptops-and-desktops/self_hosted_cicd_github/_index.md +++ b/content/learning-paths/laptops-and-desktops/self_hosted_cicd_github/_index.md @@ -18,7 +18,7 @@ prerequisites: author: Dawid Borycki ### Tags -skilllevels: Intermediate +skilllevels: Introductory subjects: Migration to Arm armips: - Cortex-A diff --git a/content/learning-paths/mobile-graphics-and-gaming/afrc/_index.md b/content/learning-paths/mobile-graphics-and-gaming/afrc/_index.md index eaa130e36a..b76fa5e727 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/afrc/_index.md +++ b/content/learning-paths/mobile-graphics-and-gaming/afrc/_index.md @@ -5,7 +5,7 @@ minutes_to_complete: 25 who_is_this_for: Software developers of Android applications and mobile games who are interested in learning how to enable Arm Fixed Rate Compression (AFRC) to improve performance. -learning_objectives: +learning_objectives: - Query for fixed-rate compression support. - Specify what compression to use. - Verify that compression is applied. 
@@ -18,7 +18,7 @@ prerequisites: author: Jose-Emilio Munoz-Lopez ### Tags -skilllevels: Intermediate +skilllevels: Advanced subjects: Graphics armips: - Mali diff --git a/content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/5-run-benchmark-on-android.md b/content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/5-run-benchmark-on-android.md index 4f07675eac..158c47cbc8 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/5-run-benchmark-on-android.md +++ b/content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/5-run-benchmark-on-android.md @@ -45,6 +45,7 @@ cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \ -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \ -DEXECUTORCH_BUILD_KERNELS_LLM=ON \ -DEXECUTORCH_BUILD_EXTENSION_LLM_RUNNER=ON \ + -DEXECUTORCH_BUILD_EXTENSION_LLM=ON \ -DEXECUTORCH_BUILD_EXTENSION_RUNNER_UTIL=ON \ -DEXECUTORCH_XNNPACK_ENABLE_KLEIDI=ON \ -DXNNPACK_ENABLE_ARM_BF16=OFF \ @@ -82,6 +83,10 @@ cmake --build cmake-out-android/examples/models/llama -j16 --config Release You should now have `llama_main` available for Android. +{{% notice Note %}} +If you notice that Gradle cannot find the Android SDK, add the sdk.dir path to executorch/extension/android/local.properties. +{{% /notice %}} + ## Run on Android via adb shell You will need an Arm-powered smartphone with the i8mm feature running Android, with 16GB of RAM. The following steps were tested on a Google Pixel 8 Pro phone. @@ -103,7 +108,7 @@ You should see your device listed to confirm it is connected. 
``` bash
adb shell mkdir -p /data/local/tmp/llama
-adb push llama3_1B_kv_sdpa_xnn_qe_4_128_1024_embedding_4bit.pte /data/local/tmp/llama/
+adb push llama3_1B_kv_sdpa_xnn_qe_4_64_1024_embedding_4bit.pte /data/local/tmp/llama/
adb push $HOME/.llama/checkpoints/Llama3.2-1B-Instruct/tokenizer.model /data/local/tmp/llama/
adb push cmake-out-android/examples/models/llama/llama_main /data/local/tmp/llama/
```
@@ -114,49 +119,53 @@ adb push cmake-out-android/examples/models/llama/llama_main /data/local/tmp/llam

Use the Llama runner to execute the model on the phone with the `adb` command:

``` bash
-adb shell "cd /data/local/tmp/llama && ./llama_main --model_path llama3_1B_kv_sdpa_xnn_qe_4_64_1024_embedding_4bit.pte --tokenizer_path tokenizer.model --prompt "<|start_header_id|>system<|end_header_id|>\nYour name is Cookie. you are helpful, polite, precise, concise, honest, good at writing. You always give precise and brief answers up to 32 words<|eot_id|><|start_header_id|>user<|end_header_id|>\nHey Cookie! how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --warmup=1 --cpu_threads=5
+adb shell "cd /data/local/tmp/llama && ./llama_main --model_path llama3_1B_kv_sdpa_xnn_qe_4_64_1024_embedding_4bit.pte --tokenizer_path tokenizer.model --prompt \"<|start_header_id|>system<|end_header_id|>\nYour name is Cookie. you are helpful, polite, precise, concise, honest, good at writing. You always give precise and brief answers up to 32 words<|eot_id|><|start_header_id|>user<|end_header_id|>\nHey Cookie! how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\" --warmup=1 --cpu_threads=5"
```

The output should look something like this.
``` -I 00:00:00.003316 executorch:main.cpp:69] Resetting threadpool with num threads = 5 -I 00:00:00.009329 executorch:runner.cpp:59] Creating LLaMa runner: model_path=llama3_1B_kv_sdpa_xnn_qe_4_64_1024_embedding_4bit.pte, tokenizer_path=tokenizer.model -I 00:00:03.569399 executorch:runner.cpp:88] Reading metadata from model -I 00:00:03.569451 executorch:runner.cpp:113] Metadata: use_sdpa_with_kv_cache = 1 -I 00:00:03.569455 executorch:runner.cpp:113] Metadata: use_kv_cache = 1 -I 00:00:03.569459 executorch:runner.cpp:113] Metadata: get_vocab_size = 128256 -I 00:00:03.569461 executorch:runner.cpp:113] Metadata: get_bos_id = 128000 -I 00:00:03.569464 executorch:runner.cpp:113] Metadata: get_max_seq_len = 1024 -I 00:00:03.569466 executorch:runner.cpp:113] Metadata: enable_dynamic_shape = 1 -I 00:00:03.569469 executorch:runner.cpp:120] eos_id = 128009 -I 00:00:03.569470 executorch:runner.cpp:120] eos_id = 128001 -I 00:00:03.569471 executorch:runner.cpp:120] eos_id = 128006 -I 00:00:03.569473 executorch:runner.cpp:120] eos_id = 128007 -I 00:00:03.569475 executorch:runner.cpp:168] Doing a warmup run... 
-I 00:00:03.838634 executorch:text_prefiller.cpp:53] Prefill token result numel(): 128256 - -I 00:00:03.892268 executorch:text_token_generator.h:118] +I tokenizers:regex.cpp:27] Registering override fallback regex +I 00:00:00.003288 executorch:main.cpp:87] Resetting threadpool with num threads = 5 +I 00:00:00.006393 executorch:runner.cpp:44] Creating LLaMa runner: model_path=llama3_1B_kv_sdpa_xnn_qe_4_64_1024_embedding_4bit.pte, tokenizer_path=tokenizer.model +E tokenizers:hf_tokenizer.cpp:60] Error parsing json file: [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - invalid literal; last read: 'I' +I 00:00:00.131486 executorch:llm_runner_helper.cpp:57] Loaded TikToken tokenizer +I 00:00:00.131525 executorch:llm_runner_helper.cpp:167] Reading metadata from model +I 00:00:00.186538 executorch:llm_runner_helper.cpp:110] Metadata: use_sdpa_with_kv_cache = 1 +I 00:00:00.186574 executorch:llm_runner_helper.cpp:110] Metadata: use_kv_cache = 1 +I 00:00:00.186578 executorch:llm_runner_helper.cpp:110] Metadata: get_max_context_len = 1024 +I 00:00:00.186584 executorch:llm_runner_helper.cpp:110] Metadata: get_max_seq_len = 1024 +I 00:00:00.186588 executorch:llm_runner_helper.cpp:110] Metadata: enable_dynamic_shape = 1 +I 00:00:00.186596 executorch:llm_runner_helper.cpp:140] eos_id = 128009 +I 00:00:00.186597 executorch:llm_runner_helper.cpp:140] eos_id = 128001 +I 00:00:00.186599 executorch:llm_runner_helper.cpp:140] eos_id = 128006 +I 00:00:00.186600 executorch:llm_runner_helper.cpp:140] eos_id = 128007 +I 00:00:01.086570 executorch:text_llm_runner.cpp:89] Doing a warmup run... 
+I 00:00:01.087836 executorch:text_llm_runner.cpp:152] Max new tokens resolved: 128, given start_pos 0, num_prompt_tokens 54, max_context_len 1024 +I 00:00:01.292740 executorch:text_prefiller.cpp:93] Prefill token result numel(): 128256 + +I 00:00:02.264371 executorch:text_token_generator.h:123] Reached to the end of generation -I 00:00:03.892281 executorch:runner.cpp:267] Warmup run finished! -I 00:00:03.892286 executorch:runner.cpp:174] RSS after loading model: 1269.445312 MiB (0 if unsupported) -<|start_header_id|>system<|end_header_id|>\nYour name is Cookie. you are helpful, polite, precise, concise, honest, good at writing. You always give precise and brief answers up to 32 words<|eot_id|><|start_header_id|>user<|end_header_id|>\nHey Cookie! how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>I 00:00:04.076905 executorch:text_prefiller.cpp:53] Prefill token result numel(): 128256 - - -I 00:00:04.078027 executorch:runner.cpp:243] RSS after prompt prefill: 1269.445312 MiB (0 if unsupported) -I'm doing great, thanks! I'm always happy to help, communicate, and provide helpful responses. I'm a bit of a cookie (heh) when it comes to delivering concise and precise answers. What can I help you with today?<|eot_id|> -I 00:00:05.399304 executorch:text_token_generator.h:118] +I 00:00:02.264379 executorch:text_llm_runner.cpp:209] Warmup run finished! +I 00:00:02.264384 executorch:text_llm_runner.cpp:95] RSS after loading model: 1122.187500 MiB (0 if unsupported) +I 00:00:02.264624 executorch:text_llm_runner.cpp:152] Max new tokens resolved: 74, given start_pos 0, num_prompt_tokens 54, max_context_len 1024 +<|start_header_id|>system<|end_header_id|>\nYour name is Cookie. you are helpful, polite, precise, concise, honest, good at writing. You always give precise and brief answers up to 32 words<|eot_id|><|start_header_id|>user<|end_header_id|>\nHey Cookie! 
how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>I 00:00:02.394162 executorch:text_prefiller.cpp:93] Prefill token result numel(): 128256 + + +I 00:00:02.394373 executorch:text_llm_runner.cpp:179] RSS after prompt prefill: 1122.187500 MiB (0 if unsupported) +I'm doing great, thanks for asking! I'm always ready to help, whether it's answering a question or providing a solution. What can I help you with today?<|eot_id|> +I 00:00:03.072966 executorch:text_token_generator.h:123] Reached to the end of generation - -I 00:00:05.399314 executorch:runner.cpp:257] RSS after finishing text generation: 1269.445312 MiB (0 if unsupported) -PyTorchObserver {"prompt_tokens":54,"generated_tokens":51,"model_load_start_ms":1710296339487,"model_load_end_ms":1710296343047,"inference_start_ms":1710296343370,"inference_end_ms":1710296344877,"prompt_eval_end_ms":1710296343556,"first_token_ms":1710296343556,"aggregate_sampling_time_ms":49,"SCALING_FACTOR_UNITS_PER_SECOND":1000} -I 00:00:04.530945 executorch:stats.h:108] Prompt Tokens: 54 Generated Tokens: 69 -I 00:00:04.530947 executorch:stats.h:114] Model Load Time: 1.196000 (seconds) -I 00:00:04.530949 executorch:stats.h:124] Total inference time: 1.934000 (seconds) Rate: 35.677353 (tokens/second) -I 00:00:04.530952 executorch:stats.h:132] Prompt evaluation: 0.176000 (seconds) Rate: 306.818182 (tokens/second) -I 00:00:04.530954 executorch:stats.h:143] Generated 69 tokens: 1.758000 (seconds) Rate: 39.249147 (tokens/second) -I 00:00:04.530956 executorch:stats.h:151] Time to first generated token: 0.176000 (seconds) -I 00:00:04.530959 executorch:stats.h:158] Sampling time over 123 tokens: 0.067000 (seconds) + +I 00:00:03.072972 executorch:text_llm_runner.cpp:199] RSS after finishing text generation: 1122.187500 MiB (0 if unsupported) +PyTorchObserver 
{"prompt_tokens":54,"generated_tokens":36,"model_load_start_ms":1756473387815,"model_load_end_ms":1756473388715,"inference_start_ms":1756473389893,"inference_end_ms":1756473390702,"prompt_eval_end_ms":1756473390023,"first_token_ms":1756473390023,"aggregate_sampling_time_ms":22,"SCALING_FACTOR_UNITS_PER_SECOND":1000} +I 00:00:03.072993 executorch:stats.h:108] Prompt Tokens: 54 Generated Tokens: 36 +I 00:00:03.072995 executorch:stats.h:114] Model Load Time: 0.900000 (seconds) +I 00:00:03.072996 executorch:stats.h:124] Total inference time: 0.809000 (seconds) Rate: 44.499382 (tokens/second) +I 00:00:03.072998 executorch:stats.h:132] Prompt evaluation: 0.130000 (seconds) Rate: 415.384615 (tokens/second) +I 00:00:03.073000 executorch:stats.h:143] Generated 36 tokens: 0.679000 (seconds) Rate: 53.019146 (tokens/second) +I 00:00:03.073002 executorch:stats.h:151] Time to first generated token: 0.130000 (seconds) +I 00:00:03.073004 executorch:stats.h:158] Sampling time over 90 tokens: 0.022000 (seconds) ``` You have successfully run the Llama 3.1 1B Instruct model on your Android smartphone with ExecuTorch using KleidiAI kernels. 
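As a quick sanity check on the updated log, the `Rate:` figures in the stats block follow directly from the reported token counts and times. This short Python sketch (not part of the Learning Path itself) reproduces them:

```python
# Reproduce the rates printed in the ExecuTorch stats output above.
prompt_tokens = 54          # "Prompt Tokens: 54"
generated_tokens = 36       # "Generated Tokens: 36"
total_inference_s = 0.809   # "Total inference time: 0.809000 (seconds)"
prompt_eval_s = 0.130       # "Prompt evaluation: 0.130000 (seconds)"
generation_s = 0.679        # "Generated 36 tokens: 0.679000 (seconds)"

# Each rate is simply tokens divided by elapsed wall-clock time.
print(f"Total rate:       {generated_tokens / total_inference_s:.6f} tokens/second")
print(f"Prompt eval rate: {prompt_tokens / prompt_eval_s:.6f} tokens/second")
print(f"Generation rate:  {generated_tokens / generation_s:.6f} tokens/second")
```

The printed values match the log above: 44.499382, 415.384615, and 53.019146 tokens/second respectively.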
diff --git a/content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/example-prompt-1.png b/content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/example-prompt-1.png index f12f3c8a8d..46e00e0606 100644 Binary files a/content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/example-prompt-1.png and b/content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/example-prompt-1.png differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/example-prompt-2.png b/content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/example-prompt-2.png index 922e925c76..638a0ab3fa 100644 Binary files a/content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/example-prompt-2.png and b/content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/example-prompt-2.png differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/profiling-unity-apps-on-android/_index.md b/content/learning-paths/mobile-graphics-and-gaming/profiling-unity-apps-on-android/_index.md index 3e9882cabd..03c7d7b29f 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/profiling-unity-apps-on-android/_index.md +++ b/content/learning-paths/mobile-graphics-and-gaming/profiling-unity-apps-on-android/_index.md @@ -5,7 +5,7 @@ minutes_to_complete: 40 who_is_this_for: Unity developers wanting to analyze the performance of their apps on Android devices -learning_objectives: +learning_objectives: - Deploy to Android - Profile code running on an Android device - Analyze performance data @@ -19,7 +19,7 @@ prerequisites: author: Arm ### Tags -skilllevels: Intermediate 
+skilllevels: Introductory subjects: Performance and Architecture armips: - armv8 diff --git a/content/learning-paths/mobile-graphics-and-gaming/ray_tracing/_index.md b/content/learning-paths/mobile-graphics-and-gaming/ray_tracing/_index.md index a7266875c7..d6ccb5c6ef 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/ray_tracing/_index.md +++ b/content/learning-paths/mobile-graphics-and-gaming/ray_tracing/_index.md @@ -5,7 +5,7 @@ minutes_to_complete: 120 who_is_this_for: This Learning Path is for Vulkan developers who are familiar with rendering and are interested in deploying ray tracing in their applications. -learning_objectives: +learning_objectives: - Describe how the Vulkan ray tracing API works. - Describe how to use ray tracing to implement realistic shadows, reflections, and refractions. - Implement basic ray tracing effects in a Vulkan renderer. @@ -18,7 +18,7 @@ prerequisites: author: Iago Calvo Lista ### Tags -skilllevels: Intermediate +skilllevels: Advanced subjects: Graphics armips: - Mali diff --git a/content/learning-paths/mobile-graphics-and-gaming/using-neon-intrinsics-to-optimize-unity-on-android/_index.md b/content/learning-paths/mobile-graphics-and-gaming/using-neon-intrinsics-to-optimize-unity-on-android/_index.md index 684bcefadc..c445131929 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/using-neon-intrinsics-to-optimize-unity-on-android/_index.md +++ b/content/learning-paths/mobile-graphics-and-gaming/using-neon-intrinsics-to-optimize-unity-on-android/_index.md @@ -5,7 +5,7 @@ minutes_to_complete: 90 who_is_this_for: Developers who want to optimize their Unity apps on Android -learning_objectives: +learning_objectives: - Use Arm Neon intrinsics in your Unity C# scripts - Optimize your code - Collect and compare performance data using the Unity Profiler and Analyzer tools @@ -19,7 +19,7 @@ prerequisites: author: Arm ### Tags -skilllevels: Intermediate +skilllevels: Advanced subjects: Gaming armips: - armv8 diff 
--git a/content/learning-paths/servers-and-cloud-computing/cca-trustee/cca-trustee.md b/content/learning-paths/servers-and-cloud-computing/cca-trustee/cca-trustee.md index 428f0677d0..0eded26191 100644 --- a/content/learning-paths/servers-and-cloud-computing/cca-trustee/cca-trustee.md +++ b/content/learning-paths/servers-and-cloud-computing/cca-trustee/cca-trustee.md @@ -86,7 +86,7 @@ A verifier driver parses the attestation evidence provided by the hardware TEE. 1. Verifies the hardware TEE signature of the TEE quote and report provided in the evidence 2. Receives the evidence and organizes the status into a JSON format to be returned -In this Learning Path, the AS is configured to use an external CCA verifer. +In this Learning Path, the AS is configured to use an external CCA verifier. [Linaro](https://www.linaro.org) provides such an attestation verifier for use with pre-silicon Arm CCA platforms. This verifier is built from the Open-Source [Veraison project](https://github.com/veraison). @@ -138,8 +138,8 @@ FVP and the reference software stack, see the Learning Path. When the AS receives an attestation token from the realm via KBS: -- it calls an external CCA verifer (the Linaro attestation verifier service) to obtain an attestation result. -- the external CCA verifer checks the token's cryptographic signature, +- it calls an external CCA verifier (the Linaro attestation verifier service) to obtain an attestation result. +- the external CCA verifier checks the token's cryptographic signature, verifies that it denotes a confidential computing platform and provides an attestation result. - it also checks the token evidences against its own attestation policies and updates attestation result status and trustworthiness vectors. 
diff --git a/content/learning-paths/servers-and-cloud-computing/cca-trustee/flow.md b/content/learning-paths/servers-and-cloud-computing/cca-trustee/flow.md
index 49f06e54db..63c5b033a7 100644
--- a/content/learning-paths/servers-and-cloud-computing/cca-trustee/flow.md
+++ b/content/learning-paths/servers-and-cloud-computing/cca-trustee/flow.md
@@ -199,7 +199,7 @@ Using JWK key from JWT header
Error: verifying signed EAR from "ear.jwt" using "JWK header" key: failed verifying JWT message: jwt.Parse: failed to parse token: jwt.Validate: validation failed: "exp" not satisfied: token is expired
```
-Please obtain a new EAR message by re-runing the attestation command.
+Please obtain a new EAR message by re-running the attestation command.

{{% /notice %}}
diff --git a/content/learning-paths/servers-and-cloud-computing/envoy-gcp/_index.md b/content/learning-paths/servers-and-cloud-computing/envoy-gcp/_index.md
new file mode 100644
index 0000000000..75351eaac2
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/envoy-gcp/_index.md
@@ -0,0 +1,61 @@
+---
+title: Deploy Envoy on Google Axion processors
+
+draft: true
+cascade:
+  draft: true
+
+minutes_to_complete: 30
+
+who_is_this_for: This is an introductory topic for software developers interested in migrating their Envoy workloads from x86_64 servers to Arm-based servers, specifically on Google Axion–based C4A virtual machines.
+
+learning_objectives:
+ - Start an Arm virtual machine on Google Cloud Platform (GCP) using the C4A Google Axion instance
+ - Install and configure Envoy on Arm-based GCP C4A instances
+ - Validate Envoy functionality through baseline testing
+ - Benchmark Envoy performance on Arm
+
+prerequisites:
+ - A [Google Cloud Platform (GCP)](https://cloud.google.com/free?utm_source=google&hl=en) account with billing enabled
+ - Familiarity with networking concepts and the [Envoy architecture](https://www.envoyproxy.io/docs/envoy/latest/).
+ +author: Pareena Verma + +##### Tags +skilllevels: Advanced +subjects: Web +cloud_service_providers: Google Cloud + +armips: + - Neoverse + +tools_software_languages: + - Envoy + - Siege + +operatingsystems: + - Linux + +# ================================================================================ +# FIXED, DO NOT MODIFY +# ================================================================================ +further_reading: + - resource: + title: Google Cloud official documentation + link: https://cloud.google.com/docs + type: documentation + + - resource: + title: Envoy documentation + link: https://www.envoyproxy.io/docs/envoy/latest/about_docs + type: documentation + + - resource: + title: The official documentation for Siege + link: https://www.joedog.org/siege-manual/ + type: documentation + +weight: 1 # _index.md always has weight of 1 to order correctly +layout: "learningpathall" # All files under learning paths have this same wrapper +learning_path_main_page: "yes" # Indicates this should be surfaced when looking for related content. Only set for _index.md of learning path content. +--- diff --git a/content/learning-paths/servers-and-cloud-computing/envoy-gcp/_next-steps.md b/content/learning-paths/servers-and-cloud-computing/envoy-gcp/_next-steps.md new file mode 100644 index 0000000000..c3db0de5a2 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/envoy-gcp/_next-steps.md @@ -0,0 +1,8 @@ +--- +# ================================================================================ +# FIXED, DO NOT MODIFY THIS FILE +# ================================================================================ +weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation. +title: "Next Steps" # Always the same, html page title. +layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing. 
+--- diff --git a/content/learning-paths/servers-and-cloud-computing/envoy-gcp/background.md b/content/learning-paths/servers-and-cloud-computing/envoy-gcp/background.md new file mode 100644 index 0000000000..633ea96aa0 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/envoy-gcp/background.md @@ -0,0 +1,23 @@ +--- +title: Getting started with Envoy on Google Axion C4A (Arm Neoverse-V2) + +weight: 2 + +layout: "learningpathall" +--- + +## Google Axion C4A Arm instances in Google Cloud + +Google Axion C4A is a family of Arm-based virtual machines built on Google’s custom Axion CPU, which is based on Arm Neoverse-V2 cores. Designed for high-performance and energy-efficient computing, these virtual machines offer strong performance for modern cloud workloads such as CI/CD pipelines, microservices, media processing, and general-purpose applications. + +The C4A series provides a cost-effective alternative to x86 virtual machines while leveraging the scalability and performance benefits of the Arm architecture in Google Cloud. + +To learn more about Google Axion, refer to the [Introducing Google Axion Processors, our new Arm-based CPUs](https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu) blog. + +## Envoy for service proxy and traffic management on Arm + +Envoy is an open-source, high-performance edge and service proxy designed for cloud-native applications. + +It handles service-to-service communication, traffic routing, load balancing, and observability, making microservices more reliable and secure. + +Envoy is widely used in service meshes, API gateways, and modern cloud environments. Learn more from the [Envoy official website](https://www.envoyproxy.io/) and its [official documentation](https://www.envoyproxy.io/docs/envoy/latest/). 
diff --git a/content/learning-paths/servers-and-cloud-computing/envoy-gcp/baseline-testing.md b/content/learning-paths/servers-and-cloud-computing/envoy-gcp/baseline-testing.md
new file mode 100644
index 0000000000..ec16a99ec4
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/envoy-gcp/baseline-testing.md
@@ -0,0 +1,141 @@
+---
+title: Envoy baseline testing on Google Axion C4A Arm virtual machine
+weight: 5
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+With Envoy installed successfully on your GCP C4A Arm virtual machine, you will proceed to validate that Envoy is running as expected.
+
+## Validate Envoy installation with a baseline test
+
+In this section, you will learn how to create a minimal Envoy configuration, start Envoy with it, and verify functionality using `curl`.
+The test will confirm that Envoy listens on port **10000**, forwards requests to `httpbin.org`, and returns a successful **200 OK** response.
+
+### Create a Minimal Configuration File
+
+Using a file editor of your choice, create a file named `envoy_config.yaml` and add the following content to it. This file configures Envoy to listen on port **10000** and forward all traffic to `http://httpbin.org`. The `host_rewrite_literal` setting is essential to prevent 404 Not Found errors from the upstream server.
+ +```YAML +static_resources: + listeners: + - name: listener_0 + address: + socket_address: + protocol: TCP + address: 0.0.0.0 + port_value: 10000 + filter_chains: + - filters: + - name: envoy.filters.network.http_connection_manager + typed_config: + "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager + stat_prefix: ingress_http + route_config: + name: local_route + virtual_hosts: + - name: backend + domains: ["*"] + routes: + - match: + prefix: "/" + route: + cluster: service_httpbin + host_rewrite_literal: httpbin.org + http_filters: + - name: envoy.filters.http.router + typed_config: + "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router + clusters: + - name: service_httpbin + connect_timeout: 0.5s + type: LOGICAL_DNS + dns_lookup_family: V4_ONLY + lb_policy: ROUND_ROBIN + load_assignment: + cluster_name: service_httpbin + endpoints: + - lb_endpoints: + - endpoint: + address: + socket_address: + address: httpbin.org + port_value: 80 +``` +- **Listeners:** Envoy is configured to accept incoming HTTP requests on port **10000** of your VM. +- **HTTP Connection Manager:** A filter processes the incoming requests, directing them to the appropriate backend. +- **Routing:** All traffic is routed to the `service_httpbin` cluster, with the `Host` header rewritten to `httpbin.org`. +- **Clusters:** The `service_httpbin` cluster defines the upstream service as `httpbin.org` on port **80**, which is where requests are ultimately forwarded. + +### Run and Test Envoy + +This is the final phase of functional validation, confirming that the proxy is operational. 
+Start the Envoy proxy in your current terminal using your configuration file: + +```console +envoy -c envoy_config.yaml --base-id 1 +``` +The output should look similar to: + +```output +[2025-08-21 11:53:51.597][67137][info][config] [source/server/configuration_impl.cc:138] loading 1 listener(s) +[2025-08-21 11:53:51.597][67137][info][config] [source/server/configuration_impl.cc:154] loading stats configuration +[2025-08-21 11:53:51.598][67137][warning][main] [source/server/server.cc:928] There is no configured limit to the number of allowed active downstream connections. Configure a limit in `envoy.resource_monitors.downstream_connections` resource monitor. +[2025-08-21 11:53:51.598][67137][info][main] [source/server/server.cc:969] starting main dispatch loop +[2025-08-21 11:53:51.599][67137][info][runtime] [source/common/runtime/runtime_impl.cc:614] RTDS has finished initialization +[2025-08-21 11:53:51.599][67137][info][upstream] [source/common/upstream/cluster_manager_impl.cc:240] cm init: all clusters initialized +[2025-08-21 11:53:51.599][67137][info][main] [source/server/server.cc:950] all clusters initialized. initializing init manager +[2025-08-21 11:53:51.599][67137][info][config] [source/common/listener_manager/listener_manager_impl.cc:930] all dependencies initialized. starting workers +``` + +Now, open a new terminal and send a test request to the Envoy listener using `curl`: + +```console +curl -v http://localhost:10000/get +``` +The `-v` flag provides verbose output, showing the full request and response headers. A successful test shows an **HTTP/1.1 200 OK** response with a JSON body from `httpbin.org`. + +The output should look similar to: + +```output +* Trying 127.0.0.1:10000...
+* Connected to 127.0.0.1 (127.0.0.1) port 10000 (#0) +> GET /get HTTP/1.1 +> Host: 127.0.0.1:10000 +> User-Agent: curl/7.76.1 +> Accept: */* +> +* Mark bundle as not supporting multiuse +< HTTP/1.1 200 OK +< date: Fri, 22 Aug 2025 11:20:35 GMT +< content-type: application/json +< content-length: 301 +< server: envoy +< access-control-allow-origin: * +< access-control-allow-credentials: true +< x-envoy-upstream-service-time: 1042 +< +{ + "args": {}, + "headers": { + "Accept": "*/*", + "Host": "httpbin.org", + "User-Agent": "curl/7.76.1", + "X-Amzn-Trace-Id": "Root=1-68a85282-10af9cfe0385774600509ddd", + "X-Envoy-Expected-Rq-Timeout-Ms": "15000" + }, + "origin": "34.63.220.63", + "url": "http://httpbin.org/get" +} +* Connection #0 to host 127.0.0.1 left intact +``` +#### Summary of the curl Output + +- **Successful Connection:** The `curl` command successfully connected to the Envoy proxy on `localhost:10000`. +- **Correct Status Code:** Envoy forwarded the request and received a `200 OK` response from the upstream server. +- **Host Header Rewrite:** The `Host` header was correctly rewritten from `localhost:10000` to `httpbin.org`, as defined in the configuration. +- **End-to-End Success:** The proxy is fully operational, proving that requests are correctly received, processed, and forwarded to the intended backend. + +This confirms that the end-to-end flow through the Envoy server is working correctly.
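If you want to script this verification instead of reading the verbose output by eye, the two key checks are the status code and the `server: envoy` response header. A minimal sketch, parsing a sample abbreviated from the output above:

```python
# Parse an HTTP/1.1 response and confirm it came back through Envoy.
# The sample text below is abbreviated from the curl output shown earlier.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "date: Fri, 22 Aug 2025 11:20:35 GMT\r\n"
    "content-type: application/json\r\n"
    "server: envoy\r\n"
    "x-envoy-upstream-service-time: 1042\r\n"
)

def parse_response(raw):
    """Split a raw HTTP response head into (status_code, headers dict)."""
    lines = raw.strip().split("\r\n")
    status_code = int(lines[0].split()[1])
    headers = {k.lower(): v for k, v in (l.split(": ", 1) for l in lines[1:])}
    return status_code, headers

status, headers = parse_response(raw_response)
assert status == 200, "Envoy did not return 200 OK"
assert headers["server"] == "envoy", "response did not pass through Envoy"
print("baseline test passed")
```

In a real script you would feed it the output of `curl -si http://localhost:10000/get` instead of the hard-coded sample.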
diff --git a/content/learning-paths/servers-and-cloud-computing/envoy-gcp/benchmarking.md b/content/learning-paths/servers-and-cloud-computing/envoy-gcp/benchmarking.md new file mode 100644 index 0000000000..bb9675b17d --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/envoy-gcp/benchmarking.md @@ -0,0 +1,149 @@ +--- +title: Envoy performance benchmarks on Arm64 and x86_64 in Google Cloud +weight: 6 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## How to run Envoy benchmarks with Siege on Arm64 in GCP + +**Siege** is a lightweight HTTP load testing and benchmarking tool that simulates concurrent users making requests to a target service. It is useful for Envoy benchmarking because it measures availability, throughput, response time, and failure rates under load, thus helping evaluate Envoy’s performance as a proxy under real-world traffic conditions. + +Follow the steps below to run Envoy benchmarks using Siege. + +### Install Siege (Build from Source) + +1. Install the required build tools: + +```console +sudo dnf groupinstall -y "Development Tools" +sudo dnf install -y wget make gcc +``` +2. Download, extract, and build the Siege source: + +```console +wget http://download.joedog.org/siege/siege-4.1.6.tar.gz +tar -xvzf siege-4.1.6.tar.gz +cd siege-4.1.6 +./configure +make +sudo make install +``` +You have now successfully built and installed Siege on your Arm-based machine. + +3. Verify the installation: + +```console +siege --version +``` +This checks that Siege is installed properly and shows the version number. +```output +SIEGE 4.1.6 + +Copyright (C) 2023 by Jeffrey Fulmer, et al. +This is free software; see the source for copying conditions. +There is NO warranty; not even for MERCHANTABILITY or FITNESS +FOR A PARTICULAR PURPOSE. +``` +### Envoy Benchmarking + +1.
To start, make sure Envoy is up and running with your configuration file (listening on port 10000): + +```console +envoy -c envoy_config.yaml --base-id 1 +``` +This runs the Envoy proxy with your configuration file (`envoy_config.yaml`) so it can start listening for requests. + +2. In another terminal, verify that Envoy is running as expected with `curl`: + +```console +curl -v http://127.0.0.1:10000/get +``` +Running this from another terminal returns a **200 OK** status, confirming that Envoy is running and successfully processing requests. + +3. Run a time-based load test + +There are several ways to set up benchmark tests. Here you will run a benchmark for a fixed duration instead of a fixed request count: + +```console +siege -c30 -t10S http://127.0.0.1:10000/get +``` +This runs a load test where 30 concurrent users hit Envoy continuously for 10 seconds. When the test finishes, Siege displays the performance results. + +The output should show a list of HTTP requests and responses followed by a summary, as shown: + +```output +HTTP/1.1 200 0.03 secs: 383 bytes ==> GET /get +HTTP/1.1 200 0.03 secs: 383 bytes ==> GET /get +HTTP/1.1 200 0.03 secs: 383 bytes ==> GET /get +HTTP/1.1 200 0.02 secs: 383 bytes ==> GET /get +HTTP/1.1 200 0.42 secs: 383 bytes ==> GET /get +HTTP/1.1 200 0.03 secs: 383 bytes ==> GET /get +HTTP/1.1 200 0.03 secs: 383 bytes ==> GET /get +HTTP/1.1 200 1.17 secs: 383 bytes ==> GET /get +HTTP/1.1 200 0.81 secs: 383 bytes ==> GET /get +HTTP/1.1 200 0.14 secs: 383 bytes ==> GET /get + +Lifting the server siege... +Transactions: 1019 hits +Availability: 99.80 % +Elapsed time: 10.38 secs +Data transferred: 0.37 MB +Response time: 0.29 secs +Transaction rate: 98.17 trans/sec +Throughput: 0.04 MB/sec +Concurrency: 28.07 +Successful transactions: 1019 +Failed transactions: 2 +Longest transaction: 2.89 +Shortest transaction: 0.02 +``` + +### Understanding Envoy benchmark metrics and results with Siege + +- **Transactions**: Total number of completed requests during the benchmark.
+- **Availability**: Percentage of requests that returned a successful response. +- **Elapsed Time**: Total time taken to run the benchmark test. +- **Data Transferred**: Total amount of data exchanged during the test. +- **Response Time**: Average time taken for the server to respond to each request. +- **Transaction Rate**: Number of requests processed per second. +- **Throughput**: Volume of data transferred per second. +- **Concurrency**: Average number of simultaneous connections maintained. +- **Successful Transactions**: Total number of requests completed successfully. +- **Failed Transactions**: Total number of requests that failed. +- **Longest Transaction**: Maximum response time observed for a single request. +- **Shortest Transaction**: Minimum response time observed for a single request. + +### Benchmark summary on x86_64 +For comparison, the following results were collected by running the same benchmark on a `c3-standard-4` (4 vCPUs, 2 cores, 16 GB memory) x86_64 virtual machine in GCP running RHEL 9.
+| Metric | Value | Metric | Value | +|-------------------------|--------------|---------------------------|-----------------| +| Transactions | 720 hits | Availability | 98.90 % | +| Elapsed time | 10.98 secs | Data transferred | 0.26 MB | +| Response time | 0.44 secs | Transaction rate | 65.57 trans/sec | +| Throughput | 0.02 MB/sec | Concurrency | 28.66 | +| Successful transactions | 720 | Failed transactions | 8 | +| Longest transaction | 4.63 secs | Shortest transaction | 0.02 secs | + +### Benchmark summary on Arm64 +Results from the earlier run on the `c4a-standard-4` (4 vCPUs, 16 GB memory) Arm64 VM in GCP (RHEL 9): + +| Metric | Value | Metric | Value | +|-------------------------|---------------|---------------------------|-----------------| +| Transactions | 1019 hits | Availability | 99.80 % | +| Elapsed time | 10.38 secs | Data transferred | 0.37 MB | +| Response time | 0.29 secs | Transaction rate | 98.17 trans/sec | +| Throughput | 0.04 MB/sec | Concurrency | 28.07 | +| Successful transactions | 1019 | Failed transactions | 2 | +| Longest transaction | 2.89 secs | Shortest transaction | 0.02 secs | + +### Envoy performance benchmarking comparison on Arm64 and x86_64 +Comparing the results from the two instance types with the same vCPU count, the Google Axion C4A Arm-based instance shows: + +- More successful transactions and fewer failures +- Lower response times, a higher transaction rate, and better throughput + +You have successfully learned how to use Siege to benchmark Envoy on your Arm-based Axion Google Cloud instance, validating both performance and reliability against a comparable x86_64 instance.
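To put the two summaries side by side numerically, you can compute the relative differences directly from the tables above (the values are copied from this section; your own runs will differ):

```python
# Compare the Siege summaries from the Arm64 (c4a-standard-4) and
# x86_64 (c3-standard-4) runs above. Values are copied from the tables;
# absolute numbers will vary from run to run.
arm = {"rate_tps": 98.17, "response_s": 0.29, "failed": 2}
x86 = {"rate_tps": 65.57, "response_s": 0.44, "failed": 8}

rate_gain_pct = (arm["rate_tps"] / x86["rate_tps"] - 1) * 100
latency_drop_pct = (1 - arm["response_s"] / x86["response_s"]) * 100

print(f"Transaction rate: {rate_gain_pct:.0f}% higher on Arm64")
print(f"Average response time: {latency_drop_pct:.0f}% lower on Arm64")
print(f"Failed transactions: {x86['failed']} (x86_64) vs {arm['failed']} (Arm64)")
```

For these particular runs, that works out to roughly a 50% higher transaction rate and about a 34% lower average response time on the Arm64 VM.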
diff --git a/content/learning-paths/servers-and-cloud-computing/envoy-gcp/deploy.md b/content/learning-paths/servers-and-cloud-computing/envoy-gcp/deploy.md new file mode 100644 index 0000000000..b932b53b36 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/envoy-gcp/deploy.md @@ -0,0 +1,56 @@ +--- +title: How to deploy Envoy on Google Axion C4A Arm virtual machines +weight: 4 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + + +## How to deploy Envoy on a Google Axion C4A Arm virtual machine +In this section you will learn how to install Envoy Proxy v1.30.0 on a Google Cloud Axion C4A virtual machine running RHEL 9. You will install the dependencies, download the official static Arm64 Envoy binary and check the installed version. + +1. Install Dependencies + +```console +sudo dnf install -y \ + autoconf \ + curl \ + libtool \ + patch \ + python3 \ + python3-pip \ + unzip \ + git +pip3 install virtualenv +``` + +2. Install Envoy (Static Arm64 Binary) + +You will now download and install the Envoy binary on your Arm-based instance. +Download the binary directly to **/usr/local/bin/envoy**. The `-L` flag is crucial as it follows any redirects from the download URL. + +```console +sudo curl -L \ + -o /usr/local/bin/envoy \ + https://github.com/envoyproxy/envoy/releases/download/v1.30.0/envoy-1.30.0-linux-aarch_64 +``` +Change the permissions on the downloaded binary to make it an executable: + +```console +sudo chmod +x /usr/local/bin/envoy +``` +Verify the installation by checking its version. + +```console +envoy --version +``` +This confirms the binary is correctly placed and executable. + +The output should look like: + +```output +envoy version: 50ea83e602d5da162df89fd5798301e22f5540cf/1.30.0/Clean/RELEASE/BoringSSL +``` +This confirms the installation of Envoy. +You can now proceed with the baseline testing in the next section. 
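As an extra check that the binary you installed is actually the Arm64 build (and not an x86_64 one), you can inspect it with `file /usr/local/bin/envoy`, which should report an `ELF 64-bit` `ARM aarch64` executable. If you prefer to script the check, the sketch below reads the ELF header directly (the path is the install location used above):

```python
# Sketch: confirm a binary is an AArch64 ELF by reading its header.
import struct

EM_AARCH64 = 183  # e_machine value for AArch64 in the ELF specification

def is_aarch64_elf(path):
    """True if the file is an ELF whose e_machine field says AArch64.

    Assumes a little-endian ELF (what Linux uses on aarch64); e_machine
    is the 16-bit field at byte offset 18 of the ELF header.
    """
    with open(path, "rb") as f:
        header = f.read(20)
    if header[:4] != b"\x7fELF":
        return False
    return struct.unpack_from("<H", header, 18)[0] == EM_AARCH64

# On the VM (assumes the install path used in this section):
# print(is_aarch64_elf("/usr/local/bin/envoy"))  # expect True on Arm64
```

This is equivalent to what `file` reports, just reduced to the one field that matters here.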
diff --git a/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/image1.png b/content/learning-paths/servers-and-cloud-computing/envoy-gcp/image1.png similarity index 100% rename from content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/image1.png rename to content/learning-paths/servers-and-cloud-computing/envoy-gcp/image1.png diff --git a/content/learning-paths/servers-and-cloud-computing/envoy-gcp/instance.md b/content/learning-paths/servers-and-cloud-computing/envoy-gcp/instance.md new file mode 100644 index 0000000000..87db47d57e --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/envoy-gcp/instance.md @@ -0,0 +1,30 @@ +--- +title: How to create a Google Axion C4A Arm virtual machine on GCP +weight: 3 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## How to create a Google Axion C4A Arm VM on Google Cloud + +In this section, you will learn how to provision a Google Axion C4A Arm virtual machine on Google Cloud Platform (GCP) using the **c4a-standard-4 (4 vCPUs, 16 GB memory)** machine type in the Google Cloud Console. + +For details on GCP setup, refer to the [Getting started with Google Cloud Platform](https://learn.arm.com/learning-paths/servers-and-cloud-computing/csp/google/) Learning Path. + +### Create a Google Axion C4A Arm VM in Google Cloud Console + +To create a virtual machine based on the C4A instance type: +1. Navigate to the [Google Cloud Console](https://console.cloud.google.com/). +2. Go to **Compute Engine > VM Instances** and select **Create Instance**. +3. Under **Machine configuration**: + - Enter details such as **Instance name**, **Region**, and **Zone**. + - Set **Series** to `C4A`. + - Select a machine type such as `c4a-standard-4`. + + ![Create a Google Axion C4A Arm virtual machine in the Google Cloud Console with c4a-standard-4 selected alt-text#center](./image1.png "Google Cloud Console – creating a Google Axion C4A Arm virtual machine") + +4. 
Under **OS and Storage**, select **Change**, then choose an Arm64-based OS image. + For this Learning Path, use **Red Hat Enterprise Linux 9**. Ensure you select the **Arm image** variant. Click **Select**. +5. Under **Networking**, enable **Allow HTTP traffic**. +6. Click **Create** to launch the instance. diff --git a/content/learning-paths/servers-and-cloud-computing/github-on-arm/_index.md b/content/learning-paths/servers-and-cloud-computing/github-on-arm/_index.md index 3c68beb5c2..5a5c0314a7 100644 --- a/content/learning-paths/servers-and-cloud-computing/github-on-arm/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/github-on-arm/_index.md @@ -1,22 +1,18 @@ --- title: Deploy GitHub Actions Self-Hosted Runner on Google Axion C4A virtual machine -draft: true -cascade: - draft: true - minutes_to_complete: 15 -who_is_this_for: This is an introductory topic for developers who want to deploy GitHub Actions Self-Hosted Runner on an Arm-based Google Axion C4A instance. +who_is_this_for: This is an introductory topic for developers who want to deploy a GitHub Actions self-hosted runner on an Arm-based Google Axion C4A instance. learning_objectives: - - Provision an Arm virtual machine on the Google Cloud Platform using the C4A Google Axion instance family. - - Set up and validate a GitHub Actions self-hosted runner on the Arm virtual machine. - - Deploy a basic CI workflow with NGINX and verify execution on Arm infrastructure. + - Provision an Arm virtual machine on the Google Cloud Platform using the C4A Google Axion instance family + - Set up and validate a GitHub Actions self-hosted runner on the Arm virtual machine + - Deploy a basic CI workflow with NGINX and verify execution on Arm infrastructure prerequisites: - - A [Google Cloud Platform (GCP)](https://cloud.google.com/free?utm_source=google&hl=en) account with billing enabled. - - A GitHub account. You can sign up [here](https://github.com/signup). 
+ - A [Google Cloud Platform (GCP)](https://cloud.google.com/free?utm_source=google&hl=en) account with billing enabled + - A GitHub account; you can sign up [here](https://github.com/signup) author: Annie Tallund @@ -39,22 +35,22 @@ operatingsystems: # FIXED, DO NOT MODIFY # ================================================================================ further_reading: - - resource: - title: Google Cloud official website and documentation - link: https://cloud.google.com/docs - type: documentation - - - resource: - title: Github-action official website and documentation - link: https://docs.github.com/en/actions - type: documentation - - - resource: - title: GitHub Actions Arm runners - link: https://github.blog/news-insights/product-news/arm64-on-github-actions-powering-faster-more-efficient-build-systems/ - type: website - - - resource: + - resource: + title: Google Cloud documentation + link: https://cloud.google.com/docs + type: documentation + + - resource: + title: GitHub Actions documentation + link: https://docs.github.com/en/actions + type: documentation + + - resource: + title: GitHub Actions Arm runners (announcement) + link: https://github.blog/news-insights/product-news/arm64-on-github-actions-powering-faster-more-efficient-build-systems/ + type: website + + - resource: title: GCP Quickstart Guide to Create a virtual machine link: https://cloud.google.com/compute/docs/instances/create-start-instance type: website diff --git a/content/learning-paths/servers-and-cloud-computing/github-on-arm/background.md b/content/learning-paths/servers-and-cloud-computing/github-on-arm/background.md index e450977285..a193194968 100644 --- a/content/learning-paths/servers-and-cloud-computing/github-on-arm/background.md +++ b/content/learning-paths/servers-and-cloud-computing/github-on-arm/background.md @@ -8,20 +8,20 @@ layout: "learningpathall" ## Google Axion C4A series -The Google Axion C4A series is a family of Arm-based virtual machines built on Google’s custom 
Axion CPU, which is based on Arm Neoverse-V2 cores. Designed for high-performance and energy-efficient computing, these virtual machines offer strong performance ideal for modern cloud workloads such as CI/CD pipelines, microservices, media processing, and general-purpose applications. +The Google Axion C4A series is a family of Arm-based virtual machines (VMs) built on Google’s custom Axion CPU, which is based on Arm Neoverse-V2 cores. Designed for high-performance and energy-efficient computing, these virtual machines offer strong performance ideal for modern cloud workloads such as CI/CD pipelines, microservices, media processing, and general-purpose applications. -The C4A series provides offer a cost-effective virtual machine while leveraging the scalability and performance benefits of the Arm architecture in Google Cloud. +The C4A series provides cost-effective VMs while leveraging the scalability and performance benefits of the Arm architecture on Google Cloud. -To learn more about Google Axion, refer to the blog [Introducing Google Axion Processors, our new Arm-based CPUs](https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu). +Learn more in Google’s announcement: [Introducing Google Axion processors, our new Arm-based CPUs](https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu). ## GitHub Actions and CI/CD -GitHub Actions is a powerful CI/CD (Continuous Integration and Continuous Delivery) platform built into GitHub. It allows developers to automate tasks such as building, testing, and deploying code in response to events like code pushes, pull requests, or scheduled jobs—directly from their GitHub repositories. This helps improve development speed, reliability, and collaboration. +GitHub Actions is a powerful CI/CD (Continuous Integration and Continuous Delivery) platform built into GitHub. 
It allows developers to automate tasks such as building, testing, and deploying code in response to events like code pushes, pull requests, or scheduled jobs - directly from their GitHub repositories. This helps improve development speed, reliability, and collaboration. A key feature of GitHub Actions is [self-hosted runners](https://docs.github.com/en/actions/concepts/runners/about-self-hosted-runners), which let you run workflows on your own infrastructure instead of GitHub’s hosted servers. This is especially useful for: -- Running on custom hardware, including Arm64-based systems (e.g., Google Axion virtual machine), to optimize performance and ensure architecture-specific compatibility. -- Private network access, allowing secure interaction with internal services or databases. -- Faster execution, especially for resource-intensive workflows, by using dedicated or high-performance machines. +- Running on custom hardware, including Arm64-based systems (for example, Google Axion virtual machine), to optimize performance and ensure architecture-specific compatibility +- Private network access, allowing secure interaction with internal services or databases +- Faster execution, especially for resource-intensive workflows, by using dedicated or high-performance machines -Self-hosted runners provide more control, flexibility, and cost-efficiency—making them ideal for advanced CI/CD pipelines and platform-specific testing. +Self-hosted runners give you more control, flexibility, and cost efficiency - ideal for advanced CI/CD pipelines and platform-specific testing. 
diff --git a/content/learning-paths/servers-and-cloud-computing/github-on-arm/deploy.md b/content/learning-paths/servers-and-cloud-computing/github-on-arm/deploy.md index 348852d98b..8814e55903 100644 --- a/content/learning-paths/servers-and-cloud-computing/github-on-arm/deploy.md +++ b/content/learning-paths/servers-and-cloud-computing/github-on-arm/deploy.md @@ -6,10 +6,11 @@ weight: 4 layout: learningpathall --- +## Overview -This section shows how to deploy a self-hosted GitHub Actions runner on your instance. It covers installing Git and GitHub CLI, authenticating with GitHub and configuring the runner on an Arm64 environment for optimized CI/CD workflows. +This section shows you how to deploy a GitHub Actions self-hosted runner on your Arm64 Google Axion C4A instance. You will install Git and GitHub CLI, authenticate with GitHub, and register the runner so CI/CD workflows run on Arm infrastructure. -### Set up development environment +## Set up your development environment Start by installing the required dependencies using the `apt` package manager: @@ -18,7 +19,7 @@ sudo apt update sudo apt install -y git gh vim ``` -Next step is to configure your git credentials. Update the command with your name and email. +Configure your Git identity: ```bash git config --global user.email "you@example.com" @@ -27,30 +28,30 @@ git config --global user.name "Your Name" Now you are ready to connect the machine to GitHub. The command below is used to authenticate the GitHub CLI with your GitHub account. It allows you to securely log in using a web browser or token, enabling the CLI to interact with repositories, actions, and other GitHub features on your behalf. +Authenticate with GitHub: ```console gh auth login ``` -The command will prompt you to make a few choices. For this use-case, you can use the default ones as shown in the image below. 
+Follow the prompts and accept the defaults: -![Login to GitHub](./images/gh-auth.png) +![Login to GitHub alt-text#center](./images/gh-auth.png "Screenshot of GitHub authentication prompt") -{{% notice %}} -If you get an error opening the browser on your virtual machine, you can navigate to the following URL on the host machine. +{{% notice Note %}} +If you get an error opening the browser on your virtual machine, you can navigate to the following URL on the host machine and enter the device code displayed in the CLI of the virtual machine: ``` https://github.com/login/device ``` -From there, you can enter the code displayed in the CLI of the virtual machine. {{% /notice %}} -If the log in was successful, you will see the following confirmation in your browser window. +When authentication succeeds, you will see a confirmation screen in your browser: -![GitHub UI](./images/login-page.png) +![GitHub UI alt-text#center](./images/login-page.png "Screenshot of successful GitHub login confirmation") -### Test GitHub CLI and Git +## Test GitHub CLI and Git -The command below creates a new public GitHub repository named **test-repo** using the GitHub CLI. It sets the repository visibility to public, meaning anyone can view it +The command below creates a new public GitHub repository named `test-repo` using the GitHub CLI. It sets the repository visibility to public, meaning that anyone can view it: ```console gh repo create test-repo --public ``` @@ -59,66 +60,65 @@ You should see an output similar to: ```output ✓ Created repository /test-repo on GitHub https://github.com//test-repo -``` + ``` -### Configure the Self-Hosted Runner + ## Configure the self-hosted runner -* Go to your repository's **Settings > Actions**, and under the **Runners** section -* Click on **Add Runner** or view existing self-hosted runners. + In your repository, go to **Settings** → **Actions** → **Runners** and select **Add runner**, or view existing self-hosted runners.
-{{% notice Note %}} -If the **Actions** tab is not visible, ensure Actions are enabled by navigating to **Settings > Actions > General**, and select **Allow all actions and reusable workflows**. -{{% /notice %}} + {{% notice Note %}} + If the **Actions** tab is not visible, enable Actions under **Settings** → **Actions** → **General** by selecting **Allow all actions and reusable workflows**. + {{% /notice %}} -![runner](./images/newsh-runner.png) + ![runner alt-text#center](./images/newsh-runner.png "Screenshot of repository Runners settings page") -Then, click on the **New self-hosted runner** button. In the **Add new self-hosted runner** section. Select Linux for the operating system, and choose ARM64 for the architecture. This will generate commands to set up the runner. Copy and run them on your Google Axion C4A virtual machine. + Click **New self-hosted runner**. In the setup panel, choose `Linux` as the operating system and `ARM64` as the architecture. Copy the generated setup commands and run them on your C4A VM. + + ![new-runner alt-text#center](./images/new-runner.png "Screenshot of the Add new self-hosted runner panel") -![new-runner](./images/new-runner.png) + The final command links the runner to your GitHub repository using a one-time registration token. + During setup, you will be prompted for the runner group, runner name, and work folder. Press **Enter** at each prompt to accept the defaults. The output should look similar to: -The final command links the runner to your GitHub repo using a one-time registration token. 
+ ```output + -------------------------------------------------------------------------------- + | ____ _ _ _ _ _ _ _ _ | + | / ___(_) |_| | | |_ _| |__ / \ ___| |_(_) ___ _ __ ___ | + | | | _| | __| |_| | | | | '_ \ / _ \ / __| __| |/ _ \| '_ \/ __| | + | | |_| | | |_| _ | |_| | |_) | / ___ \ (__| |_| | (_) | | | \__ \ | + | \____|_|\__|_| |_|\__,_|_.__/ /_/ \_\___|\__|_|\___/|_| |_|___/ | + | | + | Self-hosted runner registration | + | | + -------------------------------------------------------------------------------- -During the command’s execution, you will be prompted to provide the runner group, the name of the runner, and the work folder name. You can accept the defaults by pressing **Enter** at each step. The output will resemble as below: + # Authentication -```output --------------------------------------------------------------------------------- -| ____ _ _ _ _ _ _ _ _ | -| / ___(_) |_| | | |_ _| |__ / \ ___| |_(_) ___ _ __ ___ | -| | | _| | __| |_| | | | | '_ \ / _ \ / __| __| |/ _ \| '_ \/ __| | -| | |_| | | |_| _ | |_| | |_) | / ___ \ (__| |_| | (_) | | | \__ \ | -| \____|_|\__|_| |_|\__,_|_.__/ /_/ \_\___|\__|_|\___/|_| |_|___/ | -| | -| Self-hosted runner registration | -| | --------------------------------------------------------------------------------- - -# Authentication - -√ Connected to GitHub -# Runner Registration -Enter the name of the runner group to add this runner to: [press Enter for Default] -Enter the name of runner: [press Enter for lpprojectubuntuarm64] -This runner will have the following labels: 'self-hosted', 'Linux', 'ARM64' -Enter any additional labels (ex. 
label-1,label-2): [press Enter to skip] -√ Runner successfully added -√ Runner connection is good -``` + √ Connected to GitHub + # Runner Registration + Enter the name of the runner group to add this runner to: [press Enter for Default] + Enter the name of runner: [press Enter for lpprojectubuntuarm64] + This runner will have the following labels: 'self-hosted', 'Linux', 'ARM64' + Enter any additional labels (ex. label-1,label-2): [press Enter to skip] + √ Runner successfully added + √ Runner connection is good + ``` -Finally, start the runner by executing: -```console -./run.sh -``` -You should see an output similar to: + Finally, start the runner by executing: + ```console + ./run.sh + ``` + You should see an output similar to: -```output -√ Connected to GitHub + ```output + √ Connected to GitHub -Current runner version: '2.326.0' -2025-07-15 05:51:13Z: Listening for Jobs -``` -The runner will now be visible in the GitHub actions: + Current runner version: '2.326.0' + 2025-07-15 05:51:13Z: Listening for Jobs + ``` + The runner will now be visible in the GitHub actions: + + ![final-runner alt-text#center](./images/final-runner.png "Screenshot of runner visible in GitHub") -![final-runner](./images/final-runner.png) + For now, you can terminate the `./run.sh` command with `Ctrl+C`. Move on to the next section to set up a simple web server using the runner. -For now, you can terminate the `./run.sh` command with `Ctrl+C`. Move on to the next section to set up a simple web server using the runner. 
diff --git a/content/learning-paths/servers-and-cloud-computing/github-on-arm/instance.md b/content/learning-paths/servers-and-cloud-computing/github-on-arm/instance.md index b775ebc4f6..753d92cbca 100644 --- a/content/learning-paths/servers-and-cloud-computing/github-on-arm/instance.md +++ b/content/learning-paths/servers-and-cloud-computing/github-on-arm/instance.md @@ -6,28 +6,30 @@ weight: 3 layout: learningpathall --- -## Introduction +## Overview -This guide walks you through provisioning **Google Axion C4A Arm virtual machine** on GCP with the **c4a-standard-4 (4 vCPUs, 16 GB Memory)** machine type, using the **Google Cloud Console**. +This section walks you through creating a Google Axion C4A Arm virtual machine on Google Cloud with the `c4a-standard-4` (4 vCPUs, 16 GB memory) machine type using the Google Cloud Console. You will use this VM later as the host for a GitHub Actions self-hosted runner. -If you haven't got a Google Cloud account, you can follow the Learning Path on [Getting Started with Google Cloud Platform](https://learn.arm.com/learning-paths/servers-and-cloud-computing/csp/google/) to get started. +If you don't have a Google Cloud account, see the Learning Path [Getting started with Google Cloud Platform](https://learn.arm.com/learning-paths/servers-and-cloud-computing/csp/google/). -### Create an Arm-based Virtual Machine (C4A) +## Create an Arm-based virtual machine (C4A) -To create a virtual machine based on the C4A Arm architecture: -1. Open the [Google Cloud Console](https://console.cloud.google.com/). -2. Navigate to the card **Compute Engine** and click on **Create Instance**. -3. Under the **Machine Configuration**: - - Fill in basic details like **Instance Name**, **Region**, and **Zone**. - - Choose the **Series** as `C4A`. - - Select a machine type such as `c4a-standard-4`. -![Instance Screenshot](./images/select-instance.png) -4. 
Under the **OS and Storage**, click on **Change**, pick **Ubuntu** as the Operating System with **Ubuntu 24.04 LTS Minimal** as the Version. Make sure you pick the version of image for Arm. -5. Under **Networking**, enable **Allow HTTP traffic** to test workloads like NGINX later. -6. Click on **Create**, and the instance will launch. +Follow these steps in the Google Cloud Console: + +- Open the [Google Cloud Console](https://console.cloud.google.com/). +- Go to **Navigation menu ▸ Compute Engine ▸ VM instances**, then select **Create instance**. +- Under **Machine configuration**: + - Enter **Instance name**, **Region**, and **Zone** + - Set **Series** to `C4A` + - Choose a machine type such as `c4a-standard-4` +- Under **OS and storage**, select **Change**, pick **Ubuntu** as the operating system, and choose **Ubuntu 24.04 LTS Minimal**. Make sure you select the Arm image variant. +- Under **Networking**, enable **Allow HTTP traffic** so you can test workloads like NGINX later. +- Select **Create** to launch the instance. + +![Google Cloud Console page showing C4A VM creation with c4a-standard-4 selected alt-text#center](./images/select-instance.png "Create a C4A VM in the Google Cloud Console") {{% notice Important %}} -You should not enable the **Allow HTTP traffic** permanently, since this poses a security risk. For the long-term, you should only allow traffic from the IP address you use to connect to the instance. +Do not leave **Allow HTTP traffic** enabled permanently. For long-term use, allow traffic only from the IP addresses you use to connect to the instance. {{% /notice %}} -You can access the Google Cloud Console by clicking the **SSH** button in the instance overview. Use this command line interface (CLI) to run the commands in the remainder of this Learning Path. +Access the VM from the instance list by selecting **SSH** in the instance overview. Use this command line interface (CLI) to run the commands in the remainder of this Learning Path.
diff --git a/content/learning-paths/servers-and-cloud-computing/github-on-arm/nginx-deployment.md b/content/learning-paths/servers-and-cloud-computing/github-on-arm/nginx-deployment.md index 8569893dfc..bd0b3c28ad 100644 --- a/content/learning-paths/servers-and-cloud-computing/github-on-arm/nginx-deployment.md +++ b/content/learning-paths/servers-and-cloud-computing/github-on-arm/nginx-deployment.md @@ -1,29 +1,30 @@ --- -title: Deploy NGINX the GitHub Runner +title: Deploy NGINX with the GitHub runner weight: 5 ### FIXED, DO NOT MODIFY layout: learningpathall --- +## Overview This workflow installs and starts a basic NGINX web server on a self-hosted runner whenever code is pushed to the main branch. -In your instance's console, create a directory for the repository: +In your instance console, create a directory for the repository: ```console mkdir test-repo && cd test-repo echo "# test-repo" >> README.md ``` -Then, create the GitHub Actions workflow file at `.github/workflows/deploy-nginx.yaml`. +Create the GitHub Actions workflow file at `.github/workflows/deploy-nginx.yaml`: ```console mkdir .github && mkdir .github/workflows/ vim .github/workflows/deploy-nginx.yaml ``` -Paste the following code block into the file and save it. +Paste the following content into the file: ```yaml name: Deploy NGINX @@ -45,7 +46,7 @@ jobs: run: sudo systemctl start nginx ``` -Now it's time to initiate your repository and push the changes. +Initialize your repository and push the changes: ```console git init @@ -63,15 +64,15 @@ cd .. ./run.sh ``` -You will see in the output of the command that it identifies the a job called `deploy`, and that it finishes after having run the two steps. +The output shows a job called `deploy` and confirms that both steps ran successfully. -### Access the NGINX Server +## Access the NGINX server Once the workflow completes, open your browser and navigate to your machine's external IP address. 
You will find the information in your instance overview, under **Network interfaces**. ``` http:// ``` You should see the NGINX welcome page confirming a successful deployment. -![nginx](./images/nginx.png) +![nginx alt-text#center](./images/nginx.png "Screenshot of the NGINX welcome page in a browser") -You should now know how to set up a self-hosted runner with an Arm-based Google Cloud instance, and use it to run GitHub Actions workflows. From here, you can modify the workflow file to try out different commands. +You now know how to set up a self-hosted runner with an Arm-based Google Cloud instance, and use it to run GitHub Actions workflows. From here, you can modify the workflow file to try out different commands. diff --git a/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/_index.md b/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/_index.md index 37e306524b..27a3011c93 100644 --- a/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/_index.md @@ -1,25 +1,22 @@ --- -title: Deploy MongoDB on Google Axion C4A virtual machine +title: Deploy MongoDB on an Arm-based Google Axion C4A VM -minutes_to_complete: 60 +minutes_to_complete: 15 -who_is_this_for: This Learning Path is designed for software developers looking to migrate their MongoDB workloads from x86_64 to Arm-based platforms, specifically on Google Axion-based C4A virtual machines. +who_is_this_for: This introductory topic is for software developers who want to migrate MongoDB workloads from x86_64 to Arm-based platforms, specifically on Google Axion-based C4A virtual machines. learning_objectives: - - Provision an Arm virtual machine on the Google Cloud Platform using the C4A Google Axion instance family, and RHEL 9 as the base image. - - Install and run MongoDB on an Arm-based GCP C4A instances. - - Validate the functionality of MongoDB through baseline testing. 
- - Benchmark the MongoDB performance on Arm using Yahoo Cloud Serving Benchmark (YCSB). + - Create an Arm virtual machine on Google Cloud (C4A Axion family) + - Install and run MongoDB on the Arm-based C4A instance + - Benchmark MongoDB performance with Yahoo Cloud Serving Benchmark (YCSB) prerequisites: - - A [Google Cloud Platform (GCP)](https://cloud.google.com/free?utm_source=google&hl=en) account with billing enabled. - - Basic understanding of Linux command line. - - Familiarity with the [MongoDB architecture](https://www.mongodb.com/) and deployment practices on Arm64 platforms. + - A [Google Cloud Platform (GCP)](https://cloud.google.com/free?utm_source=google&hl=en) account with billing enabled -author: Jason Andrews +author: Annie Tallund ##### Tags -skilllevels: Advanced +skilllevels: Introductory subjects: Databases cloud_service_providers: Google Cloud diff --git a/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/background.md b/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/background.md index 99947c23e8..e3c0cbb163 100644 --- a/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/background.md +++ b/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/background.md @@ -8,15 +8,14 @@ layout: "learningpathall" ## Google Axion C4A series -The Google Axion C4A series is a family of Arm-based virtual machines built on Google’s custom Axion CPU, which is based on Arm Neoverse-V2 cores. Designed for high-performance and energy-efficient computing, these virtual machine offer strong performance ideal for modern cloud workloads such as CI/CD pipelines, microservices, media processing, and general-purpose applications. +The C4A series is a family of Arm-based instance types for Google’s custom Axion CPU, which is based on Arm Neoverse-V2 cores. 
Designed for high-performance, energy-efficient computing, these virtual machines offer strong performance suitable for modern cloud workloads such as CI/CD pipelines, microservices, media processing, and general-purpose applications.
 
-The C4A series provides a cost-effective alternative to x86 virtual machine while leveraging the scalability and performance benefits of the Arm architecture in Google Cloud.
+The C4A series provides cost-effective virtual machines while leveraging the scalability and performance benefits of the Arm architecture in Google Cloud.
 
-To learn more about Google Axion, refer to the blog [Introducing Google Axion Processors, our new Arm-based CPUs](https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu).
+To learn more about Google Axion, see the blog post [Introducing Google Axion Processors, our new Arm-based CPUs](https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu).
 
 ## MongoDB
 
-MongoDB is a popular open-source NoSQL database designed for high performance, scalability, and flexibility.
-It stores data in JSON-like BSON documents, making it ideal for modern applications that require dynamic, schema-less data structures.
+MongoDB is a popular open-source NoSQL database designed for performance, scalability, and flexibility. It stores data in JSON-like BSON documents, making it well suited to applications that require dynamic, schema-less data models.
 
-MongoDB is widely used for web, mobile, IoT, and real-time analytics workloads. Learn more from the [MongoDB official website](https://www.mongodb.com/) and its [official documentation](https://www.mongodb.com/docs/).
+MongoDB is widely used for web, mobile, IoT, and real-time analytics workloads. Learn more on the [MongoDB website](https://www.mongodb.com/) and in the [MongoDB documentation](https://www.mongodb.com/docs/).
diff --git a/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/baseline-testing.md b/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/baseline-testing.md index 842709484f..4e99c74930 100644 --- a/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/baseline-testing.md +++ b/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/baseline-testing.md @@ -6,35 +6,42 @@ weight: 5 layout: learningpathall --- +## Overview -Since MongoDB is installed successfully on your GCP C4A Arm virtual machine, follow these steps to validate that the server is running and accepting local connections. +Now that MongoDB is installed on your Google Axion C4A Arm VM, verify that the server is running and accepting local connections. -## MongoDB Baseline Testing (Using **mongosh**) +Use mongosh to create a test database, run basic CRUD operations, and capture a quick insert-time baseline before you start benchmarking. -1. Connect to MongoDB +## Connect to MongoDB Open a shell session to the local MongoDB instance: + ```console mongosh mongodb://127.0.0.1:27017 ``` -2. Create a Test Database and Collection: -```console +## Create a test database and collection + +Switch to a new database and create a collection: + +```javascript use baselineDB db.createCollection("test") ``` -This creates a new database **baselineDB** and an empty collection named test. -You should see an output similar to: +This creates a new database named `baselineDB` and an empty collection called `test`. + +Expected output: ```output -test> use baselineDB -... db.createCollection("test") -... switched to db baselineDB +{ ok: 1 } ``` -3. Insert 10,000 Test Documents: + +## Insert 10,000 test documents + +Populate the collection with 10,000 timestamped documents: ```javascript for (let i = 0; i < 10000; i++) { @@ -45,128 +52,91 @@ for (let i = 0; i < 10000; i++) { }) } ``` -This simulates basic write operations with timestamped records. 
-10,000 documents will be cretaed and inserted into the test collection of the currently selected database. -The record field would increment from 0 to 9999. The status is always "new". -The timestamp would capture the insertion time for each document using ***new Date()***. -You should see an output similar to: +Each document contains: +- `record`: a counter from 0 to 9999 +- `status`: `"new"` +- `timestamp`: the current date/time of insertion + +Sample output: ```output -{ - acknowledged: true, - insertedId: ObjectId('6892dacfbd44e23df4750aa9') -} +{ acknowledged: true, insertedId: ObjectId('...') } ``` -4. Read (Query) a Subset of Documents: +## Read a subset of documents + +Verify read functionality by querying the first few documents: -Fetch a few documents to verify read functionality. ```javascript db.test.find({ status: "new" }).limit(5) ``` -This command is a simple read operation to verify that your data is inserted correctly. It queries the test collection in the current database, and only returns documents where the status is "new". ***limit(5)*** returns only the first 5 matching documents. -You should see an output similar to: +This returns the first 5 documents where `status` is `"new"`. + +## Update a document + +Update a specific document by changing its status: -```output -[ - { - _id: ObjectId('6892dacbbd44e23df474e39a'), - record: 0, - status: 'new', - timestamp: ISODate('2025-08-06T04:32:11.090Z') - }, - { - _id: ObjectId('6892dacbbd44e23df474e39b'), - record: 1, - status: 'new', - timestamp: ISODate('2025-08-06T04:32:11.101Z') - }, - { - _id: ObjectId('6892dacbbd44e23df474e39c'), - record: 2, - status: 'new', - timestamp: ISODate('2025-08-06T04:32:11.103Z') - }, - { - _id: ObjectId('6892dacbbd44e23df474e39d'), - record: 3, - status: 'new', - timestamp: ISODate('2025-08-06T04:32:11.104Z') - }, - { - _id: ObjectId('6892dacbbd44e23df474e39e'), - record: 4, - status: 'new', - timestamp: ISODate('2025-08-06T04:32:11.106Z') - } -] -``` -5. 
Update a Document: - -Update a specific document's field to validate update capability. ```javascript db.test.updateOne({ record: 100 }, { $set: { status: "processed" } }) ``` -Above command will find the first document where record is exactly 100, and updates that document by setting its status field to "processed". -You should see an output similar to: +This finds the document where `record` is 100 and updates the `status`. + +Expected output: ```output { acknowledged: true, - insertedId: null, matchedCount: 1, - modifiedCount: 1, - upsertedCount: 0 + modifiedCount: 1 } ``` -6. View the Updated Document Before Deletion -```console +## View the updated document + +Confirm that the document was updated: + +```javascript db.test.findOne({ record: 100 }) ``` -This retrieves the document where record is 100, allowing you to verify that its status has been updated to "processed". -You should see output similar to: +Expected output: ```output { - _id: ObjectId('689490ddb7235c65ca74e3fe'), + _id: ObjectId('...'), record: 100, status: 'processed', - timestamp: ISODate('2025-08-07T11:41:17.508Z') + timestamp: ISODate('...') } ``` -7. Delete a Document: +## Delete a document + +The command below tells MongoDB to delete one document from the test collection, where record is exactly 100: ```javascript db.test.deleteOne({ record: 100 }) ``` -This tells MongoDB to delete one document from the test collection, where record is exactly 100. - -You should see an output similar to: -```output -{ acknowledged: true, deletedCount: 1 } -``` -Now, confirm the deletion: +Verify deletion: -```console +```javascript db.test.findOne({ record: 100 }) ``` -The above command confirms that the document was successfully deleted. -You should see an output similar to: +Expected output: + ```output null ``` -8. 
Measure Execution Time (Optional):
+## Measure execution time (optional)
+
+Measure how long it takes to insert 10,000 documents:
 
-The below snippet measures how long it takes to insert documents for performance insight.
 ```javascript
 var start = new Date()
 for (let i = 0; i < 10000; i++) {
@@ -174,39 +144,44 @@ for (let i = 0; i < 10000; i++) {
 }
 print("Insert duration (ms):", new Date() - start)
 ```
-You should see an output similar to:
+
+Sample output:
 
 ```output
 Insert duration (ms): 4427
 ```
-9. Count Total Documents:
-Count total entries to confirm expected data volume.
+## Count total documents
+
+Check the total number of documents in the collection:
+
 ```javascript
 db.test.countDocuments()
 ```
-You should see an output similar to:
+
+Expected output:
 
 ```output
 19999
 ```
+
-The count **19999** reflects the total documents after inserting 10,000 initial records, adding 10,000 more (in point 8), and deleting one (record: 100).
+The count **19999** reflects the total documents after inserting 10,000 initial records, adding 10,000 more in the optional execution-time measurement, and deleting one (record: 100).
-10. Clean Up (Optional):
-Deletes the **baselineDB** database and all its contents.
+## Clean up (optional)
+
+To reset the environment, drop the `baselineDB` database. This deletes the database you are currently connected to in mongosh, along with all test data:
+
 ```javascript
 db.dropDatabase()
 ```
-You should see an output similar to:
+
+Expected output:
 
 ```output
 { ok: 1, dropped: 'baselineDB' }
 ```
-The above is a destructive command that completely deletes the current database you are connected to in mongosh.
-
-The above operations confirm that MongoDB is installed successfully and is functioning as expected on the GCP Arm64 environment.
-
-Using **mongosh**, you validated key database operations such as **insert**, **read**, **update**, **delete**, and **count**.
-Now, your MongoDB instance is ready for further benchmarking and production use.
+These baseline operations confirm that MongoDB is functioning properly on your GCP Arm64 environment.
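As a quick sanity check on timings like the sample above, you can convert the insert duration into an approximate write rate. The numbers below reuse the sample values (10,000 documents in 4427 ms) and are illustrative only; substitute your own measurements.

```shell
# Sample values from the baseline run above - replace with your own results.
docs=10000
duration_ms=4427

# documents per second = docs / (duration in seconds)
awk -v d="$docs" -v t="$duration_ms" \
  'BEGIN { printf "%.0f inserts/sec\n", d / (t / 1000) }'
```

With the sample numbers this prints `2259 inserts/sec`, a rough single-threaded baseline to compare against the YCSB results later.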
Using `mongosh`, you validated inserts, queries, updates, deletes, and basic performance timing. Your instance is now ready for benchmarking or application integration.
\ No newline at end of file
diff --git a/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/benchmarking.md b/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/benchmarking.md
index 2ddf4990c6..eff06a9a06 100644
--- a/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/benchmarking.md
+++ b/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/benchmarking.md
@@ -6,101 +6,78 @@
 weight: 6
 layout: learningpathall
 ---
 
-## MongoDB Benchmarking with YCSB (Yahoo! Cloud Serving Benchmark)
+## Benchmark MongoDB with YCSB
 
-**YCSB (Yahoo! Cloud Serving Benchmark)** is an open-source benchmarking tool for evaluating the performance of NoSQL databases under different workloads. It supports operations like read, write, update, and scan to simulate real-world usage patterns.
+YCSB (Yahoo! Cloud Serving Benchmark) is an open-source tool for evaluating NoSQL databases under various workloads. It simulates operations such as writes, updates, and scans to mimic production traffic.
 
-### Install YCSB (Build from Source)
+## Install YCSB from source
+
+Install build tools and clone YCSB, then build the MongoDB binding:
 
 ```console
 sudo dnf install -y git maven java-11-openjdk-devel
 git clone https://github.com/brianfrankcooper/YCSB.git
 cd YCSB
 mvn -pl site.ycsb:mongodb-binding -am clean package
-```
+```
-### Load Phase – Insert Initial Dataset
-This phase inserts documents into MongoDB to simulate a typical workload.
+## Load initial data
+
+Load a starter dataset (defaults to 1,000 records) into MongoDB:
 
 ```console
 ./bin/ycsb load mongodb -s \
 -P workloads/workloada \
 -p mongodb.url=mongodb://127.0.0.1:27017/ycsb
 ```
-The core purpose of this phase is to prepare the database with initial records (default: 1,000) for benchmarking.
-### Execute Benchmark Workload +This prepares the database for the performance test. + +## Run a mixed workload + +Run Workload A (50% reads, 50% updates) and collect metrics: -This phase performs actual read/write operations and reports performance metrics. ```console ./bin/ycsb run mongodb -s \ -P workloads/workloada \ -p mongodb.url=mongodb://127.0.0.1:27017/ycsb ``` -Workload A (from workloads/workloada) simulates a balanced read/write workload: +**Workload A** is a balanced workload: - 50% reads -- 50% updates/writes +- 50% updates -This is designed to mimic many real-world systems where reads and writes are equally important (e.g., session stores, shopping carts, etc.). -The above command measures latency and throughput of mixed read/write operations. +This simulates common real-world applications like session stores or shopping carts. - -You should see an output similar to: +Sample output: ```output -Loading workload... -Starting test. -2025-08-06 06:05:50:378 0 sec: 0 operations; est completion in 0 second -mongo client connection created with mongodb://127.0.0.1:27017/ycsb -DBWrapper: report latency for each error is false and specific error codes to track for latency are: [] -2025-08-06 06:05:50:874 0 sec: 1000 operations; 1953.12 current ops/sec; [READ: Count=534, Max=8279, Min=156, Avg=312.96, 50=261, 90=436, 99=758, 99.9=8279, 99.99=8279] [CLEANUP: Count=1, Max=4139, Min=4136, Avg=4138, 50=4139, 90=4139, 99=4139, 99.9=4139, 99.99=4139] [UPDATE: Count=466, Max=26543, Min=186, Avg=384.45, 50=296, 90=444, 99=821, 99.9=26543, 99.99=26543] -[OVERALL], RunTime(ms), 512 -[OVERALL], Throughput(ops/sec), 1953.125 -[TOTAL_GCS_G1_Young_Generation], Count, 2 -[TOTAL_GC_TIME_G1_Young_Generation], Time(ms), 3 -[TOTAL_GC_TIME_%_G1_Young_Generation], Time(%), 0.5859375 -[TOTAL_GCS_G1_Old_Generation], Count, 0 -[TOTAL_GC_TIME_G1_Old_Generation], Time(ms), 0 -[TOTAL_GC_TIME_%_G1_Old_Generation], Time(%), 0.0 -[TOTAL_GCs], Count, 2 -[TOTAL_GC_TIME], Time(ms), 3 
-[TOTAL_GC_TIME_%], Time(%), 0.5859375 [READ], Operations, 534 -[READ], AverageLatency(us), 312.96067415730334 +[READ], AverageLatency(us), 312.96 [READ], MinLatency(us), 156 [READ], MaxLatency(us), 8279 -[READ], 50thPercentileLatency(us), 261 -[READ], 95thPercentileLatency(us), 524 -[READ], 99thPercentileLatency(us), 758 -[READ], Return=OK, 534 -[CLEANUP], Operations, 1 -[CLEANUP], AverageLatency(us), 4138.0 -[CLEANUP], MinLatency(us), 4136 -[CLEANUP], MaxLatency(us), 4139 -[CLEANUP], 50thPercentileLatency(us), 4139 -[CLEANUP], 95thPercentileLatency(us), 4139 -[CLEANUP], 99thPercentileLatency(us), 4139 +... [UPDATE], Operations, 466 -[UPDATE], AverageLatency(us), 384.4527896995708 +[UPDATE], AverageLatency(us), 384.45 [UPDATE], MinLatency(us), 186 [UPDATE], MaxLatency(us), 26543 -[UPDATE], 50thPercentileLatency(us), 296 -[UPDATE], 95thPercentileLatency(us), 498 -[UPDATE], 99thPercentileLatency(us), 821 -[UPDATE], Return=OK, 466 +... +[OVERALL], RunTime(ms), 512 +[OVERALL], Throughput(ops/sec), 1953.125 ``` -### YCSB Operations & Latency Metrics +## Understand YCSB metrics + +- **Operations Count**: Total operations performed for each type (for example, READ, UPDATE). +- **Average Latency (us)**: The average time to complete each operation, measured in microseconds. +- **Min Latency / Max Latency (us)**: The fastest and slowest observed times for any single operation of that type. + +With YCSB installed and benchmark results captured, you now have a baseline for MongoDB's performance under mixed workloads. + +## Benchmark summary on x86_64 -- **Operations Count**: Total number of operations performed by YCSB for each type. -- **Average Latency (us**): The average time (in microseconds) it took to complete each operation type. -- **Min Latency (us)**: The fastest (minimum) time observed for any single operation of that type. -- **Max Latency (us)**: The slowest (maximum) time recorded for any single operation of that type. 
+To better understand how MongoDB behaves across architectures, YCSB benchmark workloads were run on both an **x86_64 (C3 Standard)** and an **Arm64 (C4A Standard)** virtual machine, each with 4 vCPUs and 16 GB of memory, running RHEL 9. -### Benchmark summary on x86_64: -The following benchmark results are collected on a c3-standard-4 (4 vCPU, 2 core, 16 GB Memory) x86_64 environment, running RHEL 9. +Results from a c3-standard-4 instance (4 vCPUs, 16 GB RAM) on RHEL 9: | Operation | Count | Avg Latency (us) | Min Latency (us) | Max Latency (us) | 50th Percentile (us) | 95th Percentile (us) | 99th Percentile (us) | |-----------|-------|------------------|------------------|------------------|-----------------------|----------------------|-----------------------| @@ -108,8 +85,8 @@ The following benchmark results are collected on a c3-standard-4 (4 vCPU, 2 cor | UPDATE | 528 | 621.27 | 214 | 12855 | 554 | 971 | 1224 | | CLEANUP | 1 | 4702 | 4700 | 4703 | 4703 | 4703 | 4703 | -### Benchmark summary on Arm64: -The following benchmark results are collected on a c4a-standard-4 (4 vCPU, 16 GB Memory) Arm64 environment, running RHEL 9. +## Benchmark summary on Arm64 (Google Axion C4A): +Results from a c4a-standard-4 instance (4 vCPUs, 16 GB RAM) on RHEL 9: | Operation | Count | Avg Latency (us) | Min Latency (us) | Max Latency (us) | 50th Percentile (us) | 95th Percentile (us) | 99th Percentile (us) | |----------|------------------|------------------|------------------|------------------|----------------------|----------------------|----------------------| @@ -117,8 +94,12 @@ The following benchmark results are collected on a c4a-standard-4 (4 vCPU, 16 G | UPDATE | 466 | 384.45 | 186 | 26543 | 296 | 498 | 821 | | CLEANUP | 1 | 4138 | 4136 | 4139 | 4139 | 4139 | 4139 | -### **Highlights from GCP C4A Arm virtual machine** +## Highlights from the C4A Arm VM + +- Lower average latencies on Arm: ~313 µs (READ) and ~384 µs (UPDATE). 
+ +- Stable p50–p99 latencies indicate consistent performance. + +- Occasional max-latency outliers suggest transient spikes common in mixed workloads. -- Arm results show low **average latencies**, **READ** at **313 us** and **UPDATE** at **384 us**. -- **50th** to **99th percentile** latencies remain stable, indicating consistent performance. -- **Max latency** spikes (**8279 us READ**, **26543 us UPDAT**E) suggest rare outliers. +With YCSB built and results captured, you now have a baseline for MongoDB performance on Arm-based Google Axion C4A. You can iterate on dataset size, thread counts, and workloads (A–F) to profile additional scenarios and compare cost-performance across architectures. \ No newline at end of file diff --git a/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/create-instance.md b/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/create-instance.md index 3ca28709da..33af64d5a6 100644 --- a/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/create-instance.md +++ b/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/create-instance.md @@ -1,29 +1,34 @@ --- -title: Create Google Axion C4A Arm virtual machine +title: Create Google Axion instance weight: 3 ### FIXED, DO NOT MODIFY layout: learningpathall --- -## Introduction +## Overview -This guide walks you through provisioning **Google Axion C4A Arm virtual machine** on GCP with the **c4a-standard-4 (4 vCPUs, 16 GB Memory)** machine type, using the **Google Cloud Console**. +This section walks you through creating a Google Axion C4A Arm virtual machine on GCP with the `c4a-standard-4` (4 vCPUs, 16 GB Memory) machine type, using the **Google Cloud Console**. -If you are new to Google Cloud, it is recommended to follow the [GCP Quickstart Guide to Create a virtual machine](https://cloud.google.com/compute/docs/instances/create-start-instance). 
+If you haven't set up a Google Cloud account, see the Learning Path [Getting started with Google Cloud Platform](https://learn.arm.com/learning-paths/servers-and-cloud-computing/csp/google/). -For more details, kindly follow the Learning Path on [Getting Started with Google Cloud Platform](https://learn.arm.com/learning-paths/servers-and-cloud-computing/csp/google/). +## Create an Arm-based virtual machine (C4A) -### Create an Arm-based Virtual Machine (C4A) +To create a VM based on the C4A Arm architecture: -To create a virtual machine based on the C4A Arm architecture: -1. Navigate to the [Google Cloud Console](https://console.cloud.google.com/). -2. Go to **Compute Engine > VM Instances** and click on **Create Instance**. -3. Under the **Machine Configuration**: - - Fill in basic details like **Instance Name**, **Region**, and **Zone**. - - Choose the **Series** as `C4A`. - - Select a machine type such as `c4a-standard-4`. -![Instance Screenshot](./image1.png) -4. Under the **OS and Storage**, click on **Change**, and select Arm64 based OS Image of your choice. For this Learning Path, we pick **Red Hat Enterprise Linux** as the Operating System with **Red Hat Enterprise Linux 9** as the Version. Make sure you pick the version of image for Arm. -5. Under **Networking**, enable **Allow HTTP traffic** to allow HTTP communications. -6. Click on **Create**, and the instance will launch. +1. Open the [Google Cloud Console](https://console.cloud.google.com/). +2. Go to **Compute Engine** and select **Create instance**. +3. In **Machine configuration**: + - Enter the **Instance name**, **Region**, and **Zone**. + - Set **Series** to `C4A`. + - Choose a machine type such as `c4a-standard-4`. + ![Screenshot of GCP Create instance page showing C4A series and c4a-standard-4 selected alt-text#center](./select-instance.png "Selecting the C4A series and c4a-standard-4 machine type") +4. 
In **OS and storage**, select **Change**, choose **Red Hat Enterprise Linux** as the operating system, and **Red Hat Enterprise Linux 9** as the version. Make sure you select the **Arm** image. +5. In **Networking**, enable **Allow HTTP traffic** so you can test services later in this Learning Path. +6. Select **Create** to launch the instance. + +{{% notice Important %}} +Do not leave **Allow HTTP traffic** enabled permanently. For long-term use, restrict access to only the IP addresses you need. +{{% /notice %}} + +To open a shell on the VM, select **SSH** in the instance details page. Use this terminal for the commands in the next sections, where you will install and configure MongoDB on your Axion C4A instance. diff --git a/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/mongodb-deploy.md b/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/mongodb-deploy.md index 0725049259..d2799361f1 100644 --- a/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/mongodb-deploy.md +++ b/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/mongodb-deploy.md @@ -1,119 +1,108 @@ --- -title: Install MongoDB on Google Axion C4A virtual machine +title: Install MongoDB weight: 4 ### FIXED, DO NOT MODIFY layout: learningpathall --- +## Overview -## Install MongoDB and mongosh on Google Axion C4A virtual machine +This section shows you how to install MongoDB and the MongoDB Shell (`mongosh`) on an Arm-based Google Axion C4A instance running Red Hat Enterprise Linux. You will download the Arm64 binaries, update your environment, and verify that the database server runs correctly. -Install MongoDB and mongosh on GCP RHEL 9 Arm64 by downloading the binaries, setting up environment paths, configuring data and log directories, and starting the server for local access and verification. +## Install system dependencies -1. 
Install System Dependencies
+Install the required system packages:
 
-Install required system packages to support MongoDB:
 ```console
+sudo dnf update -y
 sudo dnf install -y libcurl openssl tar wget curl
-```
+```
-2. Download annd Extract MongoDB
-Fetch and unpack the MongoDB binaries for Arm64:
+## Download and extract MongoDB
+
+Fetch and unpack the MongoDB Arm64 (aarch64) binaries for RHEL 9.3:
+
 ```console
 wget https://fastdl.mongodb.org/linux/mongodb-linux-aarch64-rhel93-8.0.12.tgz
 tar -xzf mongodb-linux-aarch64-rhel93-8.0.12.tgz
 ls mongodb-linux-aarch64-rhel93-8.0.12/bin
 ```
-3. Add MongoDB to System PATH
+Add the binaries to your `PATH` so they are available in every shell session:
 
-Enable running mongod from any terminal session:
 ```console
 echo 'export PATH=~/mongodb-linux-aarch64-rhel93-8.0.12/bin:$PATH' >> ~/.bashrc
 source ~/.bashrc
 ```
-4. Create a data Directory
+## Start the MongoDB server
+
+Create a data directory for MongoDB files:
 
-Set up the database data directory:
 ```console
 mkdir -p ~/mongodb-data/db
 ```
+Start mongod in the foreground to verify that it launches and to view logs directly:
 
-5. Start MongoDB Server
+```console
+mongod --dbpath ~/mongodb-data/db
+```
+
+Starting the server in the foreground allows you to see real-time logs and is useful for debugging or verifying that MongoDB starts correctly. However, this will occupy your terminal and stop the server if you close the terminal or interrupt it.
+ +Stop the server (for example, with **Ctrl+C**), then confirm that data files were created: -Start MongoDB in the **foreground** (without --fork) to view real-time output and ensure it starts correctly: ```console -~/mongodb-linux-aarch64-rhel93-8.0.12/bin/mongod --dbpath ~/mongodb-data/db +ls ~/mongodb-data/db/ ``` -Once confirmed it's working, you can start MongoDB in the **background** with logging: + +Example output: + +```output +collection-0-7680310461694759627.wt index-3-7680310461694759627.wt mongod.lock WiredTiger.lock +``` + +Once you’ve confirmed it’s working, you can start MongoDB in the background using the `--fork` option and redirecting logs to a file. This allows MongoDB to run continuously without tying up your terminal session. + +Start mongod in the background with logging enabled so it continues to run after you close the terminal: + ```console -./mongodb-linux-aarch64-rhel93-8.0.12/bin/mongod --dbpath ~/mongodb-data/db --logpath ~/mongodb-data/mongod.log --fork +mongod --dbpath ~/mongodb-data/db --logpath ~/mongodb-data/mongod.log --fork ``` -{{% notice Note %}}Make sure the **~/mongodb-data/db** directory exists before starting.{{% /notice %}} -6. Install mongosh +## Install MongoDB Shell (mongosh) -**mongosh** is the MongoDB Shell used to interact with your MongoDB server. It provides a modern, user-friendly CLI for running queries and database operations. +`mongosh` is the MongoDB shell used to interact with your database. 
Download and install mongosh for Arm64: -Download and install MongoDB’s command-line shell for Arm: ```console wget https://github.com/mongodb-js/mongosh/releases/download/v2.5.6/mongodb-mongosh-2.5.6.aarch64.rpm sudo dnf install -y ./mongodb-mongosh-2.5.6.aarch64.rpm ``` -### Verify Mongodb and mongosh Installation -Check if MongoDb and mongosh is properly installed: +Verify the installation: ```console -mongod --version mongosh --version ``` -You should see an output similar to: -```output -db version v8.0.12 -Build Info: { - "version": "8.0.12", - "gitVersion": "b60fc6875b5fb4b63cc0dbbd8dda0d6d6277921a", - "openSSLVersion": "OpenSSL 3.2.2 4 Jun 2024", - "modules": [], - "allocator": "tcmalloc-google", - "environment": { - "distmod": "rhel93", - "distarch": "aarch64", - "target_arch": "aarch64" - } -} -$ mongosh --version -2.5.6 -``` -### Connect to MongoDB via mongosh +## Connect to MongoDB with mongosh + +Connect to the local server: -Start interacting with MongoDB through its shell interface: ```console mongosh mongodb://127.0.0.1:27017 ``` -You should see an output similar to: -```output -Current Mongosh Log ID: 6891ebb158db5b705d74e399 -Connecting to: mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.5.6 -Using MongoDB: 8.0.12 -Using Mongosh: 2.5.6 - -For mongosh info see: https://www.mongodb.com/docs/mongodb-shell/ - ------- - The server generated these startup warnings when booting - 2025-08-05T07:17:45.864+00:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted - 2025-08-05T07:17:45.864+00:00: Soft rlimits for open file descriptors too low - 2025-08-05T07:17:45.864+00:00: For customers running the current memory allocator, we suggest changing the contents of the following sysfsFile - 2025-08-05T07:17:45.864+00:00: We suggest setting the contents of sysfsFile to 0. 
- 2025-08-05T07:17:45.864+00:00: Your system has glibc support for rseq built in, which is not yet supported by tcmalloc-google and has critical performance implications. Please set the environment variable GLIBC_TUNABLES=glibc.pthread.rseq=0 ------- +Sample output: + +```output +Connecting to: mongodb://127.0.0.1:27017/?directConnection=true&... +Using MongoDB: 8.0.12 +Using Mongosh: 2.5.6 +... test> ``` -MongoDB installation is complete. You can now proceed with the baseline testing. +With MongoDB and `mongosh` successfully installed and running, you’re now ready to proceed with baseline testing. diff --git a/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/select-instance.png b/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/select-instance.png new file mode 100644 index 0000000000..2a65bdcde8 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/mongodb-on-gcp/select-instance.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/neoverse-rdv3-swstack/_index.md b/content/learning-paths/servers-and-cloud-computing/neoverse-rdv3-swstack/_index.md index 170db4c0df..473bd2f67e 100644 --- a/content/learning-paths/servers-and-cloud-computing/neoverse-rdv3-swstack/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/neoverse-rdv3-swstack/_index.md @@ -34,7 +34,7 @@ tools_software_languages: - C - Docker - FVP -peratingsystems: +operatingsystems: - Linux further_reading: diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/1_setup.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/1_setup.md index 907cb17e68..2a6ed40924 100644 --- a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/1_setup.md +++ b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/1_setup.md @@ -6,26 +6,24 @@ weight: 2 layout: learningpathall --- 
+## Overview -## Overview - -There are numerous client-server and network-based workloads, with Tomcat being a typical example of such applications. Tomcat provides services via HTTP/HTTPS network requests. - -In this section, you will set up a benchmark environment using `Apache Tomcat` and `wrk2` to simulate an HTTP load and evaluate performance on an Arm-based bare metal instance. This Learning Path was tested on an AWS `c8g.metal-48xl` instance. +Tomcat is a common client–server web workload that serves HTTP/HTTPS requests. In this section, you will set up a benchmarking environment using Apache Tomcat (server) and `wrk2` (client) to generate load and measure performance on an Arm-based bare‑metal instance. This Learning Path was validated on an AWS `c8g.metal‑48xl` instance running Ubuntu 24.04. ## Set up the Tomcat benchmark server -[Apache Tomcat](https://tomcat.apache.org/) is an open-source Java Servlet container that runs Java web applications, handles HTTP requests, and serves dynamic content. It supports technologies such as Servlet, JSP, and WebSocket. + +[Apache Tomcat](https://tomcat.apache.org/) is an open‑source Java Servlet container for running Java web applications, handling HTTP requests, and serving dynamic content. It supports Servlet, JSP, and WebSocket. ## Install the Java Development Kit (JDK) -Install OpenJDK 21 on your Arm-based Ubuntu 24.04 bare-metal instance: +Install OpenJDK 21 on your Arm‑based Ubuntu 24.04 bare‑metal instance: ```bash sudo apt update sudo apt install -y openjdk-21-jdk ``` -## Install Tomcat +## Install Tomcat Download and extract Tomcat: @@ -33,31 +31,39 @@ Download and extract Tomcat: wget -c https://dlcdn.apache.org/tomcat/tomcat-11/v11.0.10/bin/apache-tomcat-11.0.10.tar.gz tar xzf apache-tomcat-11.0.10.tar.gz ``` + Alternatively, you can build Tomcat [from source](https://github.com/apache/tomcat). 
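After extracting the tarball, you can sanity-check the layout before continuing; the path assumes the 11.0.10 download above:

```shell
# Report whether the key Tomcat files landed where later steps expect them.
TOMCAT_HOME="$HOME/apache-tomcat-11.0.10"
for f in bin/startup.sh bin/shutdown.sh conf/server.xml; do
  if [ -e "$TOMCAT_HOME/$f" ]; then echo "ok: $f"; else echo "missing: $f"; fi
done
```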
## Enable access to Tomcat examples -To access the built-in examples from your local network or external IP, use a text editor to modify the `context.xml` file by updating the `RemoteAddrValve` configuration to allow all IP addresses. +To access the built‑in examples from your local network or external IP, modify the `context.xml` file and update `RemoteAddrValve` to allow your clients. + +The file is located at: -The file is at: ```bash ~/apache-tomcat-11.0.10/webapps/examples/META-INF/context.xml ``` -Replace the existing allow value as shown: +Replace the existing value: + ```xml ``` With: + ```xml ``` -Save the changes to your file. + +{{% notice Warning %}} +Allowing `.*` permits access from all IP addresses and should be used only in isolated lab environments. Restrict this setting to trusted CIDR ranges for production or shared networks. +{{% /notice %}} ## Start the Tomcat server + {{% notice Note %}} -To achieve maximum performance of Tomcat, the maximum number of file descriptors that a single process can open simultaneously should be sufficiently large. +For maximum performance, ensure the per‑process limit for open file descriptors is sufficient. {{% /notice %}} Start the server: @@ -66,7 +72,7 @@ Start the server: ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh ``` -You should see output like: +You should see output similar to: ```output Using CATALINA_BASE: /home/ubuntu/apache-tomcat-11.0.10 @@ -80,24 +86,30 @@ Tomcat started. ## Confirm server access -In your browser, open: `http://${tomcat_ip}:8080/examples`. 
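For reference, the `RemoteAddrValve` entry involved in the `context.xml` edit above typically looks like the following. The `allow` values are illustrative (the first matches the usual Tomcat default, the second is the lab-only wildcard discussed in the warning), so verify them against your own file:

```xml
<!-- Default (illustrative): only loopback clients may reach the examples app -->
<Valve className="org.apache.catalina.valves.RemoteAddrValve"
       allow="127\.\d+\.\d+\.\d+|::1|0:0:0:0:0:0:0:1" />

<!-- Lab-only replacement: allow all client addresses -->
<Valve className="org.apache.catalina.valves.RemoteAddrValve"
       allow=".*" />
```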
+Replace `${tomcat_ip}` with the public or private IP address of your Arm server and open: + +``` +http://${tomcat_ip}:8080/examples +``` You should see the Tomcat welcome page and examples, as shown below: -![Screenshot of the Tomcat homepage showing version and welcome panel alt-text#center](./_images/lp-tomcat-homepage.png "Apache Tomcat homepage") +![Screenshot of the Apache Tomcat homepage showing version and welcome panel alt-text#center](./_images/lp-tomcat-homepage.png "Apache Tomcat homepage") ![Screenshot of the Tomcat examples page showing servlet and JSP demo links alt-text#center](./_images/lp-tomcat-examples.png "Apache Tomcat examples") -{{% notice Note %}}Make sure port 8080 is open in the security group of the IP address for your Arm-based Linux machine.{{% /notice%}} +{{% notice Note %}} +Ensure port **8080** is open in the security group or firewall for your Arm‑based Linux machine. +{{% /notice %}} ## Set up the benchmarking client using wrk2 [Wrk2](https://github.com/giltene/wrk2) is a high-performance HTTP benchmarking tool specialized in generating constant throughput loads and measuring latency percentiles for web services. `wrk2` is an enhanced version of `wrk` that provides accurate latency statistics under controlled request rates, ideal for performance testing of HTTP servers. {{% notice Note %}} -Currently `wrk2` is only supported on x86 machines. Run the benchmark client steps below on a bare metal `x86_64` server running Ubuntu 24.04 +Currently, `wrk2` is only supported on x86_64 machines. Run the client steps below on a bare‑metal x86_64 server running Ubuntu 24.04. 
{{% /notice %}} -## Install dependencies +## Install dependencies Install the required packages: @@ -111,27 +123,36 @@ sudo apt-get install -y build-essential libssl-dev git zlib1g-dev Clone the repository and compile the tool: ```bash -sudo git clone https://github.com/giltene/wrk2.git +git clone https://github.com/giltene/wrk2.git cd wrk2 -sudo make +make ``` -Move the binary to a directory in your system’s PATH: - +Move the binary to a directory in your system’s `PATH`: + ```bash sudo cp wrk /usr/local/bin ``` ## Run the benchmark + {{% notice Note %}} -To achieve maximum performance of wrk2, the maximum number of file descriptors that a single process can open simultaneously should be sufficiently large. +As with Tomcat, set a high open‑files limit to avoid hitting FD caps during the run. {{% /notice %}} -Use the following command to benchmark the HelloWorld servlet running on Tomcat: +Benchmark the `HelloWorld` servlet running on Tomcat: ```bash ulimit -n 65535 && wrk -c32 -t16 -R50000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample ``` + +**Flags explained:** + +- `-c32` — 32 open connections +- `-t16` — 16 threads +- `-R50000` — constant 50,000 requests/second +- `-d60` — run for 60 seconds + You should see output similar to: ```console diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/2_baseline.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/2_baseline.md index b0701374ad..bea750eefd 100644 --- a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/2_baseline.md +++ b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/2_baseline.md @@ -6,41 +6,50 @@ weight: 3 layout: learningpathall --- -In this section you will start by setting up a baseline configuration before using advanced techniques to tune the performance of network workloads. 
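To see how the `wrk2` flags interact, note that the tool divides the connection count across its threads and the target rate across connections. Illustrative shell arithmetic (not `wrk2` output):

```shell
# How wrk2 divides the load implied by -c32 -t16 -R50000 (integer math).
CONNS=32; THREADS=16; RATE=50000
echo "connections per thread: $((CONNS / THREADS))"
echo "target requests/sec per thread: $((RATE / THREADS))"
echo "target requests/sec per connection: $((RATE / CONNS))"
```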
+## Overview + +In this section, you establish a baseline configuration before applying advanced techniques to tune the performance of Tomcat-based network workloads on an Arm Neoverse bare-metal instance. {{% notice Note %}} -To achieve maximum performance, ulimit -n 65535 must be executed on both server and client! +To avoid running out of file descriptors under load, raise the file‑descriptor limit on *both* the server and the client: +```bash +ulimit -n 65535 +``` {{% /notice %}} -## Optimal baseline configuration before tuning -This includes: -- Aligning the IOMMU settings with default Ubuntu settings -- Setting up a default configuration +## Configure an optimal baseline before tuning + +This baseline includes: + +- Aligning IOMMU settings with Ubuntu defaults +- Setting a default CPU configuration - Disabling access logging -- Setting up an optimal thread count +- Setting optimal thread counts -### Align the IOMMU settings with default Ubuntu settings +## Align IOMMU settings with Ubuntu defaults {{% notice Note %}} -As you are using a custom Ubuntu distribution on AWS, you will first need to align the IOMMU settings with default Ubuntu settings for IOMMU: iommu.strict=1 and iommu.passthrough=0. +If you are using a cloud image (for example, AWS) with non-default kernel parameters, align IOMMU settings with the Ubuntu defaults: `iommu.strict=1` and `iommu.passthrough=0`. {{% /notice %}} -1. To change the IOMMU settings, use a text editor to modify the `grub` file by adding or updating the `GRUB_CMDLINE_LINUX` configuration: +Edit GRUB and add (or update) `GRUB_CMDLINE_LINUX`: ```bash -sudo vi /etc/default/grub + sudo vi /etc/default/grub ``` -Depending on what's in your `grub` file you will need to add or update `GRUB_CMDLINE_LINUX`: + +Add or update the line to include: ```bash -GRUB_CMDLINE_LINUX="iommu.strict=1 iommu.passthrough=0" + GRUB_CMDLINE_LINUX="iommu.strict=1 iommu.passthrough=0" ``` -2. 
Update GRUB and reboot your system to apply the default settings. +Update GRUB and reboot to apply the settings: + ```bash -sudo update-grub && sudo reboot + sudo update-grub && sudo reboot ``` -3. Verify that the default settings have been successfully applied: +Verify that the default settings have been successfully applied: ```bash sudo dmesg | grep iommu ``` @@ -51,111 +60,120 @@ You should see that under the default configuration, `iommu.strict` is enabled, ... ``` -### Establish a baseline configuration on Arm Neoverse bare-metal instances +## Establish a baseline on Arm Neoverse bare-metal instances {{% notice Note %}} -To align with typical deployment scenario of Tomcat, you will need to reserve 8 cores on your instance to be online and set all other cores to be offline +To mirror a typical Tomcat deployment and simplify tuning, keep 8 CPU cores online and set the remaining cores offline. Adjust the CPU range to match your instance. The example below assumes 192 CPUs (as on AWS `c8g.metal-48xl`). {{% /notice %}} -1. To set the remaining CPU cores on your instance to be offline, run: +Set CPUs 8–191 offline: + ```bash -for no in {8..191}; do sudo bash -c "echo 0 > /sys/devices/system/cpu/cpu${no}/online"; done + for no in {8..191}; do sudo bash -c "echo 0 > /sys/devices/system/cpu/cpu${no}/online"; done ``` -2. Use the following commands to verify that cores 0-7 are online and the remaining cores are offline. + +Confirm that CPUs `0–7` are online and the rest are offline: + ```bash -lscpu + lscpu ``` -The output should look like: + +Example output: ```output -Architecture: aarch64 - CPU op-mode(s): 64-bit - Byte Order: Little Endian -CPU(s): 192 - On-line CPU(s) list: 0-7 - Off-line CPU(s) list: 8-191 -Vendor ID: ARM - Model name: Neoverse-V2 -... + Architecture: aarch64 + CPU op-mode(s): 64-bit + Byte Order: Little Endian + CPU(s): 192 + On-line CPU(s) list: 0-7 + Off-line CPU(s) list: 8-191 + Vendor ID: ARM + Model name: Neoverse-V2 + ... ``` -3. 
Now shutdown and restart `Tomcat` on your Arm Neoverse bare-metal instance as shown:
+Restart Tomcat on the Arm instance:
+
+```bash
+~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
+ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
+```
-4. On your `x86_64` bare-metal instance, run `wrk2` as shown:
+From your `x86_64` benchmarking client, run `wrk2` (replace `${tomcat_ip}` with the server’s IP):

```bash
-ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
+ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
```
-Replace `{tomcat_ip}` in the command above with the IP address of your Arm-based instance where the Tomcat server is running.
-The result with this baseline configuration should look like:
+Example result:

```output
- Thread Stats Avg Stdev Max +/- Stdev
- Latency 16.76s 6.59s 27.56s 56.98%
- Req/Sec 1.97k 165.05 2.33k 89.90%
- 14680146 requests in 1.00m, 7.62GB read
- Socket errors: connect 1264, read 0, write 0, timeout 1748
-Requests/sec: 244449.62
-Transfer/sec: 129.90MB
+ Thread Stats Avg Stdev Max +/- Stdev
+ Latency 16.76s 6.59s 27.56s 56.98%
+ Req/Sec 1.97k 165.05 2.33k 89.90%
+ 14680146 requests in 1.00m, 7.62GB read
+ Socket errors: connect 1264, read 0, write 0, timeout 1748
+Requests/sec: 244449.62
+Transfer/sec: 129.90MB
```
-### Disable Access logging
-To disable access logging, use a text editor to modify the `server.xml` file by commenting out or removing the **`org.apache.catalina.valves.AccessLogValve`** configuration.
+## Disable access logging
+
+Disabling access logs removes I/O overhead during benchmarking.
+ +Edit `server.xml` and comment out (or remove) the **`org.apache.catalina.valves.AccessLogValve`** block: -The file is at: ```bash -vi ~/apache-tomcat-11.0.10/conf/server.xml + vi ~/apache-tomcat-11.0.10/conf/server.xml ``` -Either comment out or delete the configuration shown at the end of the file: ```xml - + ``` -1. Shutdown and restart `Tomcat` on your Arm Neoverse bare-metal instance as shown: +Restart Tomcat: + ```bash -~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null -ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh + ~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null + ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh ``` -2. Run `wrk2` on the `x86_64` bare-metal instance: +Re-run `wrk2`: + ```bash -ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample + ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://:8080/examples/servlets/servlet/HelloWorldExample ``` -The result with access logging disabled should look like: -```bash - Thread Stats Avg Stdev Max +/- Stdev - Latency 16.16s 6.45s 28.26s 57.85% - Req/Sec 2.16k 5.91 2.17k 77.50% - 16291136 requests in 1.00m, 8.45GB read - Socket errors: connect 0, read 0, write 0, timeout 75 -Requests/sec: 271675.12 -Transfer/sec: 144.36MB +Example result: +```output + Thread Stats Avg Stdev Max +/- Stdev + Latency 16.16s 6.45s 28.26s 57.85% + Req/Sec 2.16k 5.91 2.17k 77.50% + 16291136 requests in 1.00m, 8.45GB read + Socket errors: connect 0, read 0, write 0, timeout 75 + Requests/sec: 271675.12 + Transfer/sec: 144.36MB ``` -### Set up optimal thread counts -To minimize resource contention between threads and overhead from thread context switching, the number of CPU-intensive threads in Tomcat should be aligned with the number of CPU cores. +## Set optimal thread counts + +To minimize contention and context switching, align Tomcat’s CPU‑intensive thread count with available CPU cores. 
+ +While `wrk2` is running, identify CPU‑intensive Tomcat threads: -1. When using `wrk` to perform pressure testing on `Tomcat`, use `top` to identify the CPU-intensive threads : ```bash -top -H -p$(pgrep java) + top -H -p "$(pgrep -n java)" ``` -The output from `top` will look like: -```bash -top - 08:57:29 up 20 min, 1 user, load average: 4.17, 2.35, 1.22 -Threads: 231 total, 8 running, 223 sleeping, 0 stopped, 0 zombie -%Cpu(s): 31.7 us, 20.2 sy, 0.0 ni, 31.0 id, 0.0 wa, 0.0 hi, 17.2 si, 0.0 st -MiB Mem : 386127.8 total, 380676.0 free, 4040.7 used, 2801.1 buff/cache -MiB Swap: 0.0 total, 0.0 free, 0.0 used. 382087.0 avail Mem +Example output: +```output + top - 08:57:29 up 20 min, 1 user, load average: 4.17, 2.35, 1.22 + Threads: 231 total, 8 running, 223 sleeping, 0 stopped, 0 zombie + %Cpu(s): 31.7 us, 20.2 sy, 0.0 ni, 31.0 id, 0.0 wa, 0.0 hi, 17.2 si, 0.0 st + MiB Mem : 386127.8 total, 380676.0 free, 4040.7 used, 2801.1 buff/cache + MiB Swap: 0.0 total, 0.0 free, 0.0 used. 382087.0 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4677 ubuntu 20 0 36.0g 1.4g 24452 R 89.0 0.4 1:18.71 http-nio-8080-P @@ -186,59 +204,52 @@ MiB Swap: 0.0 total, 0.0 free, 0.0 used. 382087.0 avail Mem ... ``` -You can observe from the output that **`http-nio-8080-e`** and **`http-nio-8080-P`** threads are CPU-intensive. -As the __`http-nio-8080-P`__ thread is fixed at 1 in current version of Tomcat, and the current number of CPU cores is 8, the `http-nio-8080-e` thread count should be configured to 7. +You’ll typically see `http-nio-8080-e` and `http-nio-8080-P` threads as CPU-intensive. Because the `http-nio-8080-P` thread count is fixed at 1 (in current Tomcat releases), and you have 8 online CPU cores, set `http-nio-8080-e` to 7. -To configure the `http-nio-8080-e` thread count, use a text editor to modify the `server.xml` file and update the ` - + + ``` -With the updated setting: - +With the tuned settings: ```xml - - + + ``` -Save the changes to `server.xml`. 
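The sizing rule above is simple: one poller thread is fixed, so the exec pool gets the remaining online cores. A sketch assuming the 8-core configuration from this section (in `server.xml`, this count typically maps to a `maxThreads`-style Connector setting; verify against your own file):

```shell
# Derive the http-nio-8080-e (exec) thread count from online cores,
# reserving one core for the single http-nio-8080-P poller thread.
ONLINE_CORES=8
POLLER_THREADS=1
EXEC_THREADS=$((ONLINE_CORES - POLLER_THREADS))
echo "suggested exec thread count: $EXEC_THREADS"
```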
+Restart Tomcat and re-run `wrk2`:

```bash
-~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
-ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
-```
+~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
+ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh

-3. Run `wrk2` on the`x86_64` bare-metal instance:
-```bash
-ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
+ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
```
-The result with the optimal thread count settings should look like:
+Example result:

```output
- Thread Stats Avg Stdev Max +/- Stdev
- Latency 10.26s 4.55s 19.81s 62.51%
- Req/Sec 2.86k 89.49 3.51k 77.06%
- 21458421 requests in 1.00m, 11.13GB read
-Requests/sec: 357835.75
-Transfer/sec: 190.08MB
+ Thread Stats Avg Stdev Max +/- Stdev
+ Latency 10.26s 4.55s 19.81s 62.51%
+ Req/Sec 2.86k 89.49 3.51k 77.06%
+ 21458421 requests in 1.00m, 11.13GB read
+Requests/sec: 357835.75
+Transfer/sec: 190.08MB
```
-Now that you have established some baseline settings you can proceed to further tuning your setup.
+With a solid baseline in place, you’re ready to proceed to NIC queue tuning, NUMA locality optimization, and IOMMU exploration in the next sections.
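It is also worth comparing achieved throughput with the offered load: at `-R500000` for 60 seconds, `wrk2` requests 30 million operations, so the baseline run saturates well below the target rate. A sketch using the numbers from the run above:

```shell
# Compare requests achieved in the baseline run with the requested load.
TARGET_RATE=500000; DURATION=60
ACHIEVED=21458421                     # total requests reported by wrk2 above
OFFERED=$((TARGET_RATE * DURATION))   # requests wrk2 tried to issue
echo "achieved ${ACHIEVED}/${OFFERED} requests ($((ACHIEVED * 100 / OFFERED))% of target)"
```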
diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/3_nic-queue.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/3_nic-queue.md index 4aa434ddbd..d552fd065a 100644 --- a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/3_nic-queue.md +++ b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/3_nic-queue.md @@ -1,23 +1,22 @@ --- -title: Tuning via NIC queue count +title: Tune performance with NIC queue counts weight: 4 ### FIXED, DO NOT MODIFY layout: learningpathall --- -## Tuning via NIC queue count -To further optmize your settings, you can set the NIC queue count and observe the performance uplift: +## Overview -Typically, the number of transmit/receive queues for network cards in bare-metal environments is relatively large, reaching 63 on Arm Neoverse. Each transmit/receive queue corresponds to one interrupt number. Before CPU cores are taken offline, there are sufficient cores to handle these interrupt numbers. However, when only 8 cores are retained, it results in a single core having to handle multiple interrupt numbers, thereby triggering more context switches. +After establishing a baseline, you can further improve throughput and latency by tuning your network interface controller (NIC) queue count. Many bare‑metal NICs expose a relatively large number of transmit/receive queues (multi‑queue). Each queue maps to an interrupt; when you limit the server to a small number of online CPUs (for example, 8 cores), too many interrupts can land on the same core and increase context switching. Reducing the number of queues to match available CPUs can stabilize performance on Arm Neoverse systems. -### Setting NIC queue count +## Set the NIC queue count -1. Use the following command to find the NIC name corresponding to he IP address. 
+Use the following command to find the NIC name corresponding to the IP address: ```bash ip addr ``` -From the output you can see that the NIC name `enp1s0f0np0` corresponds to the IP address `10.169.226.181`. +From the output you can see that the NIC name `enp1s0f0np0` corresponds to the IP address `10.169.226.181`: ```output 1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 @@ -33,16 +32,16 @@ From the output you can see that the NIC name `enp1s0f0np0` corresponds to the I valid_lft forever preferred_lft forever ``` -2. Set the network interface name variable +Set the network interface name variable: ```bash net=enp1s0f0np0 ``` -3. Use the following command to check the current transmit/receive queues of the ${net} network interface +Use the following command to check the current transmit/receive queues of the ${net} network interface: ```bash sudo ethtool -l ${net} ``` -It can be observed that the number of transmit/receive queues for the ${net} network interface is currently 63. +You can see that the number of transmit/receive queues for the ${net} network interface is currently 63: ```bash Channel parameters for enP11p4s0: Pre-set maximums: @@ -57,48 +56,56 @@ Other: n/a Combined: 32 ``` -4. Use the following command to reset the number of transmit/receive queues for the ${net} to match the number of CPUs, which is 8. -```bash -sudo ethtool -L ${net} combined 8 -``` -5. Verify whether the settings have been successfully applied. -```bash -sudo ethtool -l ${net} -``` -You should see that the number of combined Rx/Tx queues has been updated to 8. 
-```output
-Channel parameters for enP11p4s0:
-Pre-set maximums:
-RX: n/a
-TX: n/a
-Other: n/a
-Combined: 32
-Current hardware settings:
-RX: n/a
-TX: n/a
-Other: n/a
-Combined: 8
-```
+Set the number of combined queues to match your online CPUs (example: 8):
+
+```bash
+sudo ethtool -L ${net} combined 8
+```
-### The performance uplift after tuning NIC queue count
+Verify the change:
+
+```bash
+sudo ethtool -l ${net}
+```
-1. Shutdown and restart `Tomcat` on your Arm Neoverse bare-metal instance as shown:
-```bash
-~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
-ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
-```
+You should see the `Combined` value updated to `8` under **Current hardware settings**:
+
+```output
+Channel parameters for enP11p4s0:
+Pre-set maximums:
+RX: n/a
+TX: n/a
+Other: n/a
+Combined: 32
+Current hardware settings:
+RX: n/a
+TX: n/a
+Other: n/a
+Combined: 8
+```
-2. Run `wrk2` on your `x86_64` bare-metal instance:
-```bash
-ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
-```
+{{% notice Note %}}
+Queue settings applied with `ethtool -L` are not persistent across reboots. If needed, make the change persistent using a systemd unit or your distro’s NIC configuration mechanism.
+{{% /notice %}} -Notice the performance uplift after tuning the NIC queue count: -```output - Thread Stats Avg Stdev Max +/- Stdev - Latency 8.35s 4.14s 16.33s 61.16% - Req/Sec 2.96k 73.02 3.24k 89.16% - 22712999 requests in 1.00m, 11.78GB read -Requests/sec: 378782.37 -Transfer/sec: 201.21MB -``` +## Measure performance after tuning + +Restart Tomcat on your Arm Neoverse bare‑metal instance: + ```bash + ~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null + ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh + ``` + +Run `wrk2` from your `x86_64` bare‑metal client (replace `${tomcat_ip}` with your server IP): + ```bash + ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample + ``` + + Example results after tuning NIC queues: + ```output + Thread Stats Avg Stdev Max +/- Stdev + Latency 8.35s 4.14s 16.33s 61.16% + Req/Sec 2.96k 73.02 3.24k 89.16% + 22712999 requests in 1.00m, 11.78GB read + Requests/sec: 378782.37 + Transfer/sec: 201.21MB + ``` + +If performance improves and latency stabilizes, keep the adjusted queue count for subsequent tuning steps (RSS/RPS/XPS, IRQ affinity, and NUMA locality). diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/4_local-numa.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/4_local-numa.md index de7a370226..ce2fe4b622 100644 --- a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/4_local-numa.md +++ b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/4_local-numa.md @@ -1,24 +1,25 @@ --- -title: NUMA-based Tuning +title: NUMA-based tuning weight: 5 ### FIXED, DO NOT MODIFY layout: learningpathall --- -## Tuning via local (Non-Uniform Memory Access) NUMA -In this section you will learn how to configure local NUMA and assess performance uplift acheieved through tuning. 
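The queue count used above can be derived rather than hard-coded: take the smaller of the NIC's pre-set maximum (from `ethtool -l`) and the number of online CPUs. A sketch with the values from this section:

```shell
# Choose combined queues = min(hardware maximum, online CPU count).
HW_MAX=32    # "Combined" under "Pre-set maximums" in the ethtool -l output
ONLINE=8     # CPUs left online earlier in this Learning Path
if [ "$ONLINE" -lt "$HW_MAX" ]; then QUEUES="$ONLINE"; else QUEUES="$HW_MAX"; fi
echo "combined queues: $QUEUES"
# sudo ethtool -L "$net" combined "$QUEUES"
```

The `ethtool -L` line is commented so the selection logic can run anywhere; uncomment it on the server with the `net` variable set.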
+## Tune with local NUMA (Non-Uniform Memory Access) -Typically, cross-NUMA data transfers within the CPU incur higher latency than intra-NUMA transfers. Therefore, Tomcat should be deployed on the NUMA node where the network interface resides. +In this section, you configure local NUMA and assess the performance uplift achieved through tuning. Cross‑NUMA data transfers generally incur higher latency than intra‑NUMA transfers, so Tomcat should be deployed on the NUMA node where the network interface resides to reduce cross‑node memory traffic and improve throughput and latency. -### Setting local NUMA +## Configure local NUMA + +Check NUMA topology and relative latencies: -1. Use the following command to check that the latency for cross-NUMA transfers is greater than the latency for intra-NUMA transfers: ```bash numactl -H ``` -It can be observed that the cross-NUMA latency to intra-NUMA latency ratio is 10:100. +You should see that cross‑NUMA latency is higher than intra‑NUMA latency (for example, 10:100): + ```output available: 2 nodes (0-1) node 0 cpus: 0 1 2 3 4 5 6 7 @@ -33,28 +34,34 @@ node 0 1 1: 100 10 ``` -2. Use the following command to check the NUMA node where the ${net} network interface resides. +Identify the NUMA node for your network interface (using the `${net}` variable defined earlier): + ```bash cat /sys/class/net/${net}/device/numa_node ``` -You will notice that the NUMA node where the ${net} network interface resides is 1. + +Example output (NIC on NUMA node 1): + ```output 1 ``` -3. As a result, allocate the reserved 8 cores to NUMA node 1. +Allocate the eight reserved cores to the NIC’s NUMA node (the example below keeps CPUs 96–103 online on node 1 and offlines the rest): + ```bash for no in {96..103}; do sudo bash -c "echo 1 > /sys/devices/system/cpu/cpu${no}/online"; done for no in {0..95} {104..191}; do sudo bash -c "echo 0 > /sys/devices/system/cpu/cpu${no}/online"; done ``` -4. 
Verify whether the settings have been successfully applied. +Verify CPU online status and NUMA placement: + ```bash lscpu ``` -From the output you will see that the only online CPUs are 72-79 on NUMA node 1. -```bash +You should see only CPUs 96–103 online on NUMA node 1: + +```output Architecture: aarch64 CPU op-mode(s): 64-bit Byte Order: Little Endian @@ -71,20 +78,23 @@ NUMA: ... ``` -### The result after tuning local NUMA +## Validate performance after NUMA tuning + +Restart Tomcat on the Arm Neoverse bare‑metal instance: -1. Shutdown and start Tomcat on the Arm Neoverse bare-metal instance: ```bash ~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh ``` -2. Run `wrk2` on the `x86_64` bare-metal instance: +Run `wrk2` on the `x86_64` bare‑metal client to measure throughput and latency: + ```bash ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample ``` -The result after configuring NUMA node should look like: +Sample results after placing Tomcat on the NIC’s local NUMA node: + ```output Thread Stats Avg Stdev Max +/- Stdev Latency 9.41s 4.71s 18.02s 61.07% diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/5_iommu.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/5_iommu.md index af6b695888..4eab3abdb3 100644 --- a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/5_iommu.md +++ b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/5_iommu.md @@ -6,28 +6,30 @@ weight: 6 layout: learningpathall --- -## IOMMU-based tuning -IOMMU (Input-Output Memory Management Unit) is a hardware feature that manages how I/O devices access memory. -In cloud environments, SmartNICs are typically used to offload the IOMMU workload. 
On bare-metal systems, to align performance with the cloud, you should disable `iommu.strict` and enable `iommu.passthrough` settings to achieve better performance. +## Tune with IOMMU -### Setting IOMMU +IOMMU (Input–Output Memory Management Unit) controls how I/O devices access memory. In many cloud environments, SmartNICs offload IOMMU-related work. On Arm Neoverse bare‑metal systems, you can often improve Tomcat networking performance by disabling strict mode and enabling passthrough (setting `iommu.strict=0` and `iommu.passthrough=1`). -1. To configure the IOMMU setting, use a text editor to modify the `grub` file by adding or updating the `GRUB_CMDLINE_LINUX` configuration. +## Configure IOMMU settings + +Edit the GRUB configuration to set IOMMU to passthrough and disable strict invalidations: ```bash sudo vi /etc/default/grub ``` -then add or update: +Add or update the kernel command line: ```bash GRUB_CMDLINE_LINUX="iommu.strict=0 iommu.passthrough=1" ``` -2. Update GRUB and reboot to apply the settings. +Update GRUB and reboot to apply the settings: + ```bash sudo update-grub && sudo reboot ``` -3. Verify if the settings have been successfully applied: +Verify that IOMMU is in passthrough mode after reboot: + ```bash sudo dmesg | grep iommu ``` @@ -38,24 +40,30 @@ You will notice that the IOMMU is already in passthrough mode: [ 0.855658] iommu: Default domain type: Passthrough (set via kernel command line) ``` -### The result after configuring IOMMU +## Validate performance after IOMMU tuning + +Prepare the Arm Neoverse bare‑metal server (ensure your `${net}` interface variable is set; if not, set it to your NIC name, for example `net=enP11p4s0`), align queues, and restart Tomcat: -1. 
Run the following command on the Arm Neoverse bare-metal where `Tomcat` is on: ```bash for no in {96..103}; do sudo bash -c "echo 1 > /sys/devices/system/cpu/cpu${no}/online"; done for no in {0..95} {104..191}; do sudo bash -c "echo 0 > /sys/devices/system/cpu/cpu${no}/online"; done -net=$(ls /sys/class/net/ | grep 'en') + +# Ensure NIC queue count matches the number of online CPUs (example: 8) sudo ethtool -L ${net} combined 8 + +# Restart Tomcat with a higher file‑descriptor limit ~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh ``` -2. Run run `wrk2` on the `x86_64` bare-metal instance as shown: +Run `wrk2` on the `x86_64` benchmarking client to measure throughput and latency: + ```bash ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample ``` -The result after iommu tuning should look like: +Sample results after IOMMU tuning: + ```output Thread Stats Avg Stdev Max +/- Stdev Latency 4.92s 2.49s 10.08s 62.27% diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/6_summary.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/6_summary.md index 0a5decc13c..8d742be719 100644 --- a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/6_summary.md +++ b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/6_summary.md @@ -6,15 +6,26 @@ weight: 7 layout: learningpathall --- -## Summary -You will observe that each tuning method can bring significant performance improvements while running Tomcat as shown in the results summary below: +## Review the results: Tomcat performance tuning on Arm Neoverse -| Method | Requests/sec | Latency-Avg | -|:----------------|:-------------|:------------| -| Baseline | 357835.75 | 10.26s | -| NIC-Queue | 378782.37 | 8.35s | -| NUMA-Local | 363744.39 | 9.41s | 
-| IOMMU | 428628.50 | 4.92s | +Each tuning technique delivered measurable gains for the Tomcat HTTP benchmark on an Arm Neoverse bare‑metal server (workload generated with **wrk2**). The table summarizes requests per second and average latency at each stage. +| Method | Requests/sec | Avg latency (s) | +|:-------------|-------------:|----------------:| +| Baseline | 357,835.75 | 10.26 | +| NIC queues | 378,782.37 | 8.35 | +| NUMA-local | 363,744.39 | 9.41 | +| IOMMU | 428,628.50 | 4.92 | -The same tuning methods can be applied as general guidance to help optimize and tune other network-based workloads. +### Key takeaways + +- **IOMMU passthrough** produced the largest throughput gain: **+19.8%** vs. baseline, with a **52.0%** drop in average latency. +- **NIC queue count alignment** improved throughput by **+5.9%** and reduced average latency by **18.6%**. +- **NUMA locality** yielded a smaller but consistent benefit: **+1.7%** throughput and **8.3%** lower average latency. +- Together, these techniques (IOMMU tuning, NIC queue optimization, and NUMA-aware placement) form a practical checklist for improving network workload performance on Arm Neoverse. + +### Next steps + +- Apply the same tuning pattern to other HTTP services and microservices (for example, NGINX, Envoy, or custom Jetty/Tomcat apps). +- Re‑evaluate queue counts, CPU pinning, and IOMMU mode as you scale cores, update kernels, or change NIC drivers/firmware. +- Track end‑to‑end SLOs (p95/p99 latency and error rates) in addition to average metrics to ensure sustained gains under real traffic. 
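The percentage gains quoted in the key takeaways can be re-derived directly from the raw numbers in the results table. A minimal sanity-check sketch (all values copied from the table above):

```python
# Recompute the tuning gains in the summary from the raw wrk2 numbers.
baseline_rps, baseline_lat = 357835.75, 10.26  # Baseline row

results = {
    "NIC queues": (378782.37, 8.35),
    "NUMA-local": (363744.39, 9.41),
    "IOMMU": (428628.50, 4.92),
}

for method, (rps, lat) in results.items():
    rps_gain = (rps - baseline_rps) / baseline_rps * 100   # throughput gain vs. baseline
    lat_drop = (baseline_lat - lat) / baseline_lat * 100   # avg-latency reduction vs. baseline
    print(f"{method}: +{rps_gain:.1f}% throughput, -{lat_drop:.1f}% avg latency")
```

Running this reproduces the figures in the takeaways: IOMMU passthrough gives +19.8% throughput with a 52.0% latency drop, NIC queue alignment +5.9% / 18.6%, and NUMA locality +1.7% / 8.3%.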
diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/_index.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/_index.md index e50b10a75b..af33b1c966 100644 --- a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/_index.md @@ -1,20 +1,20 @@ --- -title: Tune network workloads on Arm-based bare metal instances - +title: Tune network workloads on Arm-based bare-metal instances + minutes_to_complete: 60 who_is_this_for: This is an advanced topic for engineers who want to tune the performance of network workloads on Arm Neoverse-based bare-metal instances. learning_objectives: - - Set up a benchmarking environment using Tomcat and wrk2 - - Set up a baseline performance configuration before tuning - - Tune network workloads performance using NIC queues - - Tune network workloads performance with local NUMA - - Tune network workloads performance with IOMMU + - Set up Apache Tomcat and wrk2 to benchmark HTTP on an Arm Neoverse bare‑metal host + - Establish a reproducible baseline (file‑descriptor limits, logging, thread counts, fixed core set) + - Tune NIC queue count to match available cores and measure impact + - Improve NUMA locality by placing Tomcat on the NIC’s NUMA node and aligning worker threads with cores + - Compare IOMMU strict mode and IOMMU passthrough mode, and select the configuration that delivers the best performance for your workload prerequisites: - - An Arm Neoverse-based bare-metal server running Ubuntu 24.04 to run Tomcat.
This Learning Path was tested with an AWS c8g.metal-48xl instance - - Access to a x86_64 bare-metal server running Ubuntu 24.04 to run wrk2 + - An Arm Neoverse-based bare-metal server running Ubuntu 24.04 to run Apache Tomcat + - Access to an x86_64 bare-metal server running Ubuntu 24.04 to run `wrk2` - Basic familiarity with Java applications author: Ying Yu, Ker Liu, Rui Chang @@ -27,12 +27,10 @@ armips: tools_software_languages: - Tomcat - wrk2 - - OpenJDK-21 + - OpenJDK 21 operatingsystems: - Linux - - further_reading: - resource: title: OpenJDK Wiki @@ -44,7 +42,6 @@ further_reading: link: https://tomcat.apache.org/tomcat-11.0-doc/index.html type: documentation - ### FIXED, DO NOT MODIFY # ================================================================================ weight: 1 # _index.md always has weight of 1 to order correctly diff --git a/themes/arm-design-system-hugo-theme/layouts/_default/_markup/render-codeblock.html b/themes/arm-design-system-hugo-theme/layouts/_default/_markup/render-codeblock.html index 7fbc711f19..196a6fcdfe 100644 --- a/themes/arm-design-system-hugo-theme/layouts/_default/_markup/render-codeblock.html +++ b/themes/arm-design-system-hugo-theme/layouts/_default/_markup/render-codeblock.html @@ -6,5 +6,5 @@ ``` */}} {{partial "general-formatting/prismjs-codeblock.html" (dict "input_from" "normal" "code" .Inner "language" .Type -"line_numbers" .Attributes.line_numbers "output_lines" .Attributes.output_lines "command_line" .Attributes.command_line) -}} \ No newline at end of file +"line_numbers" .Attributes.line_numbers "output_lines" .Attributes.output_lines "command_line" .Attributes.command_line "line_start" .Attributes.line_start) +}} diff --git a/themes/arm-design-system-hugo-theme/layouts/partials/general-formatting/prismjs-codeblock.html b/themes/arm-design-system-hugo-theme/layouts/partials/general-formatting/prismjs-codeblock.html index 407a4a6f66..565846f283 100644 --- 
a/themes/arm-design-system-hugo-theme/layouts/partials/general-formatting/prismjs-codeblock.html +++ b/themes/arm-design-system-hugo-theme/layouts/partials/general-formatting/prismjs-codeblock.html @@ -14,6 +14,7 @@ code Str: echo('Hello world').... language Str: python line_numbers Bool: true or false + line_start Str: 123 output_lines Str: 5 or 1-4, 9 command_line Str: "root@localhost" */}} @@ -42,14 +43,24 @@ {{$output_present = true}} {{end}} -
     = 2:
+        return parts[0]
+    return None
 
 def get_category_from_path(path):
     match = re.match(r"content/learning-paths/([^/]+)/", path)
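The context lines above come from a repository helper script whose body is cut off at the chunk boundary. A self-contained sketch of the same path-parsing approach (the function name and regex are taken from the diff context; the `None` fallback for non-matching paths is an assumption):

```python
import re

def get_category_from_path(path):
    # Extract the category segment from a learning-path file path, e.g.
    # "content/learning-paths/cross-platform/foo.md" -> "cross-platform".
    match = re.match(r"content/learning-paths/([^/]+)/", path)
    if match:
        return match.group(1)
    return None  # path is not under content/learning-paths/

print(get_category_from_path(
    "content/learning-paths/cross-platform/_example-learning-path/appendix-1-formatting.md"
))
# -> cross-platform
```

Note that `re.match` anchors at the start of the string, so this only matches paths given relative to the repository root.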