content/learning-paths/laptops-and-desktops/dgx_spark_rag/1_rag.md (7 additions, 8 deletions)
```diff
@@ -1,5 +1,5 @@
 ---
-title: Build a RAG pipeline on Arm-based Grace–Blackwell systems
+title: Explore building a RAG pipeline on Arm-based Grace–Blackwell systems
 weight: 2
 
 ### FIXED, DO NOT MODIFY
```
```diff
@@ -8,14 +8,13 @@ layout: learningpathall
 
 ## Get started
 
-Before starting this Learning Path, you should complete [Unlock quantized LLM performance on Arm-based NVIDIA DGX Spark](/learning-paths/laptops-and-desktops/dgx_spark_llamacpp/) to learn about the CPU and GPU builds of llama.cpp. This background is recommended for building the RAG solution on llama.cpp.
+Before getting started, you should complete the Learning Path [Unlock quantized LLM performance on Arm-based NVIDIA DGX Spark](/learning-paths/laptops-and-desktops/dgx_spark_llamacpp/) to learn about the CPU and GPU builds of llama.cpp. This background is recommended for building the RAG solution on llama.cpp.
 
 The NVIDIA DGX Spark is also referred to as the Grace-Blackwell platform or GB10, the name of the NVIDIA Grace-Blackwell Superchip.
 
 ## What is RAG?
 
-Retrieval-Augmented Generation (RAG) combines information retrieval with language-model generation.
-Instead of relying solely on pre-trained weights, a RAG system retrieves relevant text from a document corpus and passes it to a language model to create factual, context-aware responses.
+Retrieval-Augmented Generation (RAG) combines information retrieval with language-model generation. Instead of relying solely on pre-trained weights, a RAG system retrieves relevant text from a document corpus and passes it to a language model to create factual, context-aware responses.
 
 Here is a typical pipeline:
```
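The retrieve-then-generate flow described in this hunk can be sketched minimally in Python. The toy corpus and naive word-overlap scoring below are illustrative assumptions, not the Learning Path's implementation; a real pipeline would use an embedding model for retrieval and a llama.cpp server for generation:

```python
# Toy RAG retrieval sketch: rank documents by word overlap with the query,
# then assemble a context-augmented prompt for a language model.
# The corpus and scoring are hypothetical, not from the Learning Path.

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q_words = set(query.lower().split())
    return sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Prepend retrieved context so the model answers from the documents."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "The Grace CPU has 20 Arm cores.",
    "The Blackwell GPU accelerates LLM inference.",
    "Unified memory lets the CPU and GPU share one address space.",
]
print(build_prompt("How many cores does the Grace CPU have?", corpus))
```

In a production pipeline, the word-overlap scorer would be replaced by vector similarity over sentence embeddings, but the prompt-assembly step stays the same shape.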
```diff
@@ -35,9 +34,9 @@ Its unique CPU–GPU design and unified memory enable seamless data exchange, ma
 
 The GB10 platform includes:
 
-- Grace CPU (Armv9.2 architecture) – 20 cores including 10 Cortex-X925 cores and 10 Cortex-A725 cores
-- Blackwell GPU – CUDA 13.0 Tensor Core architecture
-- Unified Memory (128 GB NVLink-C2C) – Shared address space between CPU and GPU which allows both processors to access the same 128 GB unified memory region without copy operations.
+- Grace CPU (Armv9.2 architecture) - 20 cores including 10 Cortex-X925 cores and 10 Cortex-A725 cores
+- Blackwell GPU - CUDA 13.0 Tensor Core architecture
+- Unified Memory (128 GB NVLink-C2C) - Shared address space between CPU and GPU which allows both processors to access the same 128 GB unified memory region without copy operations.
 
 The GB10 provides the following benefits for RAG applications:
```
```diff
@@ -102,7 +101,7 @@ The technology stack you will use is listed below:
 | Unified Memory Architecture | Unified LPDDR5X shared memory | Grace CPU and Blackwell GPU | Enables zero-copy data sharing between CPU and GPU for improved latency and efficiency. |
 
-## Prerequisites Check
+## Check your setup
 
 Before starting, run the following commands to confirm your hardware is ready:
```
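The Learning Path lists its own shell commands for this setup check; as an illustrative sketch only, the same kind of readiness check can be expressed from Python using the standard library:

```python
import os
import platform

# Illustrative readiness check, not the Learning Path's own commands:
# confirm the CPU architecture and core count look like a Grace CPU.
arch = platform.machine()   # expected to be "aarch64" on Arm-based DGX Spark
cores = os.cpu_count()      # the GB10 Grace CPU exposes 20 cores
print(f"Architecture: {arch}, CPU cores: {cores}")
```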
content/learning-paths/laptops-and-desktops/dgx_spark_rag/_index.md (2 additions, 2 deletions)
```diff
@@ -2,7 +2,7 @@
 title: Build a RAG pipeline on Arm-based NVIDIA DGX Spark
 minutes_to_complete: 60
 
-who_is_this_for: This is an advanced topic for developers who want to understand and implement a Retrieval-Augmented Generation (RAG) pipeline on the NVIDIA DGX Spark platform. It is ideal for those interested in exploring how Arm-based Grace CPUs manage local document retrieval and orchestration, while Blackwell GPUs accelerate large language model inference through the open-source llama.cpp REST server.
+who_is_this_for: This is an advanced topic for developers who want to build a Retrieval-Augmented Generation (RAG) pipeline on the NVIDIA DGX Spark platform. You'll learn how Arm-based Grace CPUs handle document retrieval and orchestration, while Blackwell GPUs speed up large language model inference using the open-source llama.cpp REST server. This is a great fit if you're interested in combining Arm CPU management with GPU-accelerated AI workloads.
 
 learning_objectives:
 - Describe how a RAG system combines document retrieval and language model generation
```
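The generation side described here goes through the llama.cpp REST server (llama-server), which exposes a /completion endpoint accepting a JSON prompt. A minimal client sketch follows; the host, port, and default parameters are assumptions to adjust for your setup:

```python
import json
import urllib.request

# Minimal llama.cpp REST client sketch. SERVER_URL is an assumption;
# point it at wherever your llama-server instance is listening.
SERVER_URL = "http://localhost:8080/completion"

def build_payload(prompt: str, n_predict: int = 128) -> dict:
    """JSON body for llama-server's /completion endpoint."""
    return {"prompt": prompt, "n_predict": n_predict}

def generate(prompt: str, n_predict: int = 128) -> str:
    """POST the prompt and return the generated text from the response."""
    data = json.dumps(build_payload(prompt, n_predict)).encode()
    req = urllib.request.Request(
        SERVER_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

In the hybrid design this diff describes, a client like this would run on the Grace CPU alongside the retriever, while the server does GPU-accelerated inference on the Blackwell side.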
```diff
@@ -11,7 +11,7 @@ learning_objectives:
 - Build a reproducible RAG application that demonstrates efficient hybrid computing
 
 prerequisites:
-- An NVIDIA DGX Spark system with at least 15 GB of available disk space.
+- An NVIDIA DGX Spark system with at least 15 GB of available disk space
```