Commit a22c1d8

Refine documentation for clarity: update section headings and descriptions in RAG setup and pipeline guides
1 parent 7cdb293 commit a22c1d8

4 files changed: +12 −13 lines

content/learning-paths/laptops-and-desktops/dgx_spark_rag/1_rag.md

Lines changed: 7 additions & 8 deletions
@@ -1,5 +1,5 @@
 ---
-title: Build a RAG pipeline on Arm-based Grace–Blackwell systems
+title: Explore building a RAG pipeline on Arm-based Grace–Blackwell systems
 weight: 2
 
 ### FIXED, DO NOT MODIFY
@@ -8,14 +8,13 @@ layout: learningpathall
 
 ## Get started
 
-Before starting this Learning Path, you should complete [Unlock quantized LLM performance on Arm-based NVIDIA DGX Spark](/learning-paths/laptops-and-desktops/dgx_spark_llamacpp/) to learn about the CPU and GPU builds of llama.cpp. This background is recommended for building the RAG solution on llama.cpp.
+Before getting started, you should complete the Learning Path [Unlock quantized LLM performance on Arm-based NVIDIA DGX Spark](/learning-paths/laptops-and-desktops/dgx_spark_llamacpp/) to learn about the CPU and GPU builds of llama.cpp. This background is recommended for building the RAG solution on llama.cpp.
 
 The NVIDIA DGX Spark is also referred to as the Grace-Blackwell platform or GB10, the name of the NVIDIA Grace-Blackwell Superchip.
 
 ## What is RAG?
 
-Retrieval-Augmented Generation (RAG) combines information retrieval with language-model generation.
-Instead of relying solely on pre-trained weights, a RAG system retrieves relevant text from a document corpus and passes it to a language model to create factual, context-aware responses.
+Retrieval-Augmented Generation (RAG) combines information retrieval with language-model generation. Instead of relying solely on pre-trained weights, a RAG system retrieves relevant text from a document corpus and passes it to a language model to create factual, context-aware responses.
 
 Here is a typical pipeline:
 
@@ -35,9 +34,9 @@ Its unique CPU–GPU design and unified memory enable seamless data exchange, ma
 
 The GB10 platform includes:
 
-- Grace CPU (Armv9.2 architecture) 20 cores including 10 Cortex-X925 cores and 10 Cortex-A725 cores
-- Blackwell GPU CUDA 13.0 Tensor Core architecture
-- Unified Memory (128 GB NVLink-C2C) Shared address space between CPU and GPU which allows both processors to access the same 128 GB unified memory region without copy operations.
+- Grace CPU (Armv9.2 architecture) - 20 cores including 10 Cortex-X925 cores and 10 Cortex-A725 cores
+- Blackwell GPU - CUDA 13.0 Tensor Core architecture
+- Unified Memory (128 GB NVLink-C2C) - Shared address space between CPU and GPU which allows both processors to access the same 128 GB unified memory region without copy operations.
 
 The GB10 provides the following benefits for RAG applications:
 
@@ -102,7 +101,7 @@ The technology stack you will use is listed below:
 | Unified Memory Architecture | Unified LPDDR5X shared memory | Grace CPU and Blackwell GPU | Enables zero-copy data sharing between CPU and GPU for improved latency and efficiency. |
 
 
-## Prerequisites Check
+## Check your setup
 
 Before starting, run the following commands to confirm your hardware is ready:

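As an aside, the retrieve-then-generate flow that the revised page introduces under "Here is a typical pipeline" can be sketched in a few lines. This is an illustrative toy, not code from the Learning Path: retrieval here is naive keyword overlap, whereas the actual pipeline uses an embedding model, and the corpus strings are invented for the example.

```python
# Toy sketch of a RAG pipeline's retrieval + prompt-building steps.
# A real system would embed documents (e.g. with e5-base-v2) and rank
# by vector similarity; keyword overlap stands in for that here.

def retrieve(corpus, question, top_k=1):
    """Rank documents by how many question words they share."""
    words = set(question.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(chunks, question):
    """Pass the retrieved text to the language model alongside the question."""
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Hypothetical two-document corpus for demonstration.
corpus = [
    "GB10 pairs a Grace CPU with a Blackwell GPU over NVLink-C2C.",
    "llama.cpp can run quantized GGUF models on CPU or GPU.",
]
question = "What GPU does GB10 pair with?"
prompt = build_prompt(retrieve(corpus, question), question)
```

The resulting `prompt` is what gets sent to the generation model, which is the step the later pages hand off to the llama.cpp server.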
content/learning-paths/laptops-and-desktops/dgx_spark_rag/2_rag_setup.md

Lines changed: 2 additions & 2 deletions
@@ -84,7 +84,7 @@ wget https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/ma
 
 Run a Python script to verify that the e5-base-v2 model loads correctly and can generate embeddings.
 
-Save the code below in a text file named `vector-test.py`.
+Save the code below in a text file named `vector-test.py`:
 
 ```bash
 from sentence_transformers import SentenceTransformer
@@ -136,7 +136,7 @@ The e5-base-v2 results show:
 
 A successful output confirms that the e5-base-v2 embedding model is functional and ready for use.
 
-### Verify the Llama 3.1 model
+## Verify the Llama 3.1 model
 
 The llama.cpp runtime will be used for text generation using the Llama 3.1 model.

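The diff only shows the first line of `vector-test.py`, so the full script is elided here. A hypothetical reconstruction of what such an embedding sanity check can look like is below: the `"query: "`/`"passage: "` prefixes follow the e5 model family's documented usage, but the model name path and every helper name are assumptions, not the Learning Path's actual script.

```python
# Hypothetical embedding verification sketch (not the original vector-test.py).
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def check_embeddings(embed):
    """Sanity check: a related passage should score above an unrelated one."""
    q = embed("query: what is unified memory?")
    related = embed("passage: Unified memory lets the CPU and GPU share one address space.")
    unrelated = embed("passage: The recipe calls for two eggs and a cup of flour.")
    assert cosine(q, related) > cosine(q, unrelated)
    return cosine(q, related)

if __name__ == "__main__":
    # External dependency; only loaded when run as a script.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("intfloat/e5-base-v2")
    score = check_embeddings(lambda text: model.encode(text).tolist())
    print(f"query/related-passage similarity: {score:.3f}")
```

A run that prints a higher similarity for the related passage is the kind of "successful output" the page refers to.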
content/learning-paths/laptops-and-desktops/dgx_spark_rag/3_rag_pipeline.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ The output is:
 {"status":"ok"}
 ```
 
-### Create the RAG query script
+## Create the RAG query script
 
 This script performs the full pipeline using the flow:
 

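The query script itself is not shown in this diff. For orientation, a minimal sketch of the generation step it drives is below, assuming the llama.cpp server from the health check above: `llama-server` exposes a `/completion` endpoint that accepts a JSON body with `prompt` and `n_predict` and returns the generated text in `content`. The URL, port, and helper names are assumptions for illustration.

```python
# Hypothetical sketch: send retrieved context plus a question to a local
# llama.cpp REST server and return the generated answer.
import json
import urllib.request

LLAMA_URL = "http://localhost:8080/completion"  # assumed server address/port

def build_prompt(chunks, question):
    """Combine retrieved context chunks and the user question into one prompt."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

def ask(chunks, question, n_predict=256):
    """POST the prompt to llama-server's /completion endpoint."""
    payload = json.dumps(
        {"prompt": build_prompt(chunks, question), "n_predict": n_predict}
    ).encode("utf-8")
    request = urllib.request.Request(
        LLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["content"]
```

In this split, retrieval and prompt assembly run on the Grace CPU side, while the server handles GPU-accelerated generation.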
content/learning-paths/laptops-and-desktops/dgx_spark_rag/_index.md

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
 title: Build a RAG pipeline on Arm-based NVIDIA DGX Spark
 minutes_to_complete: 60
 
-who_is_this_for: This is an advanced topic for developers who want to understand and implement a Retrieval-Augmented Generation (RAG) pipeline on the NVIDIA DGX Spark platform. It is ideal for those interested in exploring how Arm-based Grace CPUs manage local document retrieval and orchestration, while Blackwell GPUs accelerate large language model inference through the open-source llama.cpp REST server.
+who_is_this_for: This is an advanced topic for developers who want to build a Retrieval-Augmented Generation (RAG) pipeline on the NVIDIA DGX Spark platform. You'll learn how Arm-based Grace CPUs handle document retrieval and orchestration, while Blackwell GPUs speed up large language model inference using the open-source llama.cpp REST server. This is a great fit if you're interested in combining Arm CPU management with GPU-accelerated AI workloads.
 
 learning_objectives:
 - Describe how a RAG system combines document retrieval and language model generation
@@ -11,7 +11,7 @@ learning_objectives:
 - Build a reproducible RAG application that demonstrates efficient hybrid computing
 
 prerequisites:
-- An NVIDIA DGX Spark system with at least 15 GB of available disk space.
+- An NVIDIA DGX Spark system with at least 15 GB of available disk space
 
 author: Odin Shen
