
Commit e7f1752

Revise RAG setup documentation: update title, enhance environment setup instructions, and consolidate model preparation steps.
1 parent 038944d commit e7f1752


3 files changed: +12 -13 lines changed


content/learning-paths/laptops-and-desktops/dgx_spark_rag/1_rag.md

Lines changed: 7 additions & 8 deletions
@@ -8,25 +8,24 @@ layout: learningpathall
 
 ## Before you start
 
-Complete the [Unlock quantized LLM performance on Arm-based NVIDIA DGX Spark](/learning-paths/laptops-and-desktops/dgx_spark_llamacpp/) Learning Path first to understand how to build and run llama.cpp on both the CPU and GPU. This foundational knowledge is essential before you begin building the RAG solution described here.
+Before starting this Learning Path, you should complete [Unlock quantized LLM performance on Arm-based NVIDIA DGX Spark](/learning-paths/laptops-and-desktops/dgx_spark_llamacpp/) to learn about the CPU and GPU builds of llama.cpp. This background is recommended for building the RAG solution on llama.cpp.
 
-{{% notice Note %}}
-The NVIDIA DGX Spark is also called the Grace–Blackwell platform or GB10, which refers to the NVIDIA Grace–Blackwell Superchip.
-{{% /notice %}}
+The NVIDIA DGX Spark is also referred to as the Grace-Blackwell platform or GB10, the name of the NVIDIA Grace-Blackwell Superchip.
 
 ## What is RAG?
 
-Retrieval-Augmented Generation (RAG) combines information retrieval with language-model generation. Instead of relying solely on pre-trained weights, a RAG system retrieves relevant text from a document corpus and passes it to a language model to create factual, context-aware responses.
+Retrieval-Augmented Generation (RAG) combines information retrieval with language-model generation.
+Instead of relying solely on pre-trained weights, a RAG system retrieves relevant text from a document corpus and passes it to a language model to create factual, context-aware responses.
 
 Here is a typical pipeline:
 
 User Query ─> Embedding ─> Vector Search ─> Context ─> Generation ─> Answer
 
 Each stage in this pipeline plays a distinct role in transforming a question into a context-aware response:
 
-* Embedding model: converts text into dense numerical vectors. An example is e5-base-v2.
-* Vector database: searches for semantically similar chunks. An example is FAISS.
-* Language model: generates an answer conditioned on retrieved context. An example is Llama 3.1 8B Instruct.
+* Embedding model: Converts text into dense numerical vectors. An example is e5-base-v2.
+* Vector database: Searches for semantically similar chunks. An example is FAISS.
+* Language model: Generates an answer conditioned on retrieved context. An example is Llama 3.1 8B Instruct.
 
 ## Why is Grace–Blackwell good for RAG pipelines?
 
File renamed without changes.
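The pipeline stages described in `1_rag.md` can be sketched in a few lines of Python. This is a toy illustration, not part of the commit: a bag-of-words count vector stands in for an e5-base-v2 embedding, a brute-force cosine search stands in for a FAISS index, and the assembled prompt is what a real pipeline would send to the llama.cpp REST server for generation.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': sparse token counts. A real pipeline uses e5-base-v2."""
    return Counter(tok.strip(".,?!").lower() for tok in text.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Document corpus; a real pipeline would chunk files and index them in FAISS.
corpus = [
    "Grace CPUs manage local document retrieval and orchestration.",
    "Blackwell GPUs accelerate large language model inference.",
    "FAISS searches for semantically similar text chunks.",
]
index = [embed(doc) for doc in corpus]

# User Query -> Embedding -> Vector Search -> Context
query = "Which processor accelerates LLM inference?"
q = embed(query)
best = max(range(len(corpus)), key=lambda i: cosine(q, index[i]))
context = corpus[best]

# Context -> Generation: this prompt would be POSTed to the llama.cpp
# REST server running a model such as Llama 3.1 8B Instruct.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(context)
```

The shape is the same at scale: only `embed` and the nearest-neighbor search are swapped for the real embedding model and a FAISS index, and `print` is replaced by the HTTP call to the generation server.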

content/learning-paths/laptops-and-desktops/dgx_spark_rag/_index.md

Lines changed: 5 additions & 5 deletions
@@ -1,14 +1,14 @@
 ---
-title: Build a RAG pipeline on NVIDIA DGX Spark
+title: Build a RAG pipeline on Arm-based NVIDIA DGX Spark
 minutes_to_complete: 60
 
 who_is_this_for: This is an advanced topic for developers who want to understand and implement a Retrieval-Augmented Generation (RAG) pipeline on the NVIDIA DGX Spark platform. It is ideal for those interested in exploring how Arm-based Grace CPUs manage local document retrieval and orchestration, while Blackwell GPUs accelerate large language model inference through the open-source llama.cpp REST server.
 
 learning_objectives:
-- Describe how a RAG system combines document retrieval and language model generation.
-- Deploy a hybrid CPU–GPU RAG pipeline on the GB10 platform using open-source tools.
-- Use the llama.cpp REST Server for GPU-accelerated inference with CPU-managed retrieval.
-- Build a reproducible RAG application that demonstrates efficient hybrid computing.
+- Describe how a RAG system combines document retrieval and language model generation
+- Deploy a hybrid CPU-GPU RAG pipeline on the GB10 platform using open-source tools
+- Use the llama.cpp REST Server for GPU-accelerated inference with CPU-managed retrieval
+- Build a reproducible RAG application that demonstrates efficient hybrid computing
 
 prerequisites:
 - An NVIDIA DGX Spark system with at least 15 GB of available disk space.
