Commit 7d602bf

Updates
1 parent 423fd1e commit 7d602bf

File tree

4 files changed: +10 −10 lines changed


content/learning-paths/laptops-and-desktops/dgx_spark_rag/2_rag_setup.md

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@ hf download intfloat/e5-base-v2 --local-dir ~/models/e5-base-v2
 wget https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf -P ~/models/Llama-3.1-8B-gguf
 ```
 
-### Verify the e5-base-v2 model
+## Verify the e5-base-v2 model
 
 Run a Python script to verify that the e5-base-v2 model loads correctly and can generate embeddings.
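The hunk above points the reader at a Python verification script for e5-base-v2 that is not itself part of this diff. As a rough illustration of what such a check does, here is a self-contained sketch of the cosine-similarity comparison typically used to sanity-check embeddings (the vectors are toy stand-ins; in the real script they would come from the e5-base-v2 model):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins: in the actual verification script these vectors would be
# produced by encoding "query: ..." / "passage: ..." strings with e5-base-v2.
query_vec = np.array([0.1, 0.9, 0.2])
same_vec = np.array([0.1, 0.9, 0.2])
other_vec = np.array([0.9, -0.1, 0.3])

print(round(cosine_similarity(query_vec, same_vec), 3))   # identical vectors -> 1.0
print(round(cosine_similarity(query_vec, other_vec), 3))  # unrelated vectors score lower
```

A model that loads correctly should score identical texts near 1.0 and unrelated texts noticeably lower.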

content/learning-paths/laptops-and-desktops/dgx_spark_rag/2b_rag_setup.md

Lines changed: 1 addition & 1 deletion
@@ -133,7 +133,7 @@ This stage enables your RAG pipeline to retrieve the most relevant text chunks w
 
 Use e5-base-v2 to encode the documents and create a FAISS vector index.
 
-### Create and run the FAISS builder script
+## Create and run the FAISS builder script
 
 
 ```bash

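The FAISS builder script referenced in this hunk is not shown in the diff. For intuition, a flat inner-product index over normalized e5-base-v2 embeddings behaves like this NumPy sketch (an illustrative assumption; the actual script uses the `faiss` library and real 768-dimensional model embeddings):

```python
import numpy as np

def normalize_rows(embeddings: np.ndarray) -> np.ndarray:
    """Normalize each row so inner product equals cosine similarity."""
    return embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

def search(index: np.ndarray, query_vec: np.ndarray, k: int = 2):
    """Return indices and scores of the k best-matching rows, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = index @ q
    top = np.argsort(-scores)[:k]
    return top.tolist(), scores[top].tolist()

# Toy 4-dimensional "chunk embeddings"; the real index stores e5-base-v2 vectors.
chunks = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0, 0.0],
])
index = normalize_rows(chunks)
ids, scores = search(index, np.array([1.0, 0.1, 0.0, 0.0]))
print(ids)  # chunk ids ranked by similarity to the query
```

A flat FAISS index does the same exhaustive scoring; FAISS adds compact storage and faster approximate variants.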
content/learning-paths/laptops-and-desktops/dgx_spark_rag/3_rag_pipeline.md

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ Building upon the previous modules, you will now:
 - Integrate the llama.cpp REST server for GPU-accelerated inference.
 - Execute a complete Retrieval-Augmented Generation (RAG) workflow for end-to-end question answering.
 
-### Start the llama.cpp REST server
+## Start the llama.cpp REST server
 
 Before running the RAG query script, ensure the LLM server is active by running:
 

@@ -185,7 +185,7 @@ This demonstrates that the RAG system correctly retrieved relevant sources and g
 
 You can reference the section 5.1.2 on the PDF to verify the result.
 
-### Observe CPU and GPU utilization
+## Observe CPU and GPU utilization
 
 If you have installed `htop` and `nvtop`, you can observe CPU and GPU utilization.
 
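The RAG query script that talks to the llama.cpp REST server sits outside these hunks. Its core retrieval-augmented step, packing the retrieved chunks into the prompt sent to the server, can be sketched like this (function name and prompt wording are illustrative assumptions, not the learning path's actual code):

```python
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Pack retrieved chunks into a grounded prompt for the LLM server."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "Cite sources by their [number].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "What does unified memory mean on GB10?",
    ["The CPU and GPU share the same address space.",
     "Data does not move; computation moves to the data."],
)
print(prompt)
```

The resulting string is what gets POSTed to the server's completion endpoint; grounding the answer in numbered chunks is what lets the pipeline cite its sources.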

content/learning-paths/laptops-and-desktops/dgx_spark_rag/4_rag_memory_observation.md

Lines changed: 6 additions & 6 deletions
@@ -25,11 +25,11 @@ Open two terminals on your GB10 system and use them as listed in the table below
 You should also have your original terminals open that you used to run the `llama-server` and the RAG queries in the previous section. You will run these again and use the two new terminals for observation.
 
 
-### Prepare for unified memory observation
+## Prepare for unified memory observation
 
 Ensure the RAG pipeline is stopped before starting the observation.
 
-#### Terminal 1 - system memory observation
+### Terminal 1 - system memory observation
 
 Run the Bash commands below in terminal 1 to print the free memory of the system:
 

@@ -56,7 +56,7 @@ The printed fields are:
 - `free` — Memory not currently allocated or reserved by the system.
 - `available` — Memory immediately available for new processes, accounting for reclaimable cache and buffers.
 
-#### Terminal 2 – GPU status observation
+### Terminal 2 – GPU status observation
 
 Run the Bash commands below in terminal 2 to print the GPU statistics:
 

@@ -89,7 +89,7 @@ Here is an explanation of the fields:
 | `memory.used` | GPU VRAM usage | GB10 does not include separate VRAM; all data resides within Unified Memory |
 
 
-### Run the llama-server
+## Run the llama-server
 
 With the idle condition understood, start the `llama.cpp` REST server again in your original terminal, not the two new terminals being used for observation.
 

@@ -200,7 +200,7 @@ The GPU executes compute kernels with GPU utilization at 96%, without reading fr
 
 The `utilization.memory=0` and `memory.used=[N/A]` metrics are clear signs that data sharing, not data copying, is happening.
 
-### Interpret unified memory behavior
+## Interpret unified memory behavior
 
 This experiment confirms the Grace–Blackwell Unified Memory architecture in action:
 - The CPU and GPU share the same address space.

@@ -211,7 +211,7 @@ Data does not move — computation moves to the data.
 
 The Grace CPU orchestrates retrieval, and the Blackwell GPU performs generation, both operating within the same Unified Memory pool.
 
-### Summary of unified memory behavior
+## Summary of unified memory behavior
 
 | **Observation** | **Unified Memory Explanation** |
 |----------------------------------------------------|----------------------------------------------------------|
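The observation terminals in this file watch `free` and `nvidia-smi` output by eye. If you wanted to script the memory side of that observation instead, parsing a `free`-style report reduces to splitting the header row and the `Mem:` row (a hypothetical helper for illustration; the learning path itself just reads the raw output):

```python
def parse_free_mem(output: str) -> dict[str, int]:
    """Map `free -b` column names (total, used, free, ...) to byte counts."""
    lines = output.strip().splitlines()
    headers = lines[0].split()
    values = lines[1].split()[1:]  # drop the leading "Mem:" label
    return dict(zip(headers, (int(v) for v in values)))

# Sample text in the shape `free -b` prints; real numbers come from the GB10.
sample = (
    "              total        used        free      shared  buff/cache   available\n"
    "Mem:     137438953472  42949672960  64424509440  1073741824  30064771072  94489280512"
)
stats = parse_free_mem(sample)
print(stats["free"], stats["available"])
```

Logging `free` and `available` at intervals while the RAG queries run makes the unified-memory behavior described above easy to plot.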
