
Commit 5d17b47

fix: add example for getting the graphics clock
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
1 parent 2bf4084

1 file changed: +23 -6 lines changed

docs/blog/posts/2025-12-01-nvidia-gpu-setup.md

Lines changed: 23 additions & 6 deletions
````diff
@@ -209,7 +209,7 @@ Now that your GPU is working, let's tune it for actual compute work. These optim
 
 ### Enable Persistence Mode
 
-By default, the NVIDIA driver unloads when no applications are using the GPU. This saves power but adds several seconds of initialization latency when you start a new workload. Persistence mode keeps the driver loaded:
+By default, the NVIDIA driver unloads when no applications are using the GPU. This saves power but adds several seconds of initialization latency when you start a new workload. Persistence mode keeps the driver loaded.
 
 ```bash
 sudo nvidia-smi -pm 1
````
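The first hunk only swaps a colon for a period, but the section it touches enables persistence mode, and it can be worth confirming that the setting actually applied on every GPU. The following is a minimal sketch, not part of the commit: `gpus_without_persistence` is a hypothetical helper, and it assumes `nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader` prints one `<index>, Enabled` or `<index>, Disabled` line per GPU.

```bash
# Hypothetical helper: print the index of every GPU whose persistence mode
# is not "Enabled". Reads the CSV produced by:
#   nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader
# Assumed line format: "<index>, <Enabled|Disabled>"
gpus_without_persistence() {
  awk -F', *' '$2 != "Enabled" { print $1 }'
}

# Only query the hardware when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  missing=$(nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader |
    gpus_without_persistence)
  if [ -n "$missing" ]; then
    echo "Persistence mode is off on GPU(s): $missing"
  fi
fi
```

Keeping the parsing in a function that reads standard input means it can be checked against saved `nvidia-smi` output on a machine without a GPU.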
````diff
@@ -227,20 +227,37 @@ sudo nvidia-smi -c EXCLUSIVE_PROCESS
 
 NVIDIA GPUs dynamically scale their clock frequencies based on load, temperature, and power limits. For benchmarking or latency-sensitive workloads, you may want consistent performance. Lock the graphics clock to maximum:
 
+!!! note
+
+The following values are specific for H100 GPUs. Check your specific GPU's capabilities with `nvidia-smi -q -d SUPPORTED_CLOCKS`.
+
+!!! example "The output will look similar to this"
+
+```bash
+Timestamp : Thu Dec 1 23:00:53 2025
+Driver Version : 580.105.08
+CUDA Version : 13.0
+
+Attached GPUs : 4
+GPU 00000000:06:00.0
+Supported Clocks
+Memory : 2619 MHz
+Graphics : 1980 MHz
+...
+```
+
+With the memory and graphics clock values identified, lock the graphics clock.
+
 ```bash
 sudo nvidia-smi -lgc 1980,1980 # H100 max graphics clock
 ```
 
-And lock the memory clock:
+Lock the memory clock.
 
 ```bash
 sudo nvidia-smi -lmc 2619,2619 # H100 HBM3 max
 ```
 
-!!! note
-
-These specific values are for H100 GPUs. Check your specific GPU's capabilities with `nvidia-smi -q -d SUPPORTED_CLOCKS`.
-
 ### Disable ECC Memory
 
 H100 and A30 GPUs have Error Correcting Code (ECC) memory enabled by default, which protects against bit flips but reduces available memory bandwidth by 5-10%. For workloads where you need maximum throughput and can tolerate occasional errors (training jobs with checkpointing, inference where you can retry failures), you can disable ECC.
````
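The new admonition asks readers to read the maximum memory and graphics clocks from `nvidia-smi -q -d SUPPORTED_CLOCKS` and type them into `-lgc` and `-lmc` by hand. A sketch of automating that lookup follows, assuming the `Memory : <n> MHz` / `Graphics : <n> MHz` line layout shown in the example output. The function name `max_supported_clocks` is hypothetical, and the script prints the lock commands instead of running them so the values can be reviewed first. On a multi-GPU host it takes the maximum across all GPUs, which is only appropriate when every GPU is the same model.

```bash
# Hypothetical helper: read `nvidia-smi -q -d SUPPORTED_CLOCKS` output on stdin
# and print "<max memory MHz> <max graphics MHz>".
# Assumes lines shaped like "Memory : 2619 MHz" and "Graphics : 1980 MHz";
# leading indentation and padding before the colon are ignored.
# Exits non-zero if no clock values were found (e.g. "Supported Clocks : N/A").
max_supported_clocks() {
  awk -F':' '
    /^[ \t]*Memory[ \t]*:/   { v = $2 + 0; if (v > mem) mem = v }
    /^[ \t]*Graphics[ \t]*:/ { v = $2 + 0; if (v > gr) gr = v }
    END { if (mem > 0 && gr > 0) print mem, gr; else exit 1 }
  '
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Word-split "2619 1980" into $1 (memory) and $2 (graphics).
  set -- $(nvidia-smi -q -d SUPPORTED_CLOCKS | max_supported_clocks)
  if [ "$#" -eq 2 ]; then
    # Print rather than execute, so the values can be checked first.
    echo "sudo nvidia-smi -lgc $2,$2"
    echo "sudo nvidia-smi -lmc $1,$1"
    # To undo the locks later: sudo nvidia-smi -rgc && sudo nvidia-smi -rmc
  fi
fi
```

If your driver supports it, `nvidia-smi --query-supported-clocks=mem,gr --format=csv,noheader,nounits` should report the same memory/graphics pairs in CSV form, which avoids depending on the human-readable `-q` layout.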
