Commit 6131aea

docs: explain faster CUDA CMake compile [no ci]
1 parent 09ecbcb commit 6131aea

File tree: 2 files changed, +6 -3 lines changed

README.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -459,14 +459,14 @@ To learn more how to measure perplexity using llama.cpp, [read this documentatio
 - Make sure to read this: [Inference at the edge](https://github.com/ggerganov/llama.cpp/discussions/205)
 - A bit of backstory for those who are interested: [Changelog podcast](https://changelog.com/podcast/532)
 
-## Other documentations
+## Other documentation
 
 - [main (cli)](./examples/main/README.md)
 - [server](./examples/server/README.md)
 - [jeopardy](./examples/jeopardy/README.md)
 - [GBNF grammars](./grammars/README.md)
 
-**Development documentations**
+**Development documentation**
 
 - [How to build](./docs/build.md)
 - [Running on Docker](./docs/docker.md)
```

docs/build.md

Lines changed: 4 additions & 1 deletion
````diff
@@ -178,7 +178,10 @@ For Jetson user, if you have Jetson Orin, you can try this: [Offical Support](ht
 cmake --build build --config Release
 ```
 
-The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to specify which GPU(s) will be used.
+By default llama.cpp will be built for a selection of CUDA architectures that enables running the code on any NVIDIA GPU supported by CUDA 12 (Maxwell or newer).
+However, for local use the build can be sped up by narrowing the range of supported CUDA architectures.
+By adding `-DCMAKE_CUDA_ARCHITECTURES=native` to the first CMake command the built CUDA architectures can be set to exactly those currently connected to the system.
+The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to limit which GPUs are visible (for CUDA in general).
 
 The environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` can be used to enable unified memory in Linux. This allows swapping to system RAM instead of crashing when the GPU VRAM is exhausted. In Windows this setting is available in the NVIDIA control panel as `System Memory Fallback`.
 
````
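The narrowed build that this commit documents can be invoked as below. This is a sketch, not part of the commit itself: it assumes the CUDA-enabled CMake configuration used elsewhere in `docs/build.md` (`-DGGML_CUDA=ON`) and CMake >= 3.24, which is the version that introduced the `native` value for `CMAKE_CUDA_ARCHITECTURES`.

```shell
# Configure the build for CUDA, compiling device code only for the GPU
# architectures actually present on this machine ("native" asks CMake to
# detect them), instead of the full Maxwell-and-newer default set.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=native

# Build as usual; compiling for fewer architectures speeds this step up.
cmake --build build --config Release
```

Note that a binary built with `native` only runs on GPUs of the detected architectures, so this is a local-development optimization rather than something to use for distributed binaries.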
