Commit 0ccf0ba

committed: add not about --tensor-split
1 parent 6f90443 commit 0ccf0ba

File tree

1 file changed (+2, -1)


tools/rpc/README.md

Lines changed: 2 additions & 1 deletion
````diff
@@ -80,7 +80,8 @@ Finally, when running `llama-cli` or `llama-server`, use the `--rpc` option to s
 $ llama-cli -hf ggml-org/gemma-3-1b-it-GGUF -ngl 99 --rpc 192.168.88.10:50052,192.168.88.11:50052
 ```
 
-This way you can offload model layers to both local and remote devices.
+By default, the ggml scheduler distributes model weights across all available devices -- both local and remote -- in proportion to each device's available memory.
+You can override this behavior with the `--tensor-split` option and set custom proportions when splitting tensor data across devices.
 
 ### Local cache
 
````
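The note added by this commit describes two splitting modes: the default, proportional to each device's free memory, and a user-supplied ratio via `--tensor-split`. As a rough illustration only — a hypothetical sketch, not ggml's actual scheduler code — here is how a whole-layer split under either kind of proportion might be computed:

```python
# Hypothetical sketch (NOT ggml's real implementation) of apportioning
# model layers across devices according to a list of proportions.

def split_layers(n_layers, proportions):
    """Assign n_layers across devices in the given proportions,
    giving the remainder to the last device so every layer is
    placed exactly once."""
    total = sum(proportions)
    counts = []
    assigned = 0
    for i, p in enumerate(proportions):
        if i == len(proportions) - 1:
            counts.append(n_layers - assigned)  # remainder to last device
        else:
            c = round(n_layers * p / total)
            counts.append(c)
            assigned += c
    return counts

# Default mode: proportions are each device's available memory
# (illustrative GiB figures for a local GPU and a remote RPC device).
free_mem = [24, 8]
print(split_layers(32, free_mem))   # -> [24, 8]

# Override mode: --tensor-split 1,1 would force an even split
# regardless of available memory.
print(split_layers(32, [1, 1]))     # -> [16, 16]
```

The real `--tensor-split` flag takes a comma-separated list of proportions, e.g. `--tensor-split 3,1` to place roughly three quarters of the weights on the first device.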
