
Commit d1c6f11

Authored by rgerganov, JohannesGaessler, and slaren
doc : update documentation for --tensor-split (#15980)
* doc : update documentation for --tensor-split

* Update tools/main/README.md

Co-authored-by: Johannes Gäßler <[email protected]>

* Update tools/main/README.md

Co-authored-by: Diego Devesa <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Parent commit: 6380d6a


1 file changed (+1, -1 lines)


tools/main/README.md

Lines changed: 1 addition & 1 deletion
@@ -384,5 +384,5 @@ These options provide extra functionality and customization when running the LLa
- `--verbose-prompt`: Print the prompt before generating text.
- `--no-display-prompt`: Don't print prompt at generation.
- `-mg i, --main-gpu i`: When using multiple GPUs this option controls which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question will use slightly more VRAM to store a scratch buffer for temporary results. By default GPU 0 is used.
Removed (line 387):
- `-ts SPLIT, --tensor-split SPLIT`: When using multiple GPUs this option controls how large tensors should be split across all GPUs. `SPLIT` is a comma-separated list of non-negative values that assigns the proportion of data that each GPU should get in order. For example, "3,2" will assign 60% of the data to GPU 0 and 40% to GPU 1. By default the data is split in proportion to VRAM but this may not be optimal for performance.
Added (line 387):
- `-ts SPLIT, --tensor-split SPLIT`: When using multiple devices this option controls how tensors should be split across devices. `SPLIT` is a comma-separated list of non-negative values that assigns the proportion of data that each device should get in order. For example, "3,2" will assign 60% of the data to device 0 and 40% to device 1. By default, the data is split in proportion to VRAM, but this may not be optimal for performance. The list of the devices which are being used is printed on startup and can be different from the device list given by `--list-devices` or e.g. `nvidia-smi`.
- `-hfr URL --hf-repo URL`: The url to the Hugging Face model repository. Used in conjunction with `--hf-file` or `-hff`. The model is downloaded and stored in the file provided by `-m` or `--model`. If `-m` is not provided, the model is auto-stored in the path specified by the `LLAMA_CACHE` environment variable or in an OS-specific local cache.
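The `--tensor-split` arithmetic described in the updated text divides each value by the sum of all values, so "3,2" gives 3/5 = 60% to device 0 and 2/5 = 40% to device 1. The Python sketch below illustrates only that proportion calculation. It is not llama.cpp's code: `parse_tensor_split`, `split_proportions`, and `default_proportions` are hypothetical helper names, and the VRAM-proportional default is modeled, as an assumption, by normalizing per-device VRAM sizes supplied by the caller.

```python
# Hypothetical helpers illustrating proportional splitting.
# This is not llama.cpp's implementation.


def parse_tensor_split(split: str) -> list[float]:
    """Parse a comma-separated SPLIT string such as "3,2" into non-negative numbers."""
    values = [float(part) for part in split.split(",")]
    if any(v < 0 for v in values):
        raise ValueError("tensor split values must be non-negative")
    return values


def split_proportions(weights: list[float]) -> list[float]:
    """Normalize weights so they sum to 1; entry i is device i's share of the data."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


def default_proportions(vram_bytes: list[int]) -> list[float]:
    """Assumed default: split in proportion to each device's VRAM size."""
    return split_proportions([float(v) for v in vram_bytes])


if __name__ == "__main__":
    # "3,2" -> device 0 gets 60%, device 1 gets 40%
    shares = split_proportions(parse_tensor_split("3,2"))
    for device, share in enumerate(shares):
        print(f"device {device}: {share:.0%}")
```

On the command line the same split would be requested with `-ts 3,2`. As the updated text notes, the device indices follow the device list printed on startup, which can differ from the one shown by `--list-devices` or `nvidia-smi`.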
