    --backend_sampling --top-k 80 --backend_dist
```
The `--verbose` flag can be added to see more detailed output and to confirm
that the backend samplers are being used. The example above performs distribution
sampling on the backend device and transfers only the sampled token ids back to the host.
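To give a rough sense of why returning only token ids matters, the shell arithmetic below compares per-step device-to-host traffic for host-side sampling and for full backend sampling. It is a back-of-the-envelope sketch, not a measurement: the vocabulary size, the 4-byte logit and id sizes, the assumption that host-side sampling copies one full row of logits per sequence, and the assumption that nothing else is transferred are all illustrative.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope device-to-host traffic per decoding step.
# All values except n_seq are illustrative assumptions, not measurements.
n_seq=4            # parallel sequences, as with -np 4 in the example
n_vocab=151936     # assumption: vocabulary size on the order of Qwen2.5's
bytes_per_logit=4  # assumption: logits are 32-bit floats
bytes_per_id=4     # assumption: token ids are 32-bit integers

# Host-side sampling: one full row of logits per sequence (assumed).
cpu_bytes=$(( n_seq * n_vocab * bytes_per_logit ))
# Backend distribution sampling: one sampled token id per sequence.
backend_bytes=$(( n_seq * bytes_per_id ))

echo "host-side sampling: ${cpu_bytes} bytes per step"
echo "backend sampling:   ${backend_bytes} bytes per step"
```

Under these assumptions the host-side case moves roughly 2.4 MB per step against 16 bytes; the ratio, not the exact figures, is the point. Check the model's metadata for its real vocabulary size.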

It is also possible to perform partial sampling on the backend and let CPU samplers
process the results further. This is sometimes referred to as hybrid sampling.
For example, removing `--backend_dist` from the command above (and adding `-v` for
verbose output) gives:
```bash
./llama-batched \
    -m models/Qwen2.5-VL-3B-Instruct-Q8_0.gguf -p "Hello my name is" \
    -np 4 -kvu \
    --backend_sampling --top-k 80 -v
```
This performs top-k filtering on the backend device and then transfers the filtered
logits back to the host, where the CPU samplers complete the sampling.
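The split between the two stages can be illustrated with a toy shell script for a single sequence. It is not llama.cpp code and calls no llama.cpp API: the `logits` array is made up, `k=3` stands in for `--top-k 80`, and the softmax-then-draw step is a plain illustration of distribution sampling, not necessarily the exact CPU sampler chain the example uses.

```bash
#!/usr/bin/env bash
# Toy model of hybrid sampling for one sequence (illustrative only).
set -eu

logits=(2.0 1.0 0.5 3.0 -1.0 0.0)  # hypothetical logits; index = token id
k=3                                # stands in for --top-k 80

# Stage 1 (on the backend device in the real example): top-k filtering.
# Emit "id logit" pairs, sort by logit descending, keep the first k.
topk=$(for i in "${!logits[@]}"; do echo "$i ${logits[$i]}"; done \
    | sort -k2,2 -g -r | head -n "$k")

echo "candidates sent to host:"
echo "$topk"

# Stage 2 (CPU samplers in the real example): softmax over the k
# candidates, then draw one token id from that distribution.
token=$(echo "$topk" | awk -v seed="$RANDOM" '
    BEGIN { srand(seed) }
    { id[NR] = $1; l[NR] = $2; if (NR == 1 || $2 > max) max = $2 }
    END {
        for (i = 1; i <= NR; i++) { p[i] = exp(l[i] - max); sum += p[i] }
        r = rand() * sum
        for (i = 1; i <= NR; i++) { r -= p[i]; if (r <= 0) { print id[i]; exit } }
        print id[NR]  # guard against floating-point round-off
    }')

echo "sampled token id: $token"
```

Running it prints the three surviving candidates (ids 3, 0 and 1, in that order) followed by one id drawn from them; the drawn id varies between runs because the seed comes from `$RANDOM`. Only the `k` candidates cross from stage 1 to stage 2, which is the data the host receives in the hybrid mode.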