Commit 0f17ccd

examples : add info about hybrid sampling in batched [no ci]
1 parent 2b4c792 commit 0f17ccd

1 file changed: +14 -1 lines changed

examples/batched/README.md

Lines changed: 14 additions & 1 deletion
@@ -53,4 +53,17 @@ performed on the backend device, like a GPU.
 --backend_sampling --top-k 80 --backend_dist
 ```
 The `--verbose` flag can be added to see more detailed output and also show
-that the backend samplers are being used.
+that the backend samplers are being used. The above example will perform distribution
+sampling on the backend device and only transfer the sampled token ids back to the host.
+
+It is also possible to perform partial sampling on the backend, and then allow CPU samplers
+to process those results further. This is sometimes referred to as hybrid sampling.
+For an example of this we can remove `--backend_dist` from the above command:
+```bash
+./llama-batched \
+-m models/Qwen2.5-VL-3B-Instruct-Q8_0.gguf -p "Hello my name is" \
+-np 4 -kvu \
+--backend_sampling --top-k 80 -v
+```
+This will perform the top-k filtering on the backend device, and then transfer the filtered logits
+back to the host for sampling.
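
The hybrid flow described in the added README text (top-k filtering on the backend, final sampling on the host) can be sketched conceptually as follows. This is a minimal illustration in plain Python and not llama.cpp's implementation: the function names `backend_top_k`, `host_sample`, and `hybrid_sample` are hypothetical, and both stages run on the CPU here purely to show which data would cross the device boundary.

```python
import math
import random


def backend_top_k(logits, k):
    """Hypothetical stand-in for the backend (e.g. GPU) stage.

    Keeps the k largest logits and returns (token_id, logit) pairs
    sorted by descending logit. In a hybrid setup only these k pairs,
    not the full vocabulary, would be transferred to the host.
    """
    ranked = sorted(enumerate(logits), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def host_sample(candidates, rng):
    """Hypothetical stand-in for the CPU stage.

    Applies a numerically stable softmax over the filtered candidates
    and draws one token id from the resulting distribution.
    """
    max_logit = max(logit for _, logit in candidates)
    weights = [math.exp(logit - max_logit) for _, logit in candidates]
    token_ids = [token_id for token_id, _ in candidates]
    return rng.choices(token_ids, weights=weights, k=1)[0]


def hybrid_sample(logits, k, rng):
    """Backend filters, host samples."""
    candidates = backend_top_k(logits, k)
    return host_sample(candidates, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Fake logits for a 1000-token vocabulary (illustrative data only).
    vocab_logits = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    token = hybrid_sample(vocab_logits, k=80, rng=rng)
    print("sampled token id:", token)
```

With `--backend_dist` enabled, the final draw would instead also happen on the backend and only the sampled token id would be returned, which corresponds to moving the `host_sample` step to the device in this sketch.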
