Commit 134e694
authored
llama : skip output reordering for single token batches (ggml-org#17466)
This commit adds a check to skip the output reordering logic when
n_outputs == 1. With a single output token, the data is trivially
sorted and the reordering code is currently doing unnecessary work
(resetting and rebuilding output_ids to the same values).
The motivation for this change is improved code clarity and avoiding
confusion when debugging. While the performance impact is probably
negligible, this unnecessary work happens on every decode call in
llama-server when processing batches with single-token outputs.1 parent 0543f92 commit 134e694
1 file changed
+1
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1248 | 1248 | | |
1249 | 1249 | | |
1250 | 1250 | | |
1251 | | - | |
| 1251 | + | |
1252 | 1252 | | |
1253 | 1253 | | |
1254 | 1254 | | |
| |||
0 commit comments