Commit e860f76

llama-bench : clarify benchmarked parts of the computation
1 parent 8284efc commit e860f76

1 file changed: +4 -1 lines changed
tools/llama-bench/README.md

Lines changed: 4 additions & 1 deletion
@@ -82,6 +82,9 @@ Using the `-d <n>` option, each test can be run at a specified context depth, pr
 
 For a description of the other options, see the [main example](../main/README.md).
 
+> [!NOTE]
+> The measurements with `llama-bench` do not include the times for tokenization and for sampling.
+
 ## Examples
 
 ### Text generation with different models
@@ -131,7 +134,7 @@ $ ./llama-bench -n 0 -n 16 -p 64 -t 1,2,4,8,16,32
 | llama 7B mostly Q4_0 | 3.56 GiB | 6.74 B | CPU | 16 | pp 64 | 33.52 ± 0.03 |
 | llama 7B mostly Q4_0 | 3.56 GiB | 6.74 B | CPU | 16 | tg 16 | 15.32 ± 0.05 |
 | llama 7B mostly Q4_0 | 3.56 GiB | 6.74 B | CPU | 32 | pp 64 | 59.00 ± 1.11 |
-| llama 7B mostly Q4_0 | 3.56 GiB | 6.74 B | CPU | 32 | tg 16 | 16.41 ± 0.79 ||
+| llama 7B mostly Q4_0 | 3.56 GiB | 6.74 B | CPU | 32 | tg 16 | 16.41 ± 0.79 |
 
 ### Different numbers of layers offloaded to the GPU
 
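
The note added in the first hunk says the reported figures exclude tokenization and sampling. As a rough illustration of what that scoping means for a reader comparing these numbers with end-to-end application latency, here is a minimal Python sketch in which only the evaluation step sits inside the timed region. The functions `tokenize`, `evaluate`, `sample`, and `timed_eval` are hypothetical stand-ins invented for this example; they are not part of `llama-bench` and do not reflect how it is implemented.

```python
import time


def tokenize(text):
    # Hypothetical stand-in for a tokenizer (assumption, not llama-bench code).
    return text.split()


def evaluate(tokens):
    # Hypothetical stand-in for the model evaluation step.
    return [len(t) for t in tokens]


def sample(scores):
    # Hypothetical stand-in for a sampler: pick the largest score.
    return max(scores)


def timed_eval(text):
    tokens = tokenize(text)                # outside the timed region
    start = time.perf_counter()
    scores = evaluate(tokens)              # the only step that is timed
    elapsed = time.perf_counter() - start
    picked = sample(scores)                # outside the timed region
    return len(tokens), elapsed, picked


if __name__ == "__main__":
    n_tokens, elapsed, picked = timed_eval("a bb ccc")
    print(n_tokens, picked, f"{elapsed:.6f} s")
```

Under this reading, an application that also tokenizes input and samples output may see lower end-to-end throughput than the table values suggest.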
