```
llama_perf_sampler_print:    sampling time =      56.16 ms /   517 runs   (    0.11 ms per token,  9205.18 tokens per second)
llama_perf_context_print: prompt eval time =      51.53 ms /     5 tokens (   10
llama_perf_context_print:        eval time =   10416.81 ms /   511 runs   (   20.39 ms per token,    49.06 tokens per second)
llama_perf_context_print:       total time =   10670.73 ms /   516 tokens
llama_perf_context_print:    graphs reused =        508
```

**Decode (generation): +8.74 t/s (+21.68%)**

**Prompt (prefill): +11.07 t/s (+12.88%)**

**Overall throughput: +8.77 t/s (+21.64%)**
## Instructions:
Build with all the normal AMX flags (unchanged from upstream), then pass the new `--amx` flag in your run commands. `--amx` works with all executables, including `llama-bench`.
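A minimal sketch of the workflow described above, assuming the standard upstream llama.cpp CMake build and a local GGUF model (`model.gguf` is a placeholder path; the exact AMX-related CMake options are whatever you already use upstream):

```shell
# Build exactly as you would upstream, with your usual AMX-related flags.
cmake -B build
cmake --build build --config Release

# Opt in to the new code path by adding --amx to any executable:
./build/bin/llama-cli   -m model.gguf -p "Hello" -n 128 --amx
./build/bin/llama-bench -m model.gguf --amx
```

Without `--amx`, behavior should match upstream, which makes A/B benchmarking with `llama-bench` straightforward.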