# gadflyii/llama.cpp

This fork enables Intel AMX acceleration on 4th-, 5th-, and 6th-generation Xeon / Xeon W processors in CPU/GPU hybrid setups. Upstream llama.cpp disables AMX whenever a GPU is detected, which slows performance on the layers/experts offloaded to the CPU.

The default behavior for CPU-only operation is unchanged. When a GPU is present and the CLI/server/bench is started with the `--amx` flag, the CPU's extra buffers are exposed and preferred, enabling weight repacking and AMX acceleration on the CPU.
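
For example, a hybrid run might look like the following. This is a minimal sketch: the model path and `-ngl` value are placeholders, and only the `--amx` flag is specific to this fork.

```sh
# Offload as many layers as fit onto the GPU; the layers/experts that stay
# on the CPU use the AMX-repacked buffers exposed by --amx.
./build/bin/llama-server -m /path/to/model.gguf -ngl 99 --amx
```
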
llama_perf_context_print: eval time = 10416.81 ms / 511 runs ( 20
llama_perf_context_print: total time = 10670.73 ms / 516 tokens
llama_perf_context_print: graphs reused = 508

### Decode (generation): +8.74 t/s (+21.68%)
### Prompt (prefill): +11.07 t/s (+12.88%)
### Overall throughput: +8.77 t/s (+21.64%)

## Instructions

Build with all the normal AMX flags (unchanged from upstream), then add the new `--amx` flag to your run commands. You can pass `--amx` to every executable, including llama-bench.
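
A typical build-and-run sequence might look like this. This is a sketch assuming a CUDA GPU and the standard upstream CMake options; nothing here is new to this fork except the `--amx` flag on the run command.

```sh
# Build as usual; native CPU optimizations pick up AMX on a capable Xeon,
# and the GPU backend is enabled alongside it.
cmake -B build -DGGML_CUDA=ON -DGGML_NATIVE=ON
cmake --build build --config Release -j

# Then pass --amx to any executable, including llama-bench:
./build/bin/llama-bench -m /path/to/model.gguf -ngl 99 --amx
```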