gpu p2p utilization #619
magikRUKKOLA started this conversation in General
Replies: 1 comment
- Increase your peer max batch size when compiling and it will certainly use it. You are looking at the Intel (SYCL) code.
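For reference, mainline llama.cpp (which ik_llama.cpp is forked from) exposes this threshold as a compile-time CMake option, `GGML_CUDA_PEER_MAX_BATCH_SIZE` (default 128). Assuming the fork keeps the same option name, raising it at configure time might look like the following sketch; verify the exact name against the fork's CMakeLists.txt:

```shell
# Sketch, not verified against ik_llama.cpp: GGML_CUDA_PEER_MAX_BATCH_SIZE
# is the batch-size threshold below which the CUDA backend enables peer
# access in mainline llama.cpp (default 128). The option name and default
# may differ in the fork.
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_PEER_MAX_BATCH_SIZE=512
cmake --build build --config Release -j
```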
- Is there any mode of LLM inference in ik_llama.cpp that utilizes the P2P functionality between GPUs? That would include NVLink and, most importantly, the regular P2P master/slave functionality as enabled by the open-source NVIDIA drivers (see https://github.com/aikitoria/open-gpu-kernel-modules).
[EDIT]:
With and without P2P functionality: (benchmark output not captured in this export.)
So there is about 35 GB/s of additional bandwidth available for NVIDIA GPU users.
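One way to reproduce this kind of with/without-P2P comparison (an assumption about how the numbers above were obtained, not confirmed by the post) is NVIDIA's `p2pBandwidthLatencyTest` from the official cuda-samples repository, which prints bandwidth matrices both with P2P disabled and enabled:

```shell
# Inspect the GPU interconnect topology first (NVLink vs. PCIe paths).
nvidia-smi topo -m

# Build and run p2pBandwidthLatencyTest; the path below matches recent
# cuda-samples layouts, and newer releases build via CMake instead of make,
# so adjust for the release you check out. Requires a multi-GPU machine.
git clone https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples/Samples/5_Domain_Specific/p2pBandwidthLatencyTest
make
./p2pBandwidthLatencyTest
```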
[EDIT]:
If I am reading the code correctly, the P2P functionality is used only in ggml_backend_sycl_graph_compute, and ggml_sycl_set_peer_access enables it only when n_tokens is less than 128. Can anyone provide more info?
[EDIT2]: