Pardon me, I figured out the answer to this.
I believe I have found the source of the CPU performance issue: it seems to be related to the neighbor lists. When optimizing for GPUs, the traditional approach is to migrate from AoS (array of structures) to SoA (structure of arrays), but DeePMD seems to implement a novel neighbor-list format:
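It migrates from AoS to SoA and then compresses each element of the neighbor list into a 64-bit unsigned integer using $\alpha(j) \times 10^{16} + \lfloor r_{ij} \times 10^{8} \rfloor \times 10^{6} + j$, where $\alpha(j)$ is the type of neighbor $j$, $r_{ij}$ is its distance from atom $i$, and $j$ is its index.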
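A minimal Python sketch of that packing, assuming the formula reads $\alpha(j) \times 10^{16} + \lfloor r_{ij} \times 10^{8} \rfloor \times 10^{6} + j$. The function names and the field limits checked below are illustrative assumptions, not DeePMD-kit's actual code. It shows the property that makes the format convenient: sorting the integer keys orders a center atom's neighbors by type, then by distance, then by index, in one integer sort.

```python
import math

# Field scales taken from the formula: type * 1e16 + floor(r_ij * 1e8) * 1e6 + j.
TYPE_SCALE = 10**16
DIST_SCALE = 10**8
INDEX_SCALE = 10**6
UINT64_MAX = 2**64 - 1


def encode_neighbor(atom_type: int, r_ij: float, j: int) -> int:
    """Pack (type, distance, index) into one key that fits in an unsigned 64-bit int."""
    # Assumed limits (hypothetical) so the three fields never overlap.
    if not 0 <= j < INDEX_SCALE:
        raise ValueError("neighbor index must be < 10**6")
    dist_field = math.floor(r_ij * DIST_SCALE)  # truncated distance
    if not 0 <= dist_field < TYPE_SCALE // INDEX_SCALE:
        raise ValueError("distance too large for its field")
    key = atom_type * TYPE_SCALE + dist_field * INDEX_SCALE + j
    if not 0 <= key <= UINT64_MAX:
        raise ValueError("key does not fit in an unsigned 64-bit integer")
    return key


def decode_neighbor(key: int) -> tuple[int, int, int]:
    """Recover (type, truncated distance field, index) from a packed key."""
    atom_type, rest = divmod(key, TYPE_SCALE)
    dist_field, j = divmod(rest, INDEX_SCALE)
    return atom_type, dist_field, j


# Hypothetical neighbors of one center atom: (type, r_ij, index).
neighbors = [(1, 2.5, 7), (0, 3.0, 2), (0, 1.25, 9), (1, 1.0, 4)]
keys = sorted(encode_neighbor(t, r, j) for t, r, j in neighbors)
print([decode_neighbor(k) for k in keys])
# prints [(0, 125000000, 9), (0, 300000000, 2), (1, 100000000, 4), (1, 250000000, 7)]
```

Because the index field occupies the lowest six decimal digits and the truncated distance the next ten, the distance must be floored before scaling by $10^{6}$; otherwise its fractional part would spill into the index field.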
Your paper states that this format is efficient on GPUs. For x86 CPUs, how would you recommend formatting the neighbor list?
Thank you very much.