Barriers for thread synchronisation in Triton #6996
ElistratovSemyon asked this question in Q&A (unanswered)
Hi! I am a beginner with Triton and want to write a fused LSTM layer (something like cuDNN's LSTM). The simplest approach is to build the grid over the batch dimension only, but it is more efficient to also parallelize over the hidden dimension and use tensor cores for the matmuls. However, the second approach requires synchronisation between the parallel programs: each program computes a matmul between the hidden state and its slice of the weights, so the current hidden state must be up to date for all programs before the next timestep. Adding a barrier before updating the hidden state in memory should solve the problem, but tl.debug_barrier does not seem to work. Is there any way to achieve this?
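To make the layout concrete, here is a minimal sketch of what I am trying. The names and block sizes are illustrative, and the recurrence is simplified: it uses a plain `h_t = sigmoid(W · h_{t-1} + x_t)` update instead of the full LSTM gates, and an elementwise multiply-and-sum instead of `tl.dot`. Each program owns one tile of the hidden dimension, and the `tl.debug_barrier()` calls mark where I would expect a grid-wide barrier to be needed:

```python
import triton
import triton.language as tl


@triton.jit
def recurrent_kernel(
    x_ptr,   # (T, H) precomputed input projections for one batch element, fp32
    h_ptr,   # (H,)   hidden state shared by all programs, fp32
    w_ptr,   # (H, H) recurrent weight matrix, row-major, fp32
    T, H,
    BLOCK: tl.constexpr,  # tile size along the hidden dimension (power of 2)
):
    pid = tl.program_id(0)
    rows = pid * BLOCK + tl.arange(0, BLOCK)   # rows of h this program updates
    cols = tl.arange(0, BLOCK)                 # columns of one weight tile

    for t in range(T):
        # Accumulate W[rows, :] @ h over all column tiles (H is assumed to be
        # a multiple of BLOCK, so no masking here).
        acc = tl.zeros((BLOCK,), dtype=tl.float32)
        for k in range(0, H, BLOCK):
            h_tile = tl.load(h_ptr + k + cols)   # reads h written by *other* programs
            w_tile = tl.load(w_ptr + rows[:, None] * H + (k + cols)[None, :])
            acc += tl.sum(w_tile * h_tile[None, :], axis=1)

        x_t = tl.load(x_ptr + t * H + rows)
        h_new = tl.sigmoid(acc + x_t)   # stand-in for the real LSTM gate math

        # This is where I want a barrier across *all* programs: nobody should
        # overwrite h while another program is still reading the old value,
        # and nobody should start the next timestep before everyone has
        # written the new value.
        tl.debug_barrier()
        tl.store(h_ptr + rows, h_new)
        tl.debug_barrier()
```

For a single batch element this would be launched with `grid = (H // BLOCK,)`, assuming H is a multiple of BLOCK; the batch dimension would add a second grid axis.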