Retain inner-product matmul beyond loop (w/o storing to global memory) #3612
michael-swan started this conversation in General
Replies: 0 comments
I have a loop like so:
I would like to retain those `Z` values after this loop, perhaps by concatenating them onto a growing tensor (in shared memory or registers) or by writing to a `tl.tensor` of the right shape.

`Z = tl.zeros(..) ... Z[i,:,:] = tl.dot(X, Y)` is not an option; I've tried various incarnations of this. `Z = tl.zeros(..) ... Z = tl.cat(Z, tl.dot(X, Y))` is also not an option and fails for different reasons. The only thing that "works" is to do `Z = tl.zeros(<full-size>) ... Z += tl.dot(X, Y)`, where `X` and `Y` are selected in an outer-product order, but there are context-specific reasons I do not want to do this.

Is there a standard way to carry this information forward without requiring me to perform a matmul accumulate or store to global memory?
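To pin down what is being asked for, the intended result can be written in NumPy as below. This is only a sketch of the semantics, not Triton code: the function names, the shapes, and the batch of `xs`/`ys` inputs are hypothetical, and `xs[i] @ ys[i]` stands in for the per-iteration `tl.dot(X, Y)`. The second function builds the same stacked result without indexed assignment or concatenation, using a masked select over an index vector. Whether the analogous `tl.where` / `tl.arange` formulation compiles and stays in registers in a given Triton version is an assumption that has not been checked here.

```python
import numpy as np


def stacked_dots_reference(xs, ys):
    """Desired result: Z[i] = xs[i] @ ys[i], kept for every iteration."""
    return np.stack([x @ y for x, y in zip(xs, ys)])


def stacked_dots_masked(xs, ys):
    """Same result using only whole-tensor ops (no Z[i] = ..., no cat)."""
    n, m, _ = xs.shape
    p = ys.shape[2]
    z = np.zeros((n, m, p), dtype=xs.dtype)
    idx = np.arange(n)  # plays the role of an index vector over the loop axis
    for i in range(n):
        block = xs[i] @ ys[i]  # stand-in for tl.dot(X, Y) in iteration i
        # Take slice i from `block`; keep every other slice of `z` unchanged.
        z = np.where(idx[:, None, None] == i, block[None, :, :], z)
    return z


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
xs = rng.standard_normal((4, 16, 8)).astype(np.float32)
ys = rng.standard_normal((4, 8, 16)).astype(np.float32)
assert np.allclose(stacked_dots_reference(xs, ys), stacked_dots_masked(xs, ys))
```

The masked version rewrites the whole accumulator on every iteration, so it trades extra elementwise work for avoiding both the matmul accumulate and a round trip through global memory.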