[RELAND] Simpler codegen for linear layouts (#4554)
The upstream commit simplifies linear layout codegen for the case where
the output of the linear layout is 1-dimensional (post-folding). The
generic algorithm loops over the bits in the requested index for each
input dimension and selects the corresponding basis value for each
non-zero bit. This requires checking whether the bit is zero (eq) and
then taking the basis value if the bit is non-zero (select). The
simplified code flattens the inputs into one dimension, which allows us
to evaluate the linear layout using a simple linear function `L(a) = Ba`,
where `a` is the input index and `B` is the matrix of basis vectors from
our flattened layout. This is what the `matrixVectorProd` code does.
The end result is that the processing of the thread ID (the lane ID and
warp ID are held constant) changes from a select that checks whether
each thread ID bit is non-zero to a series of xors and shifts. However,
the number of xors for the print, and even the constants used, are
identical, so the nested layout encoding propagation is still working
correctly. When updating the lit test, which I believe is designed to
make sure the layout nesting is followed properly, I chose to drop the
preamble evaluation of the linear layout
and keep only the last two xors. Linear layout evaluation is tested
basically everywhere (including functional correctness in
`test_reduce_layouts`). Here, we focus on making sure the printfs were
generated properly based on the nested layouts. This should result in
less maintenance burden going forward as I suspect the linear layout
evaluation code will be tweaked again in the future.
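
The two evaluation strategies described above can be compared with a small sketch. This is an illustration only, not Triton's `matrixVectorProd` or its generated IR: the function names and the `BASES` values are hypothetical, and a 1-D layout is modeled as a list of integer basis values, one per input bit. Both functions compute the same XOR-linear map `L(a) = Ba` over GF(2).

```python
# Hypothetical flattened 1-D layout: BASES[i] is the output produced by
# input bit i alone. These values are made up for illustration.
BASES = [1, 2, 4, 16, 8]


def apply_generic(bases, index):
    """Select-style evaluation: test each input bit, then pick basis or 0."""
    result = 0
    for bit, basis in enumerate(bases):
        is_zero = ((index >> bit) & 1) == 0   # the "eq"
        term = 0 if is_zero else basis        # the "select"
        result ^= term
    return result


def matrix_vector_prod(bases, index):
    """Evaluate L(a) = B a over GF(2) with only shifts, ands, and xors.

    The basis values are the columns of B. Each output bit is the parity
    of (row of B) AND (input index).
    """
    n_out = max(bases).bit_length() if bases else 0
    result = 0
    for row in range(n_out):
        # Build the row of B: bit j is set if basis j contributes to this row.
        row_mask = 0
        for col, basis in enumerate(bases):
            row_mask |= ((basis >> row) & 1) << col
        # The dot product over GF(2) is the parity of the masked input.
        masked = index & row_mask
        parity = 0
        while masked:
            parity ^= masked & 1
            masked >>= 1
        result |= parity << row
    return result


# Both strategies agree on every index the layout can address.
for a in range(1 << len(BASES)):
    assert apply_generic(BASES, a) == matrix_vector_prod(BASES, a)
```

Because the map is linear over GF(2), `L(a ^ b) == L(a) ^ L(b)` holds for any pair of indices, which is why the contribution of the thread ID bits can be computed separately and combined with xors.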
close #4551