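I think I've solved this problem myself:
1. For block C: after the compute-at step (reverse_compute_at in the code below), call get_loops on the block to retrieve its loop variables, then apply the remaining transformations to them.
2. For block Y_init: decompose_reduction automatically generates a block named Y_init, so it can be handled the same way as block C.
The modified transform part is as follows: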

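# Create a schedule from the MyBmmRelu IRModule defined earlier.
sch = tvm.tir.Schedule(MyBmmRelu)

block_Y = sch.get_block("Y", func_name="bmm_relu")
block_C = sch.get_block("C", func_name="bmm_relu")

# Split j into 16 outer iterations and k into inner chunks of 4.
b, i, j, k = sch.get_loops(block_Y)
i2_0, ax0 = sch.split(j, [16, None])
ax1_0, ax1_1 = sch.split(k, [None, 4])
sch.reorder(ax1_0, ax1_1, ax0)
sch.parallel(b)
sch.unroll(ax1_1)
# Move block C under the outer j loop, then split the initialization out of Y.
sch.reverse_compute_at(block_C, i2_0)
sch.decompose_reduction(block_Y, ax1_0)

# Query the loops of C again after reverse_compute_at and vectorize the innermost one.
i0, i1, i2_0, i2_1 = sch.get_loops(block_C)
sch.vectorize(i2_1)

# decompose_reduction created the Y_init block; look it up by name.
block_Y_init = sch.get_block("Y_init", func_name="bmm_relu")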

b, i, j0, j_1_init =…

Answer selected by cjx0709