I am seeking help from the CUTLASS community. I get a "misaligned address (16 bytes requested)" error when loading a matrix from shared memory (smem) into registers. I traced the problem to "thr_copy_ldmatrix_A.partition_S(sA)".
Does anyone know how to debug and fix this issue?
--------------------------------
DEBUG: sA:
tensor<ptr<i8, smem, align<1024>> o ((64,1),(8,8),(1,4)):((1,0),(64,512),(0,4096))>
DEBUG: thr_copy_ldmatrix_A:
Tiled Copy
Tiler MN: (32:1,32:1)
TV Layout tiled: ((4,8,2,2),((4,2,2),(1,1))):((128,1,16,0),((32,8,512),(0,0)))
Copy Atom
ThrID: 32:1
TV Layout Src: ((2,2,4,2),16):((16,128,32,0),1)
TV Layout Dst: ((4,8),(1,2,2,2)):((32,1),(1,16,8,128))
Value type: i8
--------------------------------
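As a reading aid for the dump above: a CuTe layout maps a linear index to an offset by splitting the index over the flattened shape (leftmost mode varies fastest) and taking the dot product with the strides. The pure-Python sketch below is not CuTe DSL code; `flatten` and `layout_offset` are hypothetical helpers written only for illustration. Applied to the three top-level modes of sA, it shows that the first mode is the stride-1 (contiguous) direction, the second mode steps by 64 and 512 elements (bytes, since the type is i8), and the 4 stages are 4096 bytes apart. It also evaluates the copy atom's source value mode 16:1, which describes 16 consecutive i8 values per thread.

```python
# Minimal sketch of how a CuTe layout (shape:stride) maps a 1-D index to an
# offset. `flatten` and `layout_offset` are hypothetical helpers for
# illustration only; they are not part of the CuTe DSL API.

def flatten(t):
    """Flatten a nested tuple such as ((64, 1), (8, 8)) into [64, 1, 8, 8]."""
    if isinstance(t, tuple):
        out = []
        for x in t:
            out.extend(flatten(x))
        return out
    return [t]


def layout_offset(shape, stride, idx):
    """Offset of linear index `idx`, decomposed colexicographically
    (leftmost mode varies fastest), then dotted with the strides."""
    offset = 0
    for s, d in zip(flatten(shape), flatten(stride)):
        offset += (idx % s) * d
        idx //= s
    return offset


# sA from the debug output, split into its three top-level modes.
mode0 = ((64, 1), (1, 0))      # (64,1):(1,0)
mode1 = ((8, 8), (64, 512))    # (8,8):(64,512)
stages = ((1, 4), (0, 4096))   # (1,4):(0,4096)

print([layout_offset(*mode0, i) for i in range(4)])    # [0, 1, 2, 3]
print([layout_offset(*mode1, i) for i in (0, 1, 8)])   # [0, 64, 512]
print([layout_offset(*stages, i) for i in range(4)])   # [0, 4096, 8192, 12288]

# Copy atom source value mode 16:1 -> 16 consecutive i8 values per thread.
print([layout_offset(16, 1, i) for i in range(16)] == list(range(16)))  # True
```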
tCsA_copy_view = thr_copy_ldmatrix_A.partition_S(sA)
DEBUG: tCsA_copy_view:
tensor<ptr<i8, smem, align<8>> o (((8,2),2),2,2,(1,4)):(((1,128),1024),32,2048,(0,4096))>
--------------------------------
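One way to read tCsA_copy_view is to count how many values in its first (per-copy) mode are contiguous in shared memory. The sketch below is plain Python with hypothetical helpers (`flatten`, `contiguous_run`), not CuTe DSL calls, and it assumes the first mode ((8,2),2):((1,128),1024) is the one ldmatrix reads from. It finds a unit-stride run of only 8 i8 values (8 bytes), whereas the copy atom's source value mode 16:1 describes 16 contiguous values. That gap is consistent with both the align<8> on the partitioned pointer and the 16 bytes requested in the error.

```python
# Sketch: how many i8 values of one copy block are contiguous in smem?
# `flatten` and `contiguous_run` are hypothetical helpers, not CuTe DSL APIs.

def flatten(t):
    """Flatten a nested tuple such as ((8, 2), 2) into [8, 2, 2]."""
    if isinstance(t, tuple):
        out = []
        for x in t:
            out.extend(flatten(x))
        return out
    return [t]


def contiguous_run(shape, stride):
    """Length of the unit-stride run starting at offset 0, walking the
    flattened modes left to right (size-1 modes are skipped)."""
    run = 1
    for s, d in zip(flatten(shape), flatten(stride)):
        if s == 1:
            continue
        if d != run:
            break  # next mode jumps away, so contiguity ends here
        run *= s
    return run


ELEM_BYTES = 1           # i8
LDMATRIX_ROW_BYTES = 16  # matches "16 bytes requested" in the error message

# First mode of tCsA_copy_view: ((8,2),2):((1,128),1024)
view_run = contiguous_run(((8, 2), 2), ((1, 128), 1024))
# Source value mode of the copy atom: 16:1
atom_run = contiguous_run(16, 1)

print(view_run * ELEM_BYTES, atom_run * ELEM_BYTES)  # 8 16
print(view_run * ELEM_BYTES >= LDMATRIX_ROW_BYTES)   # False
```

If this reading holds, each 16-value row the atom expects is split into two 8-byte pieces (offsets 0..7 and 128..135), so a reasonable next check is whether the atom's contiguous value mode lines up with the stride-1 mode of sA.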