amx-reload-bug.txt
I built LLC from the main branch and compiled the attached test case with -mcpu=sapphirerapids. The resulting assembly contains the following instructions:
tilestored %tmm0, 6016(%rsp,%rax) # 1024-byte Folded Spill
tileloadd 6016(%rsp), %tmm7 # 1024-byte Folded Reload
Here the tile reload is missing the index register that holds the stride (%rax in the spill). As a result, a stride of zero is used, and the first row of the stored tile is broadcast to every row of %tmm7. It also looks like this spill/reload pair could be avoided altogether.
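To show why the missing index register matters, here is a minimal C model of the row addressing. It assumes the documented AMX behavior: row r of a tile is accessed at base + displacement + r * stride, the stride comes from the index register, and the stride is treated as zero when no index register is encoded. The tile shape and the helpers model_tilestored, model_tileloadd and roundtrip_matches are hypothetical. This is a software sketch only, not real AMX code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative tile shape (assumption): 4 rows of 8 bytes each.
   Real AMX tiles are configured via LDTILECFG and can be up to 16 x 64 bytes. */
#define ROWS 4
#define COLSB 8
#define BUF_SIZE (ROWS * COLSB)

/* Hypothetical model of a tile store: row r is written to base + r * stride. */
static void model_tilestored(uint8_t *base, size_t stride,
                             uint8_t src[ROWS][COLSB]) {
    for (size_t r = 0; r < ROWS; ++r)
        memcpy(base + r * stride, src[r], COLSB);
}

/* Hypothetical model of a tile load: row r is read from base + r * stride.
   With stride == 0 every row is read from base, i.e. row 0 is broadcast. */
static void model_tileloadd(uint8_t dst[ROWS][COLSB], const uint8_t *base,
                            size_t stride) {
    for (size_t r = 0; r < ROWS; ++r)
        memcpy(dst[r], base + r * stride, COLSB);
}

/* Spill a tile with stride COLSB (like the spill using %rax), reload it with
   reload_stride, and return 1 if the reloaded tile matches the original. */
static int roundtrip_matches(size_t reload_stride) {
    uint8_t tile[ROWS][COLSB], reloaded[ROWS][COLSB], slot[BUF_SIZE];
    /* Keep every reloaded row inside the spill slot. */
    assert(reload_stride * (ROWS - 1) + COLSB <= BUF_SIZE);
    for (size_t r = 0; r < ROWS; ++r)
        for (size_t c = 0; c < COLSB; ++c)
            tile[r][c] = (uint8_t)(r * COLSB + c); /* distinct value per byte */
    model_tilestored(slot, COLSB, tile);            /* spill with a stride */
    model_tileloadd(reloaded, slot, reload_stride); /* reload */
    return memcmp(tile, reloaded, sizeof tile) == 0;
}
```

In this model, roundtrip_matches(COLSB) returns 1 because the reload uses the same stride as the spill. roundtrip_matches(0) mimics the reload above without the index register and returns 0, because every reloaded row is a copy of row 0.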