After #2581 we are getting closer to breaking the dependency on the RewriteTensorPointer pass (which transforms certain loads/stores through block pointers into loads/stores that use a tensor of pointers). This issue is about investigating the remaining performance gaps and addressing them.
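The following NumPy sketch illustrates the kind of rewrite the parenthetical describes. It is not the RewriteTensorPointer implementation. It assumes a 2-D row-major tensor and a load with boundary checking on both dimensions. The helper names `block_ptr_to_tensor_of_ptrs` and `masked_load` are hypothetical. A block pointer is described by `(base, shape, strides, offsets, block_shape)`, and the rewrite expands it into an explicit per-element address tensor plus a bounds mask.

```python
import numpy as np


def block_ptr_to_tensor_of_ptrs(base, shape, strides, offsets, block_shape):
    """Hypothetical helper: expand a 2-D block pointer into per-element
    addresses and a bounds mask (a sketch of the idea, not the real pass)."""
    rows = offsets[0] + np.arange(block_shape[0])  # row indices covered by the block
    cols = offsets[1] + np.arange(block_shape[1])  # column indices covered by the block
    # Tensor of pointers: one address (element index) per block element.
    ptrs = base + rows[:, None] * strides[0] + cols[None, :] * strides[1]
    # Boundary check on both dimensions becomes an explicit mask.
    mask = ((rows[:, None] >= 0) & (rows[:, None] < shape[0])
            & (cols[None, :] >= 0) & (cols[None, :] < shape[1]))
    return ptrs, mask


def masked_load(memory, ptrs, mask, other=0):
    """Hypothetical helper: dereference only in-bounds addresses and fill
    masked-out lanes with `other`."""
    safe = np.where(mask, ptrs, 0)  # avoid touching out-of-bounds addresses
    return np.where(mask, memory[safe], other)


# A 3x5 row-major matrix stored flat; load a 2x4 block starting at (2, 3).
memory = np.arange(15)
ptrs, mask = block_ptr_to_tensor_of_ptrs(
    base=0, shape=(3, 5), strides=(5, 1), offsets=(2, 3), block_shape=(2, 4))
block = masked_load(memory, ptrs, mask)
```

Here only elements (2, 3) and (2, 4) fall inside the 3x5 matrix, so the loaded block is `[[13, 14, 0, 0], [0, 0, 0, 0]]`. Materializing the address tensor and mask this way is what makes the rewritten form potentially more expensive than a native block load, which is where performance gaps can show up.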