Skip to content

Improve performance of tt.load and tt.store for FP8 when converting block ptr to regular ptrs #2374

@etiotto

Description

@etiotto

We would like to remove the RewriteTensorPointer pass which rewrites block pointers into regular pointers (except when it determines load/store operations on block ptrs can be converted to 2D block reads/writes). The idea is to avoid loosing semantic information too early and instead deal with block ptr that cannot be used to generate 2D block reads/stores while lowering that operation).

For this scheme to work, we first need to improve the lowering code for tt.load and tt.store operations that use a block ptr with an element type that is not (currently) supported by the 2D read instructions available on the target GPU (e.g. the element is FP8).

See #2359 (comment) for more context.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions