[QST] Why was the pipeline wait before updating TMA descriptors removed from the TMA array GEMM kernel?

**What is your question?**

Before CUTLASS 4.0.0, there was a pipeline wait before updating the TMA descriptor (https://github.com/NVIDIA/cutlass/blob/2b78c2fe31d4adb4770f3ca226a7b4acc4e85e2b/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_cooperative.hpp#L709-L713).

The comment on that code says:
```
// Purpose of this pipeline state is to make sure TMA loads have finished before doing descriptor updates
// Since this state is waiting for loads to finish, it must start in the inverted phase.
```

Why can it be safely removed? For reference, this is the wait in question:

	// Purpose of this pipeline state is to make sure TMA loads have finished before doing descriptor updates
	// Since this state is waiting for loads to finish, it must start in the inverted phase.
	typename CollectiveMainloop::PipelineState mainloop_pipe_tma_consumer_state =
	{mainloop_pipe_producer_state.index(), !mainloop_pipe_producer_state.phase(), mainloop_pipe_producer_state.count()};
	mainloop_pipeline.consumer_wait(mainloop_pipe_tma_consumer_state);
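
To make the "inverted phase" part of that comment concrete, here is a toy host-side model of a phase-bit barrier. Everything in it (`ToyPipelineState`, `ToyFullBarrier`, `consumer_view`) is a hypothetical stand-in written for illustration, not CUTLASS code, and it assumes the usual parity convention where a wait on a phase succeeds once the barrier has moved past that phase.

```cpp
#include <cstdint>

// Hypothetical stand-in for a pipeline state: stage index, phase bit, running count.
struct ToyPipelineState {
  int index;
  uint32_t phase;
  uint32_t count;
};

// Hypothetical "full" barrier for a single stage.
// Assumption: the phase bit flips each time a load into this stage completes.
struct ToyFullBarrier {
  uint32_t current_phase = 0;  // phase that is currently in progress

  // Model a load into this stage finishing.
  void complete_load() { current_phase ^= 1u; }

  // A wait on `phase` is satisfied once the barrier has moved past that phase.
  bool phase_done(uint32_t phase) const { return current_phase != phase; }
};

// Build the consumer-side view of the stage a producer state points at:
// same index and count, opposite phase bit.
ToyPipelineState consumer_view(ToyPipelineState const& producer) {
  return {producer.index, producer.phase ^ 1u, producer.count};
}
```

In this model, with a producer state `{2, 1, 2}` and a fresh barrier, a wait on the producer's own phase (1) is already satisfied, while a wait on `consumer_view(producer).phase` (0) is only satisfied after `complete_load()`. That is my understanding of why the removed code flips the phase before calling `consumer_wait`.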
