Matrix Transpose Tutorial Cleanup #1916

@MrBurmark

Description

I found a couple of things while looking at the transpose tutorial.

First, the launch and kernel solutions could use block_unchecked policies. This will also allow the kernel implementation to skip the second sync threads call.

Second, the launch solution does not appear to use shared memory as intended: the same thread that reads a value also writes that value. The point of shared memory here is to let one thread read a value and a different thread write it, so that memory accesses to both matrices are coalesced. Fixing this will require adding a teamSync call to the launch solution, which it currently lacks.
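
To make the read/write pattern concrete, here is a rough sketch in plain C++ that emulates one thread team per tile on the CPU. It is not the tutorial's code and does not use the RAJA API; `tiled_transpose` and `TILE` are hypothetical names chosen for illustration. The comment between the two phases marks where the teamSync (or sync threads) call would have to go on a GPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tile width, chosen only for this illustration.
constexpr std::size_t TILE = 4;

// Transposes a row-major n_rows x n_cols matrix A into a row-major
// n_cols x n_rows matrix At, one tile at a time.
// The (ty, tx) loops stand in for the threads of a team; tx is the
// fastest-varying "thread" index.
std::vector<int> tiled_transpose(const std::vector<int>& A,
                                 std::size_t n_rows, std::size_t n_cols)
{
  std::vector<int> At(n_rows * n_cols);

  for (std::size_t by = 0; by < (n_rows + TILE - 1) / TILE; ++by) {
    for (std::size_t bx = 0; bx < (n_cols + TILE - 1) / TILE; ++bx) {

      int tile[TILE][TILE] = {};  // stands in for team shared memory

      // Phase 1: thread (ty, tx) reads A(row, col). Consecutive tx
      // values touch consecutive columns of A, so reads are contiguous.
      for (std::size_t ty = 0; ty < TILE; ++ty) {
        for (std::size_t tx = 0; tx < TILE; ++tx) {
          std::size_t row = by * TILE + ty;
          std::size_t col = bx * TILE + tx;
          if (row < n_rows && col < n_cols) {
            tile[ty][tx] = A[row * n_cols + col];
          }
        }
      }

      // On a GPU a team-wide barrier is required here, because phase 2
      // reads tile entries that were written by other threads.

      // Phase 2: thread (ty, tx) writes the value loaded by thread
      // (tx, ty). Consecutive tx values touch consecutive columns of
      // At, so writes are contiguous as well.
      for (std::size_t ty = 0; ty < TILE; ++ty) {
        for (std::size_t tx = 0; tx < TILE; ++tx) {
          std::size_t out_row = bx * TILE + ty;  // column index in A
          std::size_t out_col = by * TILE + tx;  // row index in A
          if (out_row < n_cols && out_col < n_rows) {
            At[out_row * n_rows + out_col] = tile[tx][ty];
          }
        }
      }
    }
  }
  return At;
}
```

If phase 2 instead wrote `tile[ty][tx]` to the transposed location, each thread would only ever touch its own shared-memory entry, no barrier would be needed, and the writes to `At` would be strided, which matches the behavior described above for the current launch solution.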

Labels: documentation, reviewed