Siwasaki/pr/loadstorescalar#16

Open
shintaro-iwasaki wants to merge 7 commits into triton-mlir from siwasaki/pr/loadstorescalar

@shintaro-iwasaki
Owner

No description provided.

@shintaro-iwasaki shintaro-iwasaki force-pushed the siwasaki/pr/loadstorescalar branch 6 times, most recently from 9770dc1 to a95a4f5 Compare October 17, 2022 23:24
Superjomn and others added 4 commits October 18, 2022 11:43
…ang#785)

Fix the vector size of the Load/Store ops so that it takes the mask's
alignment into account.

Some cases:

```mlir
// num_warp = 2
// block_size = 128
func @vecadd_mask_align_16(%a_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %b_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, 
  %out_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %n_elements: i32 {tt.divisibility = 16 : i32}) {
    // mask = make_range(128) < n_elements
}
```
This should lower to `ld`/`st` instructions with vec=2.

In contrast, the following example omits the `tt.divisibility` hint on `%n_elements`:

```mlir
// num_warp = 2
// block_size = 128
func @vecadd_mask_align_16(%a_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %b_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32},
  %out_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %n_elements: i32) {
    // mask = make_range(128) < n_elements
}
```
Here the mask's alignment is unknown, so this should lower to `ld`/`st` instructions with vec=1.
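
The commit message does not spell out how the vector size is chosen. The sketch below shows one plausible reading that reproduces both numbers. It is a simplified Python model, not the actual lowering code. The helper `max_vector_width`, the 16-byte cap per `ld`/`st`, the assumption that each thread owns `block_size / (num_warps * 32)` contiguous elements, and the choice to treat a missing `tt.divisibility` hint as a mask alignment of 1 are all assumptions made for illustration.

```python
def max_vector_width(elem_bytes, ptr_divisibility, mask_alignment,
                     elems_per_thread, max_bytes=16):
    """Hypothetical helper: widest ld/st vector, counted in elements.

    Assumed rule: the width is the minimum of what the pointer alignment
    allows, what a single access can carry, how many consecutive mask
    values are known to be equal, and how many elements a thread owns.
    """
    ptr_limit = max(1, ptr_divisibility // elem_bytes)  # 16 B / 4 B = 4 for f32
    hw_limit = max(1, max_bytes // elem_bytes)          # assumed 128-bit cap
    return min(ptr_limit, hw_limit, mask_alignment, elems_per_thread)


# block_size = 128, num_warps = 2 -> 64 threads, 2 elements per thread (assumed)
elems_per_thread = 128 // (2 * 32)

# First example: n_elements carries tt.divisibility = 16, so the mask is
# assumed uniform over groups of 16 elements.
vec_aligned = max_vector_width(4, 16, 16, elems_per_thread)

# Second example: no hint on n_elements, so the mask alignment is assumed 1.
vec_unaligned = max_vector_width(4, 16, 1, elems_per_thread)

print(vec_aligned, vec_unaligned)
```

Under these assumptions the first case is capped by the two elements each thread owns, and the second by the unknown mask alignment.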
@shintaro-iwasaki shintaro-iwasaki force-pushed the siwasaki/pr/loadstorescalar branch from a95a4f5 to 07c440e Compare October 18, 2022 15:33
4 participants