[ET-VK] New implementation of cat operator
#11508
Conversation
## Changes
* Introduce `concat_texture.glsl` and `concat_buffer.glsl` to implement the `torch.cat` operator
* Introduce `Concat.cpp` to replace `Cat.cpp`
* Fix a bug with channels-packed buffer tensors where input data would be copied incorrectly when multiple dims have a stride of 1
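The stride ambiguity behind the last bullet can be illustrated in plain NumPy terms (this is a minimal illustration of the general pitfall, not the ExecuTorch code itself):

```python
import numpy as np

# A contiguous (4, 1) array: the trailing dim has size 1, so BOTH dims
# report a stride of 1 element. Any code that infers "the packed dim" by
# searching for the dim whose stride equals 1 can therefore pick the
# wrong one and copy data with the wrong indexing.
a = np.arange(4, dtype=np.float32).reshape(4, 1)
strides_in_elems = tuple(s // a.itemsize for s in a.strides)
```

Here `strides_in_elems` is `(1, 1)`, so "the dim with stride 1" is not unique.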
## Motivation
> * Introduce `concat_texture.glsl` and `concat_buffer.glsl` to implement the `torch.cat` operator
> * Introduce `Concat.cpp` to replace `Cat.cpp`
The existing implementation of `torch.cat` uses the `copy_channel_offset` shaders. However, these shaders have a critical bug: the output tensor is passed in twice with different access types, i.e.
```
graph.execute_nodes().emplace_back(new DispatchNode(
    graph,
    VK_KERNEL_FROM_STR(kernel_name),
    global_size,
    local_size,
    // Inputs and Outputs
    {
        {out, vkapi::kWrite},
        {out, vkapi::kRead},
        {in, vkapi::kRead},
    },
    // ...
```
This causes many validation layer errors, because memory barriers for the resource cannot be formed properly; the shader essentially relies on undefined behaviour to work correctly. As a result, the `cat` operator produces incorrect results on many platforms.
Rather than fixing the `copy_offset` shaders, I decided to introduce new shaders to perform the concat operation. The new implementation handles both buffer and texture inputs and is agnostic to memory layout.
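The layout-agnostic concat can be sketched as a reference implementation: each input is copied into the output at a running offset along the concat dim, which is essentially what one shader dispatch per input computes. This is a NumPy sketch of the idea, not the shader code itself:

```python
import numpy as np

def concat_reference(inputs, dim):
    """Reference concat: copy each input into the output at a running
    offset along `dim`, mirroring one dispatch per input tensor."""
    out_shape = list(inputs[0].shape)
    out_shape[dim] = sum(t.shape[dim] for t in inputs)
    out = np.empty(out_shape, dtype=inputs[0].dtype)
    offset = 0
    for t in inputs:
        # Select out[..., offset : offset + t.shape[dim], ...] along `dim`.
        sl = [slice(None)] * out.ndim
        sl[dim] = slice(offset, offset + t.shape[dim])
        out[tuple(sl)] = t
        offset += t.shape[dim]
    return out
```

Because each input only needs its own extents and the running offset, the scheme does not depend on how any tensor is packed in memory.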
Differential Revision: [D76305343](https://our.internmc.facebook.com/intern/diff/D76305343/)
ghstack-source-id: 289266299
Pull Request resolved: #11508
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11508. Note: links to docs will display an error until the docs builds have been completed.

❌ As of commit a30836e with merge base 4a14fdd: 3 new failures, 2 cancelled jobs (please retry), 4 unrelated failures (likely due to flakiness present on trunk).

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D76305343
Merged 3f2addc into gh/SS-JIA/239/base
## Context

Previously, I updated the implementation of `aten.cat.default` in D76305343 (#11508), since the original implementation had a bug. The new implementation only supported up to 3 input tensors, but several models require up to 6. This diff updates the `concat` op so that an arbitrary number of input tensors may be accepted.

## Changes

* Update the implementation of the concat shader so that it can be dispatched repeatedly, allowing support for any number of input tensors.

Differential Revision: [D79893084](https://our.internmc.facebook.com/intern/diff/D79893084/)
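The repeated-dispatch idea can be sketched as follows: split the inputs into groups no larger than the per-dispatch limit, and carry the running offset along the concat dim from one group to the next. This is a Python sketch of the scheme under the assumption that each dispatch handles at most 3 inputs (the limit mentioned above); the actual shader-side mechanics are not shown:

```python
def dispatch_in_groups(inputs, max_per_dispatch=3):
    """Split `inputs` into groups of at most `max_per_dispatch`, recording
    the starting offset along the concat dim for each group. Each
    (offset, group) pair corresponds to one invocation of the shader."""
    groups = []
    offset = 0
    for i in range(0, len(inputs), max_per_dispatch):
        group = inputs[i:i + max_per_dispatch]
        groups.append((offset, group))
        # Advance the offset by each input's extent along the concat dim
        # (here modelled as the list length).
        offset += sum(len(t) for t in group)
    return groups
```

Since each group only needs the offset where its first input lands, the shader can be re-invoked any number of times without knowing the total input count up front.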
Stack from ghstack (oldest at bottom):

* [ET-VK] New implementation of `cat` operator #11508
* `vTensor` member variables and expose dim order UBO and push constant #11599