Update on "[ET-VK] New implementation of cat operator"
## Changes
* Introduce `concat_texture.glsl` and `concat_buffer.glsl` to implement the `torch.cat` operator
* Introduce `Concat.cpp` to replace `Cat.cpp`
* Fix a bug with channels-packed buffer tensors where input data would be copied incorrectly when multiple dims have a stride of 1
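The PR does not spell out the root cause of the channels-packed buffer bug. As a hypothetical illustration of how several dims can end up with a stride of 1, the sketch below computes contiguous strides for a channels-packed buffer (channels innermost) and then applies a fragile heuristic that guesses the packed dim as "the first dim with stride 1". The `Dims` alias and both functions are illustrative names, not ExecuTorch APIs, and sizes are assumed to be indexed in WHCN order.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical illustration only; this is not the ExecuTorch code.
// Sizes and strides are indexed in WHCN order: 0 = W, 1 = H, 2 = C, 3 = N.
using Dims = std::array<int64_t, 4>;

// Contiguous strides for a channels-packed buffer: C is innermost,
// followed by W, then H, then N.
Dims channels_packed_strides(const Dims& sizes) {
  assert(sizes[0] > 0 && sizes[1] > 0 && sizes[2] > 0 && sizes[3] > 0);
  Dims strides{};
  strides[2] = 1;                              // C
  strides[0] = sizes[2];                       // W steps over all channels
  strides[1] = sizes[0] * sizes[2];            // H steps over one W*C row
  strides[3] = sizes[1] * sizes[0] * sizes[2]; // N steps over one H*W*C plane
  return strides;
}

// Fragile heuristic: guesses the packed dim as the first dim whose stride is 1.
// When a size-1 dim makes another stride equal 1, this picks the wrong dim.
int packed_dim_from_strides(const Dims& strides) {
  for (int d = 0; d < 4; ++d) {
    if (strides[d] == 1) {
      return d;
    }
  }
  return -1;
}
```

With W=5, H=3, C=1, N=1 the strides come out as {1, 5, 1, 15}: both W and C have stride 1, so the heuristic returns W (0) instead of C (2). One way to avoid this class of ambiguity is to carry the dim order or packed dim explicitly instead of re-deriving it from strides.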
## Motivation
> * Introduce `concat_texture.glsl` and `concat_buffer.glsl` to implement the `torch.cat` operator
> * Introduce `Concat.cpp` to replace `Cat.cpp`
The existing implementation of `torch.cat` uses the `copy_channel_offset` shaders. However, these shaders have a critical bug: the output tensor is bound twice, with different access types, i.e.
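```cpp
graph.execute_nodes().emplace_back(new DispatchNode(
    graph,
    VK_KERNEL_FROM_STR(kernel_name),
    global_size,
    local_size,
    // Inputs and Outputs
    {
        // `out` is bound twice in the same dispatch, with different access types
        {out, vkapi::kWrite},
        {out, vkapi::kRead},
        {in, vkapi::kRead},
    },
    // ... (remaining arguments omitted from this excerpt)
```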
This creates many Vulkan validation layer errors because the memory barriers for the resource cannot be formed properly. The shader essentially relies on undefined behaviour to work correctly. As a result, the `cat` operator produces incorrect results on many platforms.
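To make the failure mode concrete, here is a hypothetical, simplified model of a dispatch's resource bindings with a check that flags the pattern shown above: one resource bound more than once in the same dispatch with different access types. The `Access`, `Binding`, and `has_conflicting_bindings` names are illustrative assumptions, not the ExecuTorch `vkapi` types.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical model of per-dispatch bindings; not the ExecuTorch vkapi types.
enum class Access { Read, Write };

struct Binding {
  int resource_id; // stands in for the underlying VkBuffer / VkImage
  Access access;
};

// Returns true if any resource appears more than once in a single dispatch
// with different access types (e.g. once as Write and once as Read).
// Barrier tracking that records one access per resource has no single
// correct state to transition such a resource to before the dispatch.
bool has_conflicting_bindings(const std::vector<Binding>& bindings) {
  std::map<int, Access> first_access;
  for (const Binding& b : bindings) {
    auto it = first_access.find(b.resource_id);
    if (it == first_access.end()) {
      first_access.emplace(b.resource_id, b.access);
    } else if (it->second != b.access) {
      return true;
    }
  }
  return false;
}
```

In this model, the `{out, kWrite}`, `{out, kRead}`, `{in, kRead}` binding list from the excerpt is flagged, while a list that binds each resource once is not.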
Rather than fix the `copy_offset` shaders, I decided to just introduce new shaders to perform the concat operation. The new implementation handles both buffer and texture inputs and is agnostic to memory layout.
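The PR does not include the shader source here. As a rough CPU reference for what a concat kernel has to compute, the sketch below assigns each output element to the input that owns its coordinate along the concat dim. This mirrors the one-invocation-per-output-element pattern that compute shaders commonly use, but it is not a transcription of `concat_buffer.glsl` or `concat_texture.glsl`. It assumes contiguous row-major `float` tensors; the `Tensor` struct and `concat` function are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical contiguous (row-major) tensor; not an ExecuTorch type.
struct Tensor {
  std::vector<int64_t> sizes;
  std::vector<float> data;
};

// CPU reference for torch.cat along `dim`: all inputs must match on every
// other dim. Each output element looks up which input it comes from.
Tensor concat(const std::vector<Tensor>& inputs, size_t dim) {
  assert(!inputs.empty());
  std::vector<int64_t> out_sizes = inputs[0].sizes;
  assert(dim < out_sizes.size());
  out_sizes[dim] = 0;
  for (const Tensor& t : inputs) {
    assert(t.sizes.size() == out_sizes.size());
    for (size_t d = 0; d < out_sizes.size(); ++d) {
      if (d != dim) assert(t.sizes[d] == inputs[0].sizes[d]);
    }
    out_sizes[dim] += t.sizes[dim];
  }

  // outer: product of sizes before dim; inner: product of sizes after dim.
  int64_t outer = 1, inner = 1;
  for (size_t d = 0; d < dim; ++d) outer *= out_sizes[d];
  for (size_t d = dim + 1; d < out_sizes.size(); ++d) inner *= out_sizes[d];

  Tensor out{out_sizes,
             std::vector<float>(static_cast<size_t>(outer * out_sizes[dim] * inner))};
  for (int64_t idx = 0; idx < static_cast<int64_t>(out.data.size()); ++idx) {
    // Decompose the flat output index into (outer, concat-dim, inner) coordinates.
    int64_t i_inner = idx % inner;
    int64_t i_dim = (idx / inner) % out_sizes[dim];
    int64_t i_outer = idx / (inner * out_sizes[dim]);
    // Walk the inputs, subtracting each one's extent along dim until the
    // coordinate falls inside the current input.
    for (const Tensor& t : inputs) {
      if (i_dim < t.sizes[dim]) {
        out.data[idx] = t.data[(i_outer * t.sizes[dim] + i_dim) * inner + i_inner];
        break;
      }
      i_dim -= t.sizes[dim];
    }
  }
  return out;
}
```

For example, concatenating a `[2, 1]` tensor `{1, 2}` with a `[2, 2]` tensor `{3, 4, 5, 6}` along dim 1 gives a `[2, 3]` tensor `{1, 3, 4, 2, 5, 6}`. Handling textures and arbitrary memory layouts, as the new shaders do, would additionally require mapping logical coordinates to buffer offsets or texel positions per layout, which this sketch deliberately leaves out.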
Differential Revision: [D76305343](https://our.internmc.facebook.com/intern/diff/D76305343/)
[ghstack-poisoned]