### Summary
This is PR 1 of 3 implementing a dim-order-aware clone op.
Currently, clone ops are removed during export as no-ops, so memory
layout (dim order) changes are lost. This can cause backend failures,
incorrect outputs when ops expect specific layouts, and performance
degradation. This set of PRs introduces a dim-order-aware clone op,
`_clone_dim_order`, which preserves memory layout changes by explicitly
storing dim order information. This is implemented by replacing standard
clone ops with this variant during export and updating the clone removal
transform to preserve clones that change layout.
This PR adds the portable CPU kernel for the `_clone_dim_order` op,
implementing a clone variant that preserves dim order at runtime. The
portable kernel validates dtype and layout compatibility, resizes the
output tensor if needed, and performs an element-wise clone of the input
tensor.
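To make the kernel's behavior concrete, the sketch below shows the core idea of a dim-order-preserving element-wise clone: each logical coordinate of the input is copied to the same logical coordinate of the output, while the output's physical offsets follow its own dim order. This is a minimal, self-contained illustration; the type names (`TensorMeta`, `clone_dim_order`, `strides_from_dim_order`) are hypothetical stand-ins, not the real ExecuTorch kernel API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in for ExecuTorch tensor metadata (illustrative only).
struct TensorMeta {
  std::vector<int64_t> sizes;      // logical sizes, e.g. {N, C, H, W}
  std::vector<uint8_t> dim_order;  // physical ordering of dims, e.g. {0, 2, 3, 1}
};

// Compute element strides implied by a dim order: the last entry of
// dim_order is the innermost (stride-1) dimension.
std::vector<int64_t> strides_from_dim_order(const TensorMeta& t) {
  std::vector<int64_t> strides(t.sizes.size());
  int64_t acc = 1;
  for (size_t i = t.dim_order.size(); i-- > 0;) {
    strides[t.dim_order[i]] = acc;
    acc *= t.sizes[t.dim_order[i]];
  }
  return strides;
}

// Element-wise clone that respects the output's dim order: values are copied
// per logical coordinate, so the data is reordered physically when the input
// and output dim orders differ.
void clone_dim_order(const TensorMeta& in, const float* in_data,
                     const TensorMeta& out, float* out_data) {
  assert(in.sizes == out.sizes);  // shapes must match after any resize
  const auto in_strides = strides_from_dim_order(in);
  const auto out_strides = strides_from_dim_order(out);
  const int64_t numel = std::accumulate(in.sizes.begin(), in.sizes.end(),
                                        int64_t{1}, std::multiplies<int64_t>());
  std::vector<int64_t> coord(in.sizes.size(), 0);
  for (int64_t i = 0; i < numel; ++i) {
    int64_t in_off = 0, out_off = 0;
    for (size_t d = 0; d < coord.size(); ++d) {
      in_off += coord[d] * in_strides[d];
      out_off += coord[d] * out_strides[d];
    }
    out_data[out_off] = in_data[in_off];
    // Advance the logical coordinate in row-major order over sizes.
    for (size_t d = coord.size(); d-- > 0;) {
      if (++coord[d] < in.sizes[d]) break;
      coord[d] = 0;
    }
  }
}
```

For example, cloning a contiguous `{1, 2, 2, 2}` tensor (dim order `{0, 1, 2, 3}`) into a channels-last output (dim order `{0, 2, 3, 1}`) interleaves the two channel planes in memory while leaving the logical values unchanged.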
Note: A future PR will add the ATen kernel for `_clone_dim_order`.
Related PRs:
- PR 2: [#12971](#12971) -
Register `_clone_dim_order` op and map `aten.clone`
- PR 3: [#12976](#12976) -
Update RemoveCloneOpsTransform to be dim_order aware
Fixes #12645
### Test plan
Added kernel runtime tests to verify:
- Tensors of all real dtypes are cloned correctly.
- Failure when input and output tensor shapes mismatch.
- Failure with unsupported memory formats.
- Failure when `non_blocking=true` since the portable kernel only
supports blocking data transfer.
- Dynamic shape outputs are cloned with correct values.
- Layout conversions are cloned correctly: `contiguous` to
`channels_last`, `channels_last` to `contiguous`, and an already
`channels_last` layout is preserved.
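For reference, the layout conversions exercised above correspond to the following dim orders for 4-D tensors. This is an illustrative snippet (the constant names and helper are hypothetical, not the real ExecuTorch API): contiguous keeps dims in logical order, while channels-last moves the channel dim (dim 1) innermost.

```cpp
#include <array>
#include <cstdint>

// Illustrative dim orders for 4-D (NCHW-shaped) tensors.
constexpr std::array<uint8_t, 4> kContiguousDimOrder{0, 1, 2, 3};
constexpr std::array<uint8_t, 4> kChannelsLastDimOrder{0, 2, 3, 1};

// Hypothetical helper: checks whether a 4-D dim order is channels-last.
constexpr bool is_channels_last(const std::array<uint8_t, 4>& d) {
  return d[0] == 0 && d[1] == 2 && d[2] == 3 && d[3] == 1;
}
```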
All runtime tests pass via:
`build-ninja/kernels/test/portable_kernels_test`
---------
Co-authored-by: Gasoonjia <[email protected]>