This pull request introduces several improvements to the CUDA backend.
The main changes include adding a new graph pass to replace unnecessary
`slice_copy` operations, improving how method names are tracked in
compilation artifacts, and making the preprocessing pipeline more robust
by controlling attention kernel selection during AOT compilation.
**Key changes:**
### Graph optimization and preprocessing
* Introduced `ReplaceSliceCopyWithSlicePass`, a new export pass that
replaces non-mutated `slice_copy` operations with more efficient `slice`
view operations in the computational graph
(`replace_slice_copy_with_slice.py`, used in `cuda_backend.py`).
[[1]](diffhunk://#diff-c4a228b182f50f778545991d472609ad705d2325994342174093ff374738851dR1-R113)
[[2]](diffhunk://#diff-5b5ea2257772b3aba04b2534f5ea1429a0c631bfd25a7ef531f526e76c471d7aR115-R117)
* Added context management for attention kernel selection and no-grad
mode during AOT compilation, so that the correct backend is selected for
decomposition. This is a short-term workaround until a flash attention
CUDA kernel is available.
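
To illustrate the idea behind `ReplaceSliceCopyWithSlicePass`, here is a minimal sketch on a toy graph. It is not the code in `replace_slice_copy_with_slice.py`: the `Node` class, the `MUTATING_OPS` set, and the op names are hypothetical stand-ins, and the mutation check is deliberately conservative (any direct use by an in-place op counts as a mutation).

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical set of in-place op names for this toy graph (assumption,
# not the list used by the real pass).
MUTATING_OPS = {"add_", "copy_", "index_put_"}


@dataclass
class Node:
    """Toy graph node: a name, an op string, and its input nodes."""
    name: str
    op: str
    inputs: List["Node"] = field(default_factory=list)


def users_of(node: Node, graph: List[Node]) -> List[Node]:
    """Return every node in the graph that consumes `node` as an input."""
    return [n for n in graph if any(i is node for i in n.inputs)]


def replace_slice_copy_with_slice(graph: List[Node]) -> int:
    """Rewrite `slice_copy` to `slice` where no user is an in-place op.

    A view aliases its source, so the rewrite is only safe when the
    result is never mutated. Returns the number of nodes rewritten.
    """
    replaced = 0
    for node in graph:
        if node.op != "slice_copy":
            continue
        if any(u.op in MUTATING_OPS for u in users_of(node, graph)):
            continue  # keep the copy: a later op writes into it
        node.op = "slice"
        replaced += 1
    return replaced


# Usage: `a` is only read by relu, `b` is consumed by an in-place add.
x = Node("x", "placeholder")
a = Node("a", "slice_copy", [x])
b = Node("b", "slice_copy", [x])
r = Node("r", "relu", [a])
m = Node("m", "add_", [b, r])
graph = [x, a, b, r, m]
count = replace_slice_copy_with_slice(graph)
```

In this toy run only `a` becomes a `slice`; `b` stays a `slice_copy` because an in-place op consumes it.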
### Method name and compile specification handling
* Added a `COMPILE_SPEC_KEYS` enum and utility methods
(`generate_method_name_compile_spec`, `method_name_from_compile_specs`)
to consistently embed and retrieve the method name in compile specs and
as a key in the data store, improving traceability of compiled
artifacts.
[[1]](diffhunk://#diff-5b5ea2257772b3aba04b2534f5ea1429a0c631bfd25a7ef531f526e76c471d7aL24-R35)
[[2]](diffhunk://#diff-5b5ea2257772b3aba04b2534f5ea1429a0c631bfd25a7ef531f526e76c471d7aL161-R158)
[[3]](diffhunk://#diff-5b5ea2257772b3aba04b2534f5ea1429a0c631bfd25a7ef531f526e76c471d7aR169-R195)
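
The round trip through the compile specs can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `cuda_backend.py`: `CompileSpec` below is a local stand-in with a string key and a bytes value, and the enum member name `METHOD_NAME`, the UTF-8 encoding, and the error type are all assumed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


@dataclass
class CompileSpec:
    """Local stand-in for a compile spec entry (assumed key/value shape)."""
    key: str
    value: bytes


class COMPILE_SPEC_KEYS(Enum):
    # Assumed member name and value; the PR only names the enum itself.
    METHOD_NAME = "method_name"


def generate_method_name_compile_spec(method_name: str) -> CompileSpec:
    """Embed the method name as a compile spec entry."""
    return CompileSpec(
        COMPILE_SPEC_KEYS.METHOD_NAME.value,
        method_name.encode("utf-8"),
    )


def method_name_from_compile_specs(compile_specs: List[CompileSpec]) -> str:
    """Recover the method name, failing loudly if it was never embedded."""
    for spec in compile_specs:
        if spec.key == COMPILE_SPEC_KEYS.METHOD_NAME.value:
            return spec.value.decode("utf-8")
    raise RuntimeError("no method name found in compile specs")


# Usage: the method name survives alongside unrelated specs.
specs = [
    CompileSpec("some_other_key", b"1"),
    generate_method_name_compile_spec("forward"),
]
name = method_name_from_compile_specs(specs)
```

Keeping both directions behind a single enum key means the writer and the reader cannot drift apart on the key string, which is what makes the recovered name usable as a data store key.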
### Code cleanup and maintainability
* Minor refactor in `cuda_partitioner.py` to clarify delegation tag
assignment.
* Improved imports and code organization for clarity in
`cuda_backend.py`.
These changes collectively improve the reliability, performance, and
maintainability of the CUDA backend pipeline.