Implement experimental intermediate cross CPU EP allocation (microsoft#24371)
### Description
Onnxruntime manages a number of CPU-based accelerators, i.e. EPs that
can operate on CPU-based inputs.
However, several of them, such as `Qnn`, `Openvino` and `Vitis`, may either
require CPU-based inputs to be 4K-aligned so they can be memory-mapped, or
prefer to override the device with their own CPU-accessible allocator.
To mitigate this, we introduce a new CPU-based allocator that produces
4K-aligned memory.
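As an illustration of what "4K-aligned memory" means for an allocator, here is a minimal standalone sketch. `AlignedCpuAllocator`, `AlignUp` and `IsAligned` are hypothetical names, not the class this PR adds to ONNX Runtime. The sketch assumes C++17 `std::aligned_alloc`, which MSVC does not provide (a Windows build would use `_aligned_malloc` instead).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical sketch; not ONNX Runtime's allocator implementation.
constexpr std::size_t kPageAlignment = 4096;

// Rounds n up to the next multiple of `alignment` (must be a power of two).
constexpr std::size_t AlignUp(std::size_t n, std::size_t alignment) {
  assert((alignment & (alignment - 1)) == 0 && "alignment must be a power of two");
  return (n + alignment - 1) & ~(alignment - 1);
}

// True if `p` starts on a 4K boundary.
bool IsAligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % kPageAlignment == 0;
}

class AlignedCpuAllocator {
 public:
  void* Alloc(std::size_t size) {
    if (size == 0) return nullptr;
    // std::aligned_alloc requires the size to be a multiple of the alignment.
    void* p = std::aligned_alloc(kPageAlignment, AlignUp(size, kPageAlignment));
    if (p == nullptr) throw std::bad_alloc();
    return p;
  }
  void Free(void* p) { std::free(p); }
};
```

Rounding the requested size up to a whole page keeps the call valid for `std::aligned_alloc` and means a buffer can be memory-mapped without touching a neighbouring allocation's page.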
We also adjust the allocation planner to override the plain CPU device. When
we detect a compiled CPU-based EP, we adjust the device accordingly by
requesting the EP's device for `OrtMemType::OrtMemTypeCPUInput`. This gives
the EP an opportunity to return either a GPU/NPU device or a CPU device,
depending on the mode in which it is operating.
Among the default CPU devices, we select the one with the larger alignment.
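The device override and alignment-based selection described above can be sketched as follows. This is a simplified model under stated assumptions, not the planner's code: `DeviceDesc` and `SelectDevice` are hypothetical, and the rule that an EP-reported GPU/NPU device wins outright is an assumption made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified device descriptor; ONNX Runtime's own OrtDevice
// carries different fields.
struct DeviceDesc {
  std::string name;
  bool is_cpu_default;    // plain host memory usable by CPU-based nodes
  std::size_t alignment;  // allocation alignment in bytes
};

// `current`: device the planner has assigned to a value so far.
// `ep_reported`: devices the consuming EPs return when asked about CPU input.
DeviceDesc SelectDevice(const DeviceDesc& current,
                        const std::vector<DeviceDesc>& ep_reported) {
  // Only a plain CPU device is a candidate for overriding.
  if (!current.is_cpu_default) return current;
  DeviceDesc best = current;
  for (const DeviceDesc& d : ep_reported) {
    // Assumption: an EP that reports a GPU/NPU device wins outright.
    if (!d.is_cpu_default) return d;
    // Among default CPU devices, keep the one with the larger alignment.
    if (d.alignment > best.alignment) best = d;
  }
  return best;
}
```

Comparing alignments, rather than always preferring one particular allocator, means a value consumed by both the regular CPU EP and a 4K-sensitive EP ends up in memory that satisfies the stricter requirement.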
We also adjust memory patterns so that 4K alignment is respected within
the contiguous buffers when appropriate.
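To show what respecting 4K alignment inside a contiguous buffer involves, here is a hypothetical offset planner. `PlanOffsets` is an illustrative name, and the sketch deliberately ignores the lifetime-based memory reuse that real memory patterns perform. It also assumes the base of the contiguous buffer is itself 4K-aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: place tensors back to back in one contiguous buffer,
// rounding each start offset up to `alignment`. `total` receives the number
// of bytes the buffer must hold. No lifetime-based reuse is modeled.
std::vector<std::size_t> PlanOffsets(const std::vector<std::size_t>& sizes,
                                     std::size_t alignment, std::size_t& total) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  std::vector<std::size_t> offsets;
  offsets.reserve(sizes.size());
  std::size_t cursor = 0;
  for (std::size_t size : sizes) {
    cursor = (cursor + alignment - 1) & ~(alignment - 1);  // round up to alignment
    offsets.push_back(cursor);
    cursor += size;
  }
  total = cursor;
  return offsets;
}
```

For example, tensors of 100, 5000 and 10 bytes with 4096-byte alignment land at offsets 0, 4096 and 12288, for a 12298-byte buffer. The padding costs memory, which is why such alignment would only be applied "when appropriate".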
### Motivation and Context
CPU-based providers accept CPU-based inputs, but they require 4K-aligned
allocations; otherwise, the input incurs an extra copy.
This is especially noticeable for intermediate values that are produced
by upstream CPU-based nodes.
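The extra copy can be illustrated with a hypothetical input-preparation helper. This PR does not describe how each EP actually detects and handles misaligned inputs, so `PrepareInput`, `IsPageAligned` and `AlignedBuffer` are assumptions for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>

constexpr std::size_t kPage = 4096;

bool IsPageAligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % kPage == 0;
}

// Releases memory obtained from std::aligned_alloc.
struct FreeDeleter {
  void operator()(unsigned char* p) const { std::free(p); }
};
using AlignedBuffer = std::unique_ptr<unsigned char, FreeDeleter>;

// Hypothetical input handling for an EP that needs 4K-aligned input:
// an aligned buffer is used in place (zero copy); a misaligned one is
// copied into a 4K-aligned staging buffer owned by `staging`.
const void* PrepareInput(const void* data, std::size_t size, AlignedBuffer& staging) {
  if (size == 0 || IsPageAligned(data)) return data;
  // std::aligned_alloc requires the size to be a multiple of the alignment.
  std::size_t padded = (size + kPage - 1) / kPage * kPage;
  staging.reset(static_cast<unsigned char*>(std::aligned_alloc(kPage, padded)));
  if (!staging) throw std::bad_alloc();
  std::memcpy(staging.get(), data, size);  // the extra copy
  return staging.get();
}
```

If an upstream CPU node's output is already placed in 4K-aligned memory, the first branch is taken and the copy disappears, which is the effect the planner changes aim for.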
When Qnn is enabled, it has its own allocator, and we make sure it is
correctly advertised to the allocation planner. However, this PR excludes
the Qnn allocator from use for intermediate values due to the overhead of
memhandle management.
Cc: @quic-ashigarg
---------
Co-authored-by: edgchen1 <[email protected]>