Commit 2e63ab7

Update base for Update on "[ET][Memory planning] Improve greedy memory planning."
This diff replaces the old greedy algorithm. The older algorithm produced plans about 35% worse than the theoretical optimum. This matters even more for long contexts, since the additional overhead can be a few hundred MB. For example, the theoretical optimum for the llama3_2 8B, 4-bit quantized model with a context length of 2k is about 1 GB of memory. This theoretical optimum can be observed by looking at the peaks in the memory profile. The current algorithm resulted in about 1.6 GB of planned memory. The new algorithm reduces that to about 1.1 GB.

Differential Revision: [D68448332](https://our.internmc.facebook.com/intern/diff/D68448332/)

cc JacobSzwejbka angelayi

[ghstack-poisoned]
2 parents e8ebe1a + d4a8f8f commit 2e63ab7
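
The commit message compares planned memory against the peak of the memory profile. The sketch below illustrates that comparison on a toy example: `peak_live_bytes` computes the lower bound (the largest total size of buffers live at the same time), and `greedy_by_size` is one common greedy placement heuristic. This is a minimal illustration only; it is not the algorithm from this diff, and `Buf`, `peak_live_bytes`, and `greedy_by_size` are hypothetical names that do not come from ExecuTorch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buf:
    """Hypothetical tensor buffer with an inclusive lifetime [start, end]."""
    name: str
    size: int   # bytes
    start: int  # first execution step at which the buffer is live
    end: int    # last execution step at which the buffer is live


def peak_live_bytes(bufs):
    """Lower bound on planned memory: the peak of the memory profile.

    The live set only grows when a buffer starts, so it is enough to
    evaluate the profile at every start step.
    """
    steps = {b.start for b in bufs}
    return max(
        (sum(b.size for b in bufs if b.start <= t <= b.end) for t in steps),
        default=0,
    )


def greedy_by_size(bufs):
    """Place buffers largest-first at the lowest non-conflicting offset.

    Two buffers conflict when their lifetimes overlap; conflicting
    buffers must occupy disjoint address ranges.
    """
    placed = {}  # name -> (offset, Buf)
    for b in sorted(bufs, key=lambda x: (-x.size, x.start)):
        # Address ranges already taken by buffers live at the same time as b.
        taken = sorted(
            (off, off + o.size)
            for off, o in placed.values()
            if not (o.end < b.start or b.end < o.start)
        )
        offset = 0
        for lo, hi in taken:
            if offset + b.size <= lo:
                break  # b fits in the gap before this range
            offset = max(offset, hi)
        placed[b.name] = (offset, b)
    total = max((off + b.size for off, b in placed.values()), default=0)
    return {name: off for name, (off, _) in placed.items()}, total


# Toy example: four buffers with staggered lifetimes.
example = [
    Buf("a", 4, 0, 1),
    Buf("b", 2, 1, 2),
    Buf("c", 4, 2, 3),
    Buf("d", 2, 3, 4),
]
lower_bound = peak_live_bytes(example)
offsets, planned = greedy_by_size(example)
overhead = planned / lower_bound - 1.0  # fraction above the lower bound
```

On this toy input the greedy plan happens to match the lower bound, so `overhead` is zero. On real graphs a greedy plan can exceed the peak, and that gap is what the commit message reports.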

9 files changed, +331 -517 lines changed
backends/cadence/hifi/kernels/targets.bzl

Lines changed: 5 additions & 0 deletions
@@ -2,12 +2,17 @@ load("@fbsource//tools/build_defs:platform_defs.bzl", "CXX")
 load("@fbsource//xplat/executorch/build:runtime_wrapper.bzl", "runtime")

 def define_common_targets():
+    common_deps = [
+        "//executorch/runtime/kernel:kernel_includes",
+    ]
+
     runtime.cxx_library(
         name = "kernels",
         srcs = ["kernels.cpp"],
         exported_headers = [
             "kernels.h",
         ],
+        deps = common_deps,
         visibility = [
             "//executorch/backends/cadence/...",
         ],

backends/cadence/hifi/operators/op_where.cpp

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ namespace impl {
 namespace HiFi {
 namespace native {

-Tensor& where_out(
+Tensor& where_self_out(
     RuntimeContext& ctx,
     const Tensor& cond,
     const Tensor& a,

backends/cadence/hifi/third-party/nnlib/xa_nn_elm_clamp_f32_broadcast.c

Lines changed: 5 additions & 6 deletions
@@ -19,12 +19,11 @@
 * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

 ******************************************************************************/
-#include "nnlib-hifi4/xa_nnlib/include/xa_type_def.h"
-#include "nnlib-hifi4/xa_nnlib/algo/common/include/xa_nnlib_common_fpu.h"
-#include "nnlib-hifi4/xa_nnlib/algo/common/include/xa_nn_common.h"
-#include "nnlib-hifi4/xa_nnlib/algo/common/include/xa_nnlib_err_chk.h"
-#include "nnlib-hifi4/xa_nnlib/algo/kernels/basic/hifi4/xa_nn_basic_state.h"
-#include "nnlib-hifi4/xa_nnlib/include/nnlib/xa_nnlib_kernels_api.h"
+#include "xa_type_def.h"
+#include "xa_nnlib_common_fpu.h"
+#include "xa_nn_common.h"
+#include "xa_nnlib_err_chk.h"
+#include "xa_nnlib_kernels_api.h"


 #if !HAVE_VFPU
