feat: opt mulmat base on the official doc #12066

chraac · 2025-02-25T11:43:46Z

While checking upstream's contributing guidelines, I found that a transpose op can be eliminated. Here's a PR that does so.
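As background for the transpose saving: upstream's CONTRIBUTING.md notes that ggml's matrix multiplication is unconventional, with `C = ggml_mul_mat(ctx, A, B)` meaning C^T = A B^T, or equivalently C = B A^T. The sketch below is not the code from this PR and does not use the ggml or QNN APIs. It is a plain C++ illustration, with hypothetical helper names (`Matrix`, `transpose`, `matmul_transposed_rhs`, `mul_mat_with_output_transpose`, `mul_mat_direct`), of why computing B A^T directly gives the same result as computing A B^T and then transposing the output. Under that assumption, one transpose can be dropped whenever the matmul primitive can read an operand as transposed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major dense matrix. Hypothetical helper type for this sketch only;
// it is not a ggml or QNN type.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}

    float & at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Explicit transpose: allocates and fills a new matrix (the extra op).
Matrix transpose(const Matrix & m) {
    Matrix out(m.cols, m.rows);
    for (std::size_t r = 0; r < m.rows; ++r) {
        for (std::size_t c = 0; c < m.cols; ++c) {
            out.at(c, r) = m.at(r, c);
        }
    }
    return out;
}

// Computes x * y^T by reading y with swapped indices, so y^T is never
// materialized. Stands in for a matmul primitive with a built-in
// "transpose second input" flag (an assumption of this sketch).
Matrix matmul_transposed_rhs(const Matrix & x, const Matrix & y) {
    assert(x.cols == y.cols);
    Matrix out(x.rows, y.rows);
    for (std::size_t i = 0; i < x.rows; ++i) {
        for (std::size_t j = 0; j < y.rows; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < x.cols; ++k) {
                sum += x.at(i, k) * y.at(j, k);
            }
            out.at(i, j) = sum;
        }
    }
    return out;
}

// Two-op path: compute C^T = A * B^T, then transpose the output to get C.
Matrix mul_mat_with_output_transpose(const Matrix & a, const Matrix & b) {
    return transpose(matmul_transposed_rhs(a, b));
}

// One-op path: compute C = B * A^T directly, with no separate transpose.
Matrix mul_mat_direct(const Matrix & a, const Matrix & b) {
    return matmul_transposed_rhs(b, a);
}
```

For an A of shape 2x3 and a B of shape 4x3, both `mul_mat_with_output_transpose(a, b)` and `mul_mat_direct(a, b)` return the same 4x2 matrix; the second simply skips the intermediate buffer and the extra pass over it.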

Before

After

* redo: add convert nodes
  This reverts commit 8448acd.
* align clang format with cann
* rename binary_op -> general_op because there are some ops that will only take 1 param
* Revert "rename binary_op -> general_op"
  This reverts commit 5be63b1.
* wip
* add GGML_OP_PERMUTE
* add GGML_OP_VIEW and GGML_OP_GET_ROWS
* wip
* Revert "wip"
  This reverts commit 772462c.

* reduce log
* wip
* add function to create concat nodes
* opt
* insert concat node before mulmat
* use resize op
* wip
* add bind_buffer and remove ggml prefix in tensor types
* use gather node instead
* fix tensor type, now succeed in gpu and cpu, failed in npu
* add comment
* wip
* add comment
* wip
* in destructor, clear internal buffer before unbind
* disable gather for npu
* wip
* count swap memory as free memory
* wip
* fix supported_types
  ggml_backend_device_i.supports_op will be invoked before ggml_backend_device_i.init_backend
* rename create_tensors -> initialize_op_nodes
* move ggml_qnn_op_config to separated file
* wip
* add create_convert_nodes
* add comment
* enable different type in/out for npu and cpu backend
* fix npu convert op
* enlarge max buffer size
* add more error code
* check tensor type before create convert node
* add log
* add log
* remove transpose0 and use builtin transpose flag
* rename transpose1 -> transpose_out
* disable convert for npu
* add more logs

* fix device binding at ggml_backend_qnn_buffer_type
* merge ggml_backend_qnn_buffer_context and qnn_mem_buffer
* wip
* add log
* wip
* add qnn_buffer_ptr
* remove trailing `\n` at log
* add log
* enable GGML_OP_NONE
* wip
* wip
* disable tensor with view
* wip
* wip
* more log for view tensor
* re-enable view
* wip
* remove link android lib
* set dimension at bind function
* move graph traversal to backend-ops
* wip
* add get_view_internal_dimension to obtain the tensor view source dimension
* use _view_source_dimensions to allocate qnn tensor
* add place holder function ggml_backend_qnn_cpy_tensor_async
* add ggml_qnn_aggregate_op_config
* make matmul based on ggml_qnn_aggregate_op_config
* wip
* manually specify the order of op destruct
* skip register qnn-cpu backend
* disable view op again
* remove _view_source_dimensions
* add nop for reshape and view ops
* add log
* add comment

* more log
* split graph implementation into cpp file
* rename: ggml_qnn_graph -> qnn_graph
* add input/output tensor to graph
* fix assert
* wip
* add _ggml_tensor field in qnn tensor
* add comments
* add set_data_buffer with raw memory buffer
* use set_data_buffer
* op param buffer use qnn_buffer_ptr
* add qnn_mem_buffer_slice
* use qnn_buffer_ptr as tensor buffer
* use new set_data_buffer to reduce copy
* ggml_qnn_op_config: add function to set input/output tensor before init node
* remove ggml_qnn_connectable_op_config and use ggml_qnn_single_op_config instead
* wip
* add initialize_op_nodes without tensor params
* wip
* add op caps table
* merge kGgmlOpToQnnOp and kOpCaps tables
* wip
* add cache parameter to create_tensors
* add init_from_ggml_graph
* disable gelu for all backend
* wip
* move op index calc to op config module
* use the ggml_tensor as parameter of build_graph
* add log
* use create_operation_from_op_tensor in old build_graph function
* remove unused constructors
* fix parameter count
* remove unused member func/var
* make init_from_ggml_graph as a class member: build_graph_from_ggml_graph
* move graph finalize into member function `finalize()`
* get graph key from ggml op tensor directly
* append output type
* reduce tensor key length
* add function to generate key from ggml_cgraph
* simplify graph cache insert and delete
* remove template param at get_qnn_graph_from_cache
* wip
* merge kQnnUnaryOpsTable and kQnnBinaryOpsTable
* refactor device_supports_op
* add log
* wip
* use framework function to check same shape
* wip
* extract some logic into separated function
* wip
* add execution function that runs graph
* add function to create qnn graph from ggml_cgraph with cache
* execute graph directly
* return null graph key for empty graph
* add more qualcomm chipset enums
* add cap for reshape
* disable some ops
* try to skip GGML_OP_VIEW
* more log for view tensor
* append param tensor into intermediate tensor key
* use 'ordered' set
* fix warning in release
* wip

* disable rpc buffer for npu
* append input/output tensor size into unsupported op log
* log dimensions for unsupported tensor
* wip
* split op config classes into separated file
* fix reshape
* wip
* add op_constructor_with_type_param
* set parameter for op_constructor_with_type_param func

* move qnn_instance function implementation into cpp
* wip
* wip
* move dl related function into separated file
* use cast op for gpu
* Revert "use cast op for gpu"
  This reverts commit 05df736.
* Reapply "use cast op for gpu"
  This reverts commit 2520e59.
* fix compiling error in win
* fix align_alloc in win
* fix compiling error
* add get sys free/total mem for win
* wip
* suppress warning in win
* add missing chrono header
* set the correct qnn lib name for windows
* add flag to control cpu backend
* wip
* wip
* Revert "Reapply "use cast op for gpu""
  This reverts commit f56519c.
* fix compiling error for linux build
* fix cdsprpc dynamic library name
* wip
* skip rpc load fail
* fix page_align_alloc
* suppress some warning in gcc
* wip
* reuse align to function
* more log
* add log and fix warning
* wip
* fix asan errors and memory leaks
* fix the get_io_tensors_from_graph
* improve comment
* print GGML_QNN_DEFAULT_LIB_SEARCH_PATH
* revert some unused changes
* move library search path setter into qnn module
* fix android library loading
* skip qnn_device_get_platform_info for npu emulator

https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md

zhou.weiguo and others added 30 commits April 24, 2024 16:28

ggml: add Qualcomm QNN (Qualcomm Neural Network, aka Qualcomm AI Engine Direct) backend

b0c3013

ggml: add Qualcomm QNN (Qualcomm Neural Network, aka Qualcomm AI Engine Direct) backend

d325088

rebase

c75817b

refine ggml-qnn-ut program and script to make reviewers happy

9c872cb

review: replace external declaration with NDK header file

926a866

add supportive of quantize data type Q8_0

dd29834

review: remove unused QNN helper functions

f4c5303

ggml-qnn: remove static global vars to support multi-instance simultaneously

2fab33d

review: remove static global vars to support multi-instance simultaneously and thread safe

94ee775

review: put qnn's internal log inside preprocessor directive

5d691c6

review: code format using clang-format + manually modification according to review comments

fdf0272

review: fix a memory leak introduced by review modification which explained in https://github.com/zhouwg/llama.cpp/pull/1

3e8b61f

npu: probe htp info and capacity of rpc ion memory

d38d4a6

ggml-qnn: refine source code of ggml-qnn.cpp to make reviewer more happy

5f8cfe4

ggml-qnn: refine ggml inference using QNN NPU

5269e08

ggml-qnn: refine ggml inference using QNN NPU

faaa86b

review: make a MVP(Minimum Viable PR) style PR in upstream

5598fbd

init the test array with const values

5e18cdc

add ggml_qnn_tensor_binder

6c68adc

use tensor wrapper in add

37bb926

use tensor wrapper in matmul

36e41a1

use ggml_qnn_tensor_reader for output tensor

a5679dd

use ggml_qnn_tensor_writer for all parameters

5fe7b87

rename

9456bba

fix todo

65a14d9

make the constant condition first

aeef0c6

remove TODO

dfe159f

split logger function, tensors and backend from main qnn source

9932062

remove reference of g_qnn_mgr in qnn_instance

3c491a3

fix compiling error

3fe07eb

chraac added 26 commits November 4, 2024 23:12

feat: fix llama-bench (#7)

e6dbdac

* remove unused functions
* wip
* init from last devices
* move init into constructor
* wip
* add static assert to device table
* make kDeviceCaps as constexpr
* get free memory and total memory
* add optimize flag for qnn backend

Merge branch 'master' into dev-refactoring

9f62fc9

bugfix: block large tensor calc in npu

5103b16

Merge branch 'master' into dev-refactoring

67b183c

# Conflicts:
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-backend.cpp

redo conflict changes

6d4feae

define compile flag as module private

09efaa3

fix: fix assertion

c5e6549

Merge branch 'master' into dev-refactoring

cf91253

# Conflicts:
# ggml/src/ggml-backend-reg.cpp

fix int overflow and remove view op to pass unit test

0d02ee0

Merge branch 'master' into dev-refactoring

6d3267a

# Conflicts:
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt

add missing op

79f124a

Merge branch 'master' into dev-refactoring

8f07b3e

# Conflicts:
# ggml/src/ggml-backend-reg.cpp

Merge branch 'master' into dev-refactoring

c410717

# Conflicts:
# ggml/CMakeLists.txt
# src/llama.cpp

fix compiling error after merged

5f93376

Merge branch 'master' into dev-refactoring

3ed9f5b

Merge branch 'master' into dev-refactoring

34d9b38

Merge branch 'master' into dev-refactoring

ba324b0

Merge branch 'master' into dev-refactoring

12c75f1

Merge branch 'master' into dev-refactoring

84328ff

opt mulmat base on official doc

3cb35ff

https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md

chraac closed this Feb 25, 2025

chraac deleted the dev-opt-mulmat branch February 25, 2025 11:46

github-actions bot added the `build` (Compilation issues) and `ggml` (changes relating to the ggml tensor library for machine learning) labels Feb 25, 2025

5 participants
