Commit c386114

arch : add description about LLM_TENSOR_INFOS (#17550)
1 parent 6783b11 commit c386114

1 file changed: +10 -0 lines changed


src/llama-arch.cpp

Lines changed: 10 additions & 0 deletions
@@ -2487,6 +2487,16 @@ static const std::map<llm_arch, std::map<llm_tensor, const char *>> LLM_TENSOR_N
     },
 };
 
+// declare information about the model weight tensors:
+// - the layer in which the tensor is going to be used. this is needed in order to assign the correct buffer type for the weight
+// - the operator which is going to use the weight. this is needed to determine if the respective backend supports the operator
+//
+// for example, input layers are usually assigned to CPU/host buffer types
+//
+// a mismatch between the declared information and the actual layer/op in which the tensor is used can lead to sub-optimal
+// assignment of the buffer types and extra overhead during computation
+// example: https://github.com/ggml-org/llama.cpp/pull/17548
+//
 static const std::map<llm_tensor, llm_tensor_info> LLM_TENSOR_INFOS = {
     {LLM_TENSOR_TOKEN_EMBD, {LLM_TENSOR_LAYER_INPUT, GGML_OP_GET_ROWS}},
     {LLM_TENSOR_POS_EMBD, {LLM_TENSOR_LAYER_INPUT, GGML_OP_GET_ROWS}},
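
The added comment describes how the declared layer and operator drive buffer type assignment. A minimal sketch of that idea follows. Every name in it (`tensor_id`, `layer_kind`, `op_kind`, `buffer_kind`, `TENSOR_INFOS`, `pick_buffer`) is a hypothetical stand-in, and the selection policy is an assumption made for illustration; it is not the llama.cpp or ggml implementation.

```cpp
#include <map>
#include <set>

// Hypothetical stand-ins for illustration only; NOT the llama.cpp/ggml definitions.
enum class tensor_id   { TOKEN_EMBD, ATTN_Q, OUTPUT };
enum class layer_kind  { INPUT, REPEATING, OUTPUT };
enum class op_kind     { GET_ROWS, MUL_MAT };
enum class buffer_kind { HOST, DEVICE };

struct tensor_info {
    layer_kind layer; // layer in which the weight is used
    op_kind    op;    // operator that consumes the weight
};

// Declared per-tensor metadata, analogous in spirit to LLM_TENSOR_INFOS.
static const std::map<tensor_id, tensor_info> TENSOR_INFOS = {
    {tensor_id::TOKEN_EMBD, {layer_kind::INPUT,     op_kind::GET_ROWS}},
    {tensor_id::ATTN_Q,     {layer_kind::REPEATING, op_kind::MUL_MAT}},
    {tensor_id::OUTPUT,     {layer_kind::OUTPUT,    op_kind::MUL_MAT}},
};

// Assumed policy: input-layer weights stay in a host buffer; any other weight
// goes to the device only if the device backend supports the declared operator.
static buffer_kind pick_buffer(tensor_id id, const std::set<op_kind> & device_ops) {
    const tensor_info & info = TENSOR_INFOS.at(id);
    if (info.layer == layer_kind::INPUT) {
        return buffer_kind::HOST;
    }
    return device_ops.count(info.op) ? buffer_kind::DEVICE : buffer_kind::HOST;
}
```

Under this assumed policy, a wrong declaration has a visible cost: if a weight is declared with an operator the device does not support while the graph actually uses a supported one, `pick_buffer` would place it in a host buffer unnecessarily, which is the kind of sub-optimal assignment the comment warns about.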
