Commit ef09452

gpt-2 : add comment about KV cache type (#1142)

* change KV cache to fp16 to take advantage of tensor cores
* added a note/comment to indicate kv can be FP16

1 parent 2efc170 commit ef09452

File tree

1 file changed: +2 additions, -0 deletions
examples/gpt-2/main-backend.cpp (2 additions, 0 deletions)

```diff
@@ -337,6 +337,8 @@ bool gpt2_model_load(const std::string & fname, gpt2_model & model, gpt_vocab &
     const int n_mem = n_layer*n_ctx;
     const int n_elements = n_embd*n_mem;

+    // k and v here can also be GGML_TYPE_F16 to save memory and speed up the computation
+    // if backend supports it
     model.memory_k = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_elements);
     model.memory_v = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_elements);

```