Commit ef09452

gpt-2 : add comment about KV cache type (#1142)

* change KV cache to fp16 to take advantage of tensor cores
* added a note/comment to indicate kv can be FP16

1 parent 2efc170 commit ef09452

File tree

1 file changed: +2 additions, -0 deletions
examples/gpt-2/main-backend.cpp (2 additions, 0 deletions)

```diff
@@ -337,6 +337,8 @@ bool gpt2_model_load(const std::string & fname, gpt2_model & model, gpt_vocab &
     const int n_mem = n_layer*n_ctx;
     const int n_elements = n_embd*n_mem;

+    // k and v here can also be GGML_TYPE_F16 to save memory and speed up the computation
+    // if backend supports it
     model.memory_k = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_elements);
     model.memory_v = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_elements);

```