+Iterate through the 256-element segments of a row from src0. For each segment:
+  a. Let the current src0 segment be src0_seg, with quantization type type0 (e.g., GGML_TYPE_Q4_0).
+  b. Determine the type required for src1 by looking up type_traits_cpu[type0]: its vec_dot_type field names the type that the matching vec_dot function expects. For example, if type0 is GGML_TYPE_Q4_0, its vec_dot is ggml_vec_dot_q4_0_q8_0, so src1 must effectively be GGML_TYPE_Q8_0 for this specific dot product call.
+  c. If src1 is F32: quantize the corresponding 256-element segment of src1 (call it src1_seg_f32) into a temporary buffer src1_seg_quantized of the required vec_dot_type (e.g., GGML_TYPE_Q8_0). This quantization happens on the fly, once per segment of src1.
+  d. If src1 is already quantized (e.g., entirely Q8_K): use the corresponding segment src1_seg_quantized directly, provided its type matches the vec_dot_type of src0_seg (e.g., a Q4_K src0_seg with a Q8_K src1 works).
+  e. Call the appropriate ggml_vec_dot_[type0]_[type1_eff] function (e.g., ggml_vec_dot_q4_0_q8_0(src0_seg, src1_seg_quantized_to_q8_0, ...)).
+  f. Accumulate the F32 result of this segment's dot product into the running sum for the row.
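The per-segment loop above can be sketched in plain C. This is a minimal illustration under stated assumptions, not ggml code: `seg_q8`, `quantize_seg_q8`, `vec_dot_seg`, and `row_dot` are hypothetical names, and a single simplified format (one scale plus 256 int8 values) stands in for every quantization type. The type lookup of step b is therefore collapsed away; real ggml block layouts and kernels differ.

```c
#include <assert.h>
#include <stdint.h>

#define SEG_LEN 256 /* segment length used in the text */

/* Hypothetical stand-in for one quantized segment: a scale plus int8 values.
 * Real ggml blocks (block_q4_0, block_q8_0, ...) use different layouts. */
typedef struct {
    float  scale;
    int8_t q[SEG_LEN];
} seg_q8;

/* Step c: quantize one F32 segment on the fly into the temporary format. */
static void quantize_seg_q8(const float *x, seg_q8 *out) {
    float amax = 0.0f;
    for (int i = 0; i < SEG_LEN; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) amax = a;
    }
    out->scale = amax / 127.0f;
    float inv = out->scale > 0.0f ? 1.0f / out->scale : 0.0f;
    for (int i = 0; i < SEG_LEN; i++) {
        float v = x[i] * inv;
        /* round to nearest, half away from zero */
        out->q[i] = (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
    }
}

/* Step e: dot product of two quantized segments.
 * Accumulate in integers, then apply both scales once. */
static float vec_dot_seg(const seg_q8 *a, const seg_q8 *b) {
    int32_t acc = 0;
    for (int i = 0; i < SEG_LEN; i++) {
        acc += (int32_t)a->q[i] * (int32_t)b->q[i];
    }
    return a->scale * b->scale * (float)acc;
}

/* Steps a-f for one row: src0 is already quantized, src1 is F32. */
static float row_dot(const seg_q8 *src0_row, const float *src1_row, int n_seg) {
    assert(n_seg >= 0);
    float  sum = 0.0f;
    seg_q8 src1_seg_quantized; /* temporary buffer, reused per segment */
    for (int s = 0; s < n_seg; s++) {
        quantize_seg_q8(src1_row + (long)s * SEG_LEN, &src1_seg_quantized);
        sum += vec_dot_seg(&src0_row[s], &src1_seg_quantized); /* step f */
    }
    return sum;
}
```

In a mixed-type version, each src0 segment would also carry a type tag, and the quantizer and dot-product function would be chosen per segment from a traits table rather than hard-coded as they are here.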