All of the FP6 GEMM functions take the FP6 weights and their FP16 scales, one scale per output channel:
* [Input]
* fp6_tensor: int tensor of shape [OC, IC // 16 * 3]; // 3 INT32 words contain 16 FP6 weights.
* fp16_scale: half tensor of shape [OC]; // one scale per output channel (row-wise quantization).
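
For concreteness, here is how I read those shapes (an illustrative PyTorch snippet with arbitrary sizes, assuming the Python side works with torch tensors):

```python
import torch

OC, IC = 4096, 4096  # arbitrary example dimensions

# Each group of 16 FP6 weights occupies 16 * 6 = 96 bits, i.e. 3 INT32 words.
fp6_tensor = torch.empty(OC, IC // 16 * 3, dtype=torch.int32)

# One FP16 scale per output channel (row-wise quantization).
fp16_scale = torch.empty(OC, dtype=torch.float16)
```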
We have functions for converting FP16 weights to FP6 (weight_prepacking_fp16_to_fp6) and for packing the FP6 weights into the final inference layout (weight_matrix_prepacking), but nothing that generates the scales used to up-convert back to FP16.
In the testing code, for both Python and C++, the scales are always randomly initialized. Is there a function that generates the scales needed for accurate dequantization with real weights?
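
For reference, here is a minimal sketch of what I would expect such a function to look like, assuming symmetric row-wise absmax quantization and an E3M2 FP6 format whose largest finite magnitude is 28.0 (both of these are my assumptions, not something the repo confirms; generate_fp6_scales is a hypothetical name):

```python
import torch

FP6_E3M2_MAX = 28.0  # assumed largest finite FP6 (E3M2) magnitude

def generate_fp6_scales(fp16_weight: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: per-output-channel absmax scales for an
    [OC, IC] FP16 weight, so that dequantization is fp6_value * scale."""
    absmax = fp16_weight.abs().amax(dim=1)                      # [OC]
    absmax = absmax.clamp(min=torch.finfo(torch.float16).tiny)  # avoid zero scales
    return (absmax / FP6_E3M2_MAX).to(torch.float16)

# Possible end-to-end flow (the pre-scaling contract of
# weight_prepacking_fp16_to_fp6 is an assumption on my part):
# scales = generate_fp6_scales(w)                     # [OC] half tensor
# w_scaled = (w / scales[:, None]).to(torch.float16)  # fits FP6 range
# fp6 = weight_prepacking_fp16_to_fp6(w_scaled)
# packed = weight_matrix_prepacking(fp6)
```

Is something along these lines what the kernels assume, or do the scales enter the pipeline differently?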