How to create FP16 quantization scales? #6

@mgoin

All of the FP6 GEMM functions take the FP6 weights and their FP16 scales for each output channel:

```
 * [Input]
 *  fp6_tensor:  int  tensor of shape [OC, IC // 16 * 3];   // 3 INT32 words contain 16 FP6 weights.
 *  fp16_scale:  half tensor of shape [OC];                 // for row-wise quantization.
```

(16 FP6 values × 6 bits = 96 bits = 3 × 32-bit words, hence the `IC // 16 * 3` packing.)

We have functions for converting FP16 weights to FP6 (`weight_prepacking_fp16_to_fp6`) and for packing the FP6 weights into the final inference format (`weight_matrix_prepacking`), but nothing that generates the scales needed to up-convert back to FP16.

In the testing code, for both Python and C++, the scales are always randomly initialized. Is there a function that generates the scales needed for accurate dequantization of real weights?
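
For reference, the standard approach for row-wise quantization is per-output-channel absmax scaling: each row's scale is its maximum absolute magnitude divided by the largest value FP6 can represent. Below is a minimal PyTorch sketch; the helper name `compute_fp16_scales` and the constant `FP6_E3M2_MAX = 28.0` (the maximum magnitude of an E3M2 format) are my assumptions, not part of this repo's API.

```python
import torch

# Assumption: the FP6 format here is E3M2 (3 exponent bits, 2 mantissa
# bits), whose largest representable magnitude is 28.0. Adjust this
# constant if the kernels use a different FP6 variant.
FP6_E3M2_MAX = 28.0

def compute_fp16_scales(weight_fp16: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: row-wise absmax scales for a [OC, IC] weight.

    Dequantization would then be w_fp16 ≈ w_fp6 * scale[oc].
    """
    absmax = weight_fp16.abs().amax(dim=1)  # [OC], max magnitude per output channel
    scale = absmax / FP6_E3M2_MAX           # map each row's range onto FP6's range
    return scale.to(torch.half)

# Before calling weight_prepacking_fp16_to_fp6, the weights would be
# divided by their scales so they fit within the FP6 range:
#   w_scaled = weight_fp16 / compute_fp16_scales(weight_fp16).unsqueeze(1)
```

Whether the kernels expect the scale or its reciprocal, and whether any clamping is applied before FP6 conversion, would need to be confirmed against the repo's dequantization path.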
