Develop a Python script in the repo that implements and tests the following (see also the thread below):
- Base scenario (a minimal pipeline sketch follows this block):
  - Create two matrices (X, W) filled with floating-point values drawn from a predefined distribution.
    - square matrices of size 1024×1024
    - value type is float16
    - keep the solution flexible so that size and data type can be changed
  - Multiply the matrices (X * W = Y) and save the result (Y, the original one)
  - Preprocess the matrices (X -> X', W -> W')
    - keep the solution flexible so that the preprocessing type can be changed
  - Multiply the preprocessed matrices (X' * W' = Y') and save the result (Y', the preprocessed one)
  - Fake-quantize the preprocessed matrices (Q(W') and/or Q(X'))
    - at least one matrix must be quantized, but it may be only one of them
    - keep the solution flexible so that the quantization type can be changed
  - Multiply the quantized matrices and save the result (Yq, the quantized one)
  - Postprocess the quantized result (Yq -> Yp)
  - Find the differences between the preprocessed (Y'), quantized/postprocessed (Yq, Yp), and original (Y) results; use matrix metrics on the obtained matrices as the final evaluation value
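A minimal sketch of how the base scenario could be wired together (NumPy; all five stages are passed in as callables so they stay swappable — the function names and the choice to accumulate the matmuls in float32 are assumptions, not a fixed design):

```python
import numpy as np

def run_base_scenario(gen, preprocess, fake_quantize, postprocess, metric,
                      size=1024, dtype=np.float16):
    # gen/preprocess/fake_quantize/postprocess/metric are placeholders
    # for the pluggable pieces described above.
    X, W = gen(size, dtype), gen(size, dtype)
    Y = X.astype(np.float32) @ W.astype(np.float32)        # original Y
    Xs, Ws = preprocess(X, W)                              # X -> X', W -> W'
    Y_pre = Xs.astype(np.float32) @ Ws.astype(np.float32)  # preprocessed Y'
    Xq, Wq = fake_quantize(Xs, Ws)                         # Q(X') and/or Q(W')
    Yq = Xq.astype(np.float32) @ Wq.astype(np.float32)     # quantized Yq
    Yp = postprocess(Yq, Y_pre)                            # Yq -> Yp
    return metric(Y, Y_pre), metric(Y, Yq), metric(Y, Yp)
```

Accumulating the products in float32 keeps the comparison focused on preprocessing/quantization error rather than on float16 matmul rounding.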
- Data distribution
  - in general, each of X and W is assumed to be drawn from a combination of two normal distributions (context and outliers)
  - the following parameters control the distribution:
    - context mean
    - context dispersion
    - number of context values
    - distance between context and outliers, or the outliers' mean
    - outliers' dispersion
    - number of outlier values
  - Starting simplification: W has context only; the outliers' dispersion is much smaller than the context one (e.g. Do = 0.1 * Dc); the number of outliers is much smaller than the matrix size (e.g. No ~ 0.1 * sqrt(Nc)); context mean = 0.
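A sketch of such a generator (parameter names mirror the list above; standard deviations are used where the list says "dispersion", and the 6-sigma default distance between context and outliers is an illustrative assumption):

```python
import numpy as np

def gen_mixture(size, dtype=np.float16, ctx_mean=0.0, ctx_std=1.0,
                out_mean=None, out_std=None, n_outliers=None, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    n = size * size
    n_outliers = int(0.1 * np.sqrt(n)) if n_outliers is None else n_outliers
    out_std = 0.1 * ctx_std if out_std is None else out_std    # Do = 0.1 * Dc
    out_mean = ctx_mean + 6 * ctx_std if out_mean is None else out_mean
    m = rng.normal(ctx_mean, ctx_std, n)                       # context values
    idx = rng.choice(n, size=n_outliers, replace=False)
    m[idx] = rng.normal(out_mean, out_std, n_outliers)         # inject outliers
    return m.reshape(size, size).astype(dtype)
```

The starting simplification for W is then `gen_mixture(1024, n_outliers=0)`.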
- Preprocessing
  - Smoothing from SmoothQuant
  - AWQ algorithm
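For the smoothing option, a sketch along the lines of SmoothQuant's per-channel scale migration (alpha = 0.5 is the usual default; the guard against degenerate channels is an added assumption):

```python
import numpy as np

def smooth(X, W, alpha=0.5):
    # Per input channel j: s_j = max|X[:, j]|^alpha / max|W[j, :]|^(1 - alpha);
    # then X' = X / s and W' = s * W, so X' @ W' equals X @ W up to rounding.
    ax = np.abs(X).max(axis=0).astype(np.float32)
    aw = np.abs(W).max(axis=1).astype(np.float32)
    s = ax ** alpha / aw ** (1.0 - alpha)
    s = np.where(np.isfinite(s) & (s > 0), s, 1.0)  # guard all-zero channels
    return (X / s).astype(X.dtype), (W * s[:, None]).astype(W.dtype)
```

AWQ would slot into the same `preprocess(X, W) -> (X', W')` interface, scaling salient weight channels instead.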
- Quantization
  - symmetric per-tensor int8
  - symmetric per-channel int8
  - symmetric per-group int8
    - group size 32
    - group size 64
    - group size 128
  - asymmetric per-tensor int8
  - asymmetric per-channel int8
  - asymmetric per-group int8
    - group size 32
    - group size 64
    - group size 128
  - GPTQ-like
    - int8
    - int4
    - int3
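A sketch of the symmetric fake-quantization family, with per-tensor, per-channel, and per-group granularity selected by arguments (asymmetric variants would additionally carry a zero point; the GPTQ-like option needs its own error-compensating loop and is out of scope here):

```python
import numpy as np

def fake_quant_sym(a, n_bits=8, axis=None, group_size=None):
    # axis=None -> per-tensor; axis=k -> per-channel along axis k;
    # group_size=g -> per-group along the last axis (last dim divisible by g).
    qmax = 2 ** (n_bits - 1) - 1                      # 127 for int8
    a32 = a.astype(np.float32)
    if group_size is not None:
        shape = a32.shape
        a32 = a32.reshape(-1, group_size)             # one scale per group
        scale = np.abs(a32).max(axis=1, keepdims=True) / qmax
        scale[scale == 0] = 1.0                       # avoid div-by-zero
        q = np.clip(np.round(a32 / scale), -qmax, qmax)
        return (q * scale).reshape(shape).astype(a.dtype)
    keep = axis is not None
    scale = np.abs(a32).max(axis=axis, keepdims=keep) / qmax
    scale = np.where(scale == 0, 1.0, scale)
    q = np.clip(np.round(a32 / scale), -qmax, qmax)
    return (q * scale).astype(a.dtype)
```

E.g. `fake_quant_sym(W, axis=0)` gives one scale per output column, `fake_quant_sym(W, group_size=64)` one scale per group of 64.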
- Postprocessing
  - First step: no postprocessing
  - then: compensate the error by a bias
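One possible reading of "compensate the error by a bias" (an assumption: subtract the per-output-channel mean of the quantization error, measured against a reference result; in the calibration scenario the bias would be estimated on the calibration set instead):

```python
import numpy as np

def bias_compensate(Yq, Y_ref):
    # Remove the column-wise mean error so each output channel is unbiased.
    bias = (Yq - Y_ref.astype(np.float32)).mean(axis=0)
    return Yq - bias
```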
- Metrics
  - First step: use the Frobenius norm (L_F) for error evaluation
  - Study different matrix norms and analyze whether they give us correct metrics.
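A sketch of the first-step metric as a relative Frobenius error (`np.linalg.norm` also exposes other matrix norms for the comparison study, e.g. the spectral norm via `ord=2`):

```python
import numpy as np

def frobenius_error(Y, Y_hat, relative=True):
    # ||Y - Y_hat||_F, optionally normalized by ||Y||_F.
    Y32, H32 = Y.astype(np.float32), Y_hat.astype(np.float32)
    err = np.linalg.norm(Y32 - H32, "fro")
    return err / np.linalg.norm(Y32, "fro") if relative else err
```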
- Statistics scenario
  - Run the base scenario on a set of matrices (e.g. 100) drawn from the same distributions and collect error statistics (mean and std)
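A sketch, assuming a hypothetical `run_once(rng)` wrapper around one full base-scenario pass that returns a scalar error:

```python
import numpy as np

def error_statistics(run_once, n_trials=100, seed=0):
    rng = np.random.default_rng(seed)
    errs = np.array([run_once(rng) for _ in range(n_trials)])
    return errs.mean(), errs.std()   # error mean and std over the set
```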
- Calibration scenario
  - Use one set of matrices (e.g. 100) drawn from the same distributions to calibrate parameters, then evaluate error statistics on another set of matrices (e.g. 100) from the same distributions
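A sketch, again assuming a hypothetical `run_once(rng, alpha)` wrapper, where `alpha` stands in for whatever preprocessing/quantization parameter is being calibrated (e.g. a smoothing strength):

```python
import numpy as np

def calibrate_and_evaluate(run_once, alphas=(0.25, 0.5, 0.75),
                           n_calib=100, n_eval=100, seed=0):
    rng = np.random.default_rng(seed)
    # Pick the parameter with the lowest mean error on the calibration set...
    calib = {a: np.mean([run_once(rng, a) for _ in range(n_calib)])
             for a in alphas}
    best = min(calib, key=calib.get)
    # ...then report error statistics on a fresh evaluation set.
    errs = np.array([run_once(rng, best) for _ in range(n_eval)])
    return best, errs.mean(), errs.std()
```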
- Additional features:
  - use matrices loaded from a dump