Develop a Python script in the repo that implements and tests the following (see also the thread below):
- Base scenario (a minimal pipeline sketch follows this block):
  - Create two matrices (X, W) filled with floating-point values drawn from a predefined distribution.
    - square matrices of size 1024×1024
    - value type is float16
    - keep the solution flexible so that size and data type can be changed
  - Multiply the matrices (X * W = Y) and save the result (Y, the original one)
  - Preprocess the matrices (X -> X', W -> W')
    - keep the solution flexible so that the preprocessing type can be changed
  - Multiply the preprocessed matrices (X' * W' = Y') and save the result (Y', the preprocessed one)
  - Fake-quantize the preprocessed matrices (Q(W') and/or Q(X'))
    - at least one matrix must be quantized, but it may be only one of them
    - keep the solution flexible so that the quantization type can be changed
  - Multiply the quantized matrices and save the result (Yq, the quantized one)
  - Postprocess the quantized result (Yq -> Yp)
  - Find the differences between the preprocessed (Y'), quantized/postprocessed (Yq, Yp), and original (Y) results; use matrix metrics on the obtained matrices as the final evaluation value
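A minimal sketch of how the base scenario could be wired together (NumPy; all five stages are passed in as callables so they stay swappable — the function names and the choice to accumulate the matmuls in float32 are assumptions, not a fixed design):

```python
import numpy as np

def run_base_scenario(gen, preprocess, fake_quantize, postprocess, metric,
                      size=1024, dtype=np.float16):
    # gen/preprocess/fake_quantize/postprocess/metric are placeholders
    # for the pluggable pieces described above.
    X, W = gen(size, dtype), gen(size, dtype)
    Y = X.astype(np.float32) @ W.astype(np.float32)        # original Y
    Xs, Ws = preprocess(X, W)                              # X -> X', W -> W'
    Y_pre = Xs.astype(np.float32) @ Ws.astype(np.float32)  # preprocessed Y'
    Xq, Wq = fake_quantize(Xs, Ws)                         # Q(X') and/or Q(W')
    Yq = Xq.astype(np.float32) @ Wq.astype(np.float32)     # quantized Yq
    Yp = postprocess(Yq, Y_pre)                            # Yq -> Yp
    return metric(Y, Y_pre), metric(Y, Yq), metric(Y, Yp)
```

Accumulating the products in float32 keeps the comparison focused on preprocessing/quantization error rather than on float16 matmul rounding.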
- Data distribution
  - in general, each of X and W is assumed to be drawn from a combination of two normal distributions (context and outliers)
  - the following parameters control the distribution:
    - context mean
    - context dispersion
    - number of context values
    - distance between context and outliers, or the outliers' mean
    - outliers' dispersion
    - number of outlier values
  - Starting simplification: W has context only; the outliers' dispersion is much smaller than the context one (e.g. Do = 0.1 * Dc); the number of outliers is much smaller than the matrix size (e.g. No ~ 0.1 * sqrt(Nc)); context mean = 0.
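A sketch of such a generator (parameter names mirror the list above; standard deviations are used where the list says "dispersion", and the 6-sigma default distance between context and outliers is an illustrative assumption):

```python
import numpy as np

def gen_mixture(size, dtype=np.float16, ctx_mean=0.0, ctx_std=1.0,
                out_mean=None, out_std=None, n_outliers=None, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    n = size * size
    n_outliers = int(0.1 * np.sqrt(n)) if n_outliers is None else n_outliers
    out_std = 0.1 * ctx_std if out_std is None else out_std    # Do = 0.1 * Dc
    out_mean = ctx_mean + 6 * ctx_std if out_mean is None else out_mean
    m = rng.normal(ctx_mean, ctx_std, n)                       # context values
    idx = rng.choice(n, size=n_outliers, replace=False)
    m[idx] = rng.normal(out_mean, out_std, n_outliers)         # inject outliers
    return m.reshape(size, size).astype(dtype)
```

The starting simplification for W is then `gen_mixture(1024, n_outliers=0)`.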
- Preprocessing
  - Smoothing from SmoothQuant
  - AWQ algorithm
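For the smoothing option, a sketch along the lines of SmoothQuant's per-channel scale migration (alpha = 0.5 is the usual default; the guard against degenerate channels is an added assumption):

```python
import numpy as np

def smooth(X, W, alpha=0.5):
    # Per input channel j: s_j = max|X[:, j]|^alpha / max|W[j, :]|^(1 - alpha);
    # then X' = X / s and W' = s * W, so X' @ W' equals X @ W up to rounding.
    ax = np.abs(X).max(axis=0).astype(np.float32)
    aw = np.abs(W).max(axis=1).astype(np.float32)
    s = ax ** alpha / aw ** (1.0 - alpha)
    s = np.where(np.isfinite(s) & (s > 0), s, 1.0)  # guard all-zero channels
    return (X / s).astype(X.dtype), (W * s[:, None]).astype(W.dtype)
```

AWQ would slot into the same `preprocess(X, W) -> (X', W')` interface, scaling salient weight channels instead.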
- Quantization
  - symmetric per-tensor int8
  - symmetric per-channel int8
  - symmetric per-group int8
    - group size 32
    - group size 64
    - group size 128
  - asymmetric per-tensor int8
  - asymmetric per-channel int8
  - asymmetric per-group int8
    - group size 32
    - group size 64
    - group size 128
  - GPTQ-like
    - int8
    - int4
    - int3
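A sketch of the symmetric fake-quantization family, with per-tensor, per-channel, and per-group granularity selected by arguments (asymmetric variants would additionally carry a zero point; the GPTQ-like option needs its own error-compensating loop and is out of scope here):

```python
import numpy as np

def fake_quant_sym(a, n_bits=8, axis=None, group_size=None):
    # axis=None -> per-tensor; axis=k -> per-channel along axis k;
    # group_size=g -> per-group along the last axis (last dim divisible by g).
    qmax = 2 ** (n_bits - 1) - 1                      # 127 for int8
    a32 = a.astype(np.float32)
    if group_size is not None:
        shape = a32.shape
        a32 = a32.reshape(-1, group_size)             # one scale per group
        scale = np.abs(a32).max(axis=1, keepdims=True) / qmax
        scale[scale == 0] = 1.0                       # avoid div-by-zero
        q = np.clip(np.round(a32 / scale), -qmax, qmax)
        return (q * scale).reshape(shape).astype(a.dtype)
    keep = axis is not None
    scale = np.abs(a32).max(axis=axis, keepdims=keep) / qmax
    scale = np.where(scale == 0, 1.0, scale)
    q = np.clip(np.round(a32 / scale), -qmax, qmax)
    return (q * scale).astype(a.dtype)
```

E.g. `fake_quant_sym(W, axis=0)` gives one scale per output column, `fake_quant_sym(W, group_size=64)` one scale per group of 64.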
- Postprocessing
  - First step: no postprocessing
  - then: compensate the error by a bias
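One possible reading of "compensate the error by a bias" (an assumption: subtract the per-output-channel mean of the quantization error, measured against a reference result; in the calibration scenario the bias would be estimated on the calibration set instead):

```python
import numpy as np

def bias_compensate(Yq, Y_ref):
    # Remove the column-wise mean error so each output channel is unbiased.
    bias = (Yq - Y_ref.astype(np.float32)).mean(axis=0)
    return Yq - bias
```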
- Metrics
  - First step: use the Frobenius norm (L_F) for error evaluation
  - Study different matrix norms and analyze whether they give us correct metrics.
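A sketch of the first-step metric as a relative Frobenius error (`np.linalg.norm` also exposes other matrix norms for the comparison study, e.g. the spectral norm via `ord=2`):

```python
import numpy as np

def frobenius_error(Y, Y_hat, relative=True):
    # ||Y - Y_hat||_F, optionally normalized by ||Y||_F.
    Y32, H32 = Y.astype(np.float32), Y_hat.astype(np.float32)
    err = np.linalg.norm(Y32 - H32, "fro")
    return err / np.linalg.norm(Y32, "fro") if relative else err
```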
- Statistics scenario
  - Run the base scenario on a set of matrices (e.g. 100) drawn from the same distributions and collect error statistics (mean and std)
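A sketch, assuming a hypothetical `run_once(rng)` wrapper around one full base-scenario pass that returns a scalar error:

```python
import numpy as np

def error_statistics(run_once, n_trials=100, seed=0):
    rng = np.random.default_rng(seed)
    errs = np.array([run_once(rng) for _ in range(n_trials)])
    return errs.mean(), errs.std()   # error mean and std over the set
```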
- Calibration scenario
  - Use one set of matrices (e.g. 100) drawn from the same distributions to calibrate parameters, then evaluate error statistics on another set of matrices (e.g. 100) from the same distributions
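A sketch, again assuming a hypothetical `run_once(rng, alpha)` wrapper, where `alpha` stands in for whatever preprocessing/quantization parameter is being calibrated (e.g. a smoothing strength):

```python
import numpy as np

def calibrate_and_evaluate(run_once, alphas=(0.25, 0.5, 0.75),
                           n_calib=100, n_eval=100, seed=0):
    rng = np.random.default_rng(seed)
    # Pick the parameter with the lowest mean error on the calibration set...
    calib = {a: np.mean([run_once(rng, a) for _ in range(n_calib)])
             for a in alphas}
    best = min(calib, key=calib.get)
    # ...then report error statistics on a fresh evaluation set.
    errs = np.array([run_once(rng, best) for _ in range(n_eval)])
    return best, errs.mean(), errs.std()
```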
- Additional features:
  - use matrices loaded from a dump