
Commit 59af503

Update README.md
1 parent c5ecdaa commit 59af503


1 file changed (+19 -17 lines)

tools/imatrix/README.md

Lines changed: 19 additions & 17 deletions
@@ -20,19 +20,19 @@ The parameters in square brackets are optional and have the following meaning:
2020
* `-lv | --verbosity` specifies the verbosity level. If set to `0`, no output other than the perplexity of the processed chunks will be generated. If set to `1`, each time the results are saved a message is written to `stderr`. If `>=2`, a message is output each time data is collected for any tensor. Default verbosity level is `1`.
2121
* `-o | --output-file` specifies the name of the file where the computed data will be stored. If omitted, `imatrix.gguf` is used.
2222
* `-ofreq | --output-frequency` specifies how often the results computed so far are saved to disk. Default is `10` (i.e., every 10 chunks).
23-
* `--output-format` specifies the output format of the generated imatrix file. Either "gguf", or "dat" (the legacy format). Defaults to "gguf".
23+
* `--output-format` specifies the output format of the generated imatrix file: either `gguf` or `dat` (the legacy format). Defaults to `gguf`.
2424
* `--save-frequency` specifies how often to save a copy of the imatrix in a separate file. Default is `0` (i.e., never).
2525
* `--process-output` specifies if data will be collected for the `output.weight` tensor. Typically, it is better not to utilize the importance matrix when quantizing `output.weight`, so this is set to `false` by default.
2626
* `--in-file` specifies one or more existing imatrix files to load and combine. Useful for merging files from multiple runs/datasets.
2727
* `--parse-special` enables parsing of special tokens (e.g., `<|im_start|>` in some models). Useful for models with custom tokenizers.
2828
* `--chunk | --from-chunk` skips the first `n` chunks of tokens from the input data. Useful for resuming or skipping initial low-quality data.
29-
* `--chunks` maximum number of chunks to process. Default is -1 for all available chunks.
29+
* `--chunks` maximum number of chunks to process. Default is `-1` for all available chunks.
3030
* `--no-ppl` disables the calculation of perplexity for the processed chunks. Useful if you want to speed up the processing and do not care about perplexity.
3131
* `--show-statistics` displays statistics for the imatrix file.
3232

3333
For faster computation, make sure to use GPU offloading via the `-ngl | --n-gpu-layers` argument.
3434

35-
Recent versions of `llama-imatrix` store data in GGUF format by default. For the legacy format, use an extension other than `.gguf` when saving the output file. More information is available in <https://github.com/ggml-org/llama.cpp/pull/9400>.
35+
Versions **b5942** and newer of `llama-imatrix` store data in GGUF format by default. For the legacy format, use `--output-format dat` when saving the output file. More information is available in <https://github.com/ggml-org/llama.cpp/pull/9400>.
3636

3737
## Examples
3838

@@ -74,25 +74,27 @@ Recent versions of `llama-imatrix` store data in GGUF format by default. For the
7474
./llama-imatrix --in-file imatrix.gguf --show-statistics
7575
```
7676

77-
`--show-statistics` will display the following statistics:
77+
## Statistics
78+
79+
From version <bwxyz>, `--show-statistics` operates in two modes: for GGUF (preferred) imatrices, it reports statistics computed directly from the activations, which are more accurate; for legacy (`dat`) files, it reports the less precise average squared activations.
7880

7981
#### Per tensor
8082

81-
* Σ(Act²): sum of all squared activations (the importance scores)
82-
* Min & Max: minimum and maximum squared activations values
83-
* μ & σ: Squared activations' mean and standard deviation
84-
* % Active: proportion of elements whose average squared activation exceeds a small threshold (1e-5). Helpful to determine how alive/dormant the tensor is during inference
85-
* N: number of squared activations
86-
* Entropy: entropy of the squared activation distribution, in bits (standard Shannon entropy measurement) $S = -\sum_{i=1}^N p_i \log_2 p_i$
87-
* E (norm): Normalized entropy. $E(norm)=\frac{-\sum_{i=1}^N p_i \log_2 p_i}{log_2 N}$. These two metrics can be used to determine how well a prompt "exercises" the model's capabilities
88-
* ZD Score: z-score distribution as described in _3.1 Layer Importance Scores_ of [Layer-Wise Quantization](https://arxiv.org/abs/2406.17415)
89-
* CosSim: cosine similarity with respect to the previous layer's tensor. Useful to determine how similar the squared activations of the current layer are to the previous layer's squared activations.
83+
* **Σ(Act²)** *(legacy mode)* / **L₂ Norm** *(preferred)*: In legacy mode, the raw sum of squared activations (Σ Act²). In preferred mode, the Euclidean distance (L₂ Norm) between this tensor’s average activations and those of the previous layer.
84+
* **Min / Max / μ / σ**: Minimum, maximum, mean, and standard deviation of the tensor's elements.
85+
* **N**: Number of tensor elements considered.
86+
* **H Norm**: Shannon Entropy normalized by log₂(N), defined as $H_{norm} = \frac{-\sum_{i=1}^N p_i \log_2 p_i}{\log_2 N}$. Used to determine how well a prompt "exercises" the model's capabilities.
87+
* **H** *(legacy mode)* / **ECS** *(preferred)*: In legacy mode, Shannon Entropy, defined as $H = -\sum_{i=1}^N p_i \log_2 p_i$. In preferred mode, the *Euclidean-Cosine Score*, defined as $ECS = K \cdot e^{-\alpha a} \cdot |b|^{\gamma}$, where `a` is the L₂ Norm and `b` the Cosine Similarity between this tensor’s elements and those of the previous layer, with `α = 0.01` and `γ = 10`. A higher score means more similarity and less change.
88+
* **ZD**: % of elements whose Z-score is > 1.0 in magnitude (an indicator of outliers), as described in _3.1 Layer Importance Scores_ of [Layer-Wise Quantization](https://arxiv.org/abs/2406.17415)
89+
* **CosSim**: Cosine Similarity between this tensor’s elements and those of the previous layer.
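A minimal Python sketch of several per-tensor statistics listed above, assuming legacy-mode inputs: `values` is a flat list of one tensor's non-negative per-element values (e.g., average squared activations), and $p_i$ is each value divided by their sum. The function name `tensor_stats`, the use of the population standard deviation, and the skipping of zero entries in the entropy are illustrative choices, not taken from the `llama-imatrix` source.

```python
import math

def tensor_stats(values):
    """Hypothetical helper: per-tensor statistics over non-negative values.

    Assumption: values are legacy-style per-element squared activations,
    and p_i = v_i / sum(values).
    """
    n = len(values)
    if n == 0:
        raise ValueError("tensor must have at least one element")
    total = sum(values)
    mean = total / n
    # Population standard deviation (an assumption, not the sample estimate)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # Shannon entropy in bits; zero entries contribute nothing (0 * log 0 := 0)
    probs = [v / total for v in values if v > 0]
    h = -sum(p * math.log2(p) for p in probs)
    # Normalized entropy: 1.0 means a perfectly uniform distribution
    h_norm = h / math.log2(n) if n > 1 else 0.0
    # ZD: percentage of elements whose z-score exceeds 1 in magnitude
    outliers = sum(1 for v in values if std > 0 and abs(v - mean) / std > 1.0)
    zd = 100.0 * outliers / n
    return {"min": min(values), "max": max(values), "mean": mean, "std": std,
            "n": n, "H": h, "H_norm": h_norm, "ZD": zd}
```

For a perfectly uniform tensor such as `[1, 1, 1, 1]`, `H` is 2 bits and `H_norm` is 1.0, the maximum; concentrating the mass in fewer elements pushes `H_norm` toward 0.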
9090

9191
#### Per layer
9292

93-
Weighted averages of Σ(Act²), ZD Score and CosSim are also calculated.
93+
Aggregated metrics per block/layer:
9494

95-
#### Important note on the computed Statistics
95+
* **Σ(Act²)** *(legacy mode)* / **L₂ Norm** *(preferred)*: In legacy mode, the sum of squared activations (Σ Act²) over the layer's concatenated tensors. In preferred mode, the Euclidean distance (L₂ Norm) between this layer's average concatenated tensor activations and those of the previous layer.
96+
* **ZD**: % of this layer's concatenated tensors' elements with |Z| > 1.
97+
* **CosSim**: Cosine Similarity between this layer's concatenated tensors' elements and those of the previous layer.
98+
* **ECS** *(preferred only)*: Euclidean-Cosine Score applied to the layer.
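The per-layer comparison can be sketched in Python under a few assumptions: each layer is a list of tensors given as flat lists of average activations, a layer's vector is the concatenation of its tensors, and consecutive layers have the same total size. Two constants are not pinned down by the text: `K` is set to 1.0 purely for illustration, and `ALPHA` is taken as positive so that $e^{-\alpha a}$ decays as the L₂ distance grows, which is what "higher score means more similarity" requires. All function names are hypothetical.

```python
import math

ALPHA = 0.01  # assumption: positive decay rate, so ECS falls as the L2 distance grows
GAMMA = 10    # exponent on |cosine similarity|, as stated in the README
K = 1.0       # hypothetical scaling constant; its real value is not given here

def concat(tensors):
    """Concatenate a layer's per-tensor average activations into one vector."""
    return [x for t in tensors for x in t]

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def ecs(a, b):
    """Euclidean-Cosine Score: K * exp(-ALPHA * L2) * |cos|^GAMMA."""
    return K * math.exp(-ALPHA * l2_distance(a, b)) * abs(cosine_similarity(a, b)) ** GAMMA

def layer_report(layers):
    """Return (layer_index, L2, CosSim, ECS) for each layer vs. the previous one."""
    report = []
    prev = None
    for idx, tensors in enumerate(layers):
        cur = concat(tensors)
        # Only compare layers whose concatenated vectors have the same size
        if prev is not None and len(prev) == len(cur):
            report.append((idx, l2_distance(cur, prev),
                           cosine_similarity(cur, prev), ecs(cur, prev)))
        prev = cur
    return report
```

With identical consecutive layers the distance is 0, the cosine similarity is 1, and the score equals `K`; scaling a layer's activations keeps the cosine similarity at 1 but increases the distance, so the score drops below `K`.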
9699

97-
When using these statistics, please note that they are computed on the squared activations, **not on the actual (raw) activations**.
98-
Whilst the results are still useful, they're less reliable than using the raw values, and in the case of the cosine similarity, could be misleading if the tensor contains opposite vectors.
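The caveat about cosine similarity on squared activations can be seen with a small Python example: two vectors pointing in exactly opposite directions have a cosine similarity of -1, but after squaring every element they become identical and score 1. The activation values are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

# Made-up activations for the same tensor in two consecutive layers
raw_prev = [0.5, -1.0, 2.0]
raw_cur = [-x for x in raw_prev]  # exactly the opposite direction

# Squaring, as the legacy statistics do, discards the sign
sq_prev = [x * x for x in raw_prev]
sq_cur = [x * x for x in raw_cur]

print(round(cosine_similarity(raw_cur, raw_prev), 6))  # -1.0: opposite vectors
print(round(cosine_similarity(sq_cur, sq_prev), 6))    # 1.0: looks identical
```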
100+
More information is available in <https://github.com/ggml-org/llama.cpp/pull/14891>.