Here, `-m | --model` with a model name and `-f | --file` with a file containing calibration data (e.g. `wiki.train.raw`) are mandatory.
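For illustration, a minimal invocation might look like the sketch below; `ggml-model-f16.gguf` is a placeholder model filename.

```bash
# Minimal run: only the mandatory model (-m) and calibration data (-f) are given.
./llama-imatrix -m ggml-model-f16.gguf -f wiki.train.raw
```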
@@ -29,6 +29,7 @@ The parameters in square brackets are optional and have the following meaning:
* `--chunks` maximum number of chunks to process. Default is `-1` for all available chunks.
* `--no-ppl` disables the calculation of perplexity for the processed chunks. Useful if you want to speed up the processing and do not care about perplexity.
* `--activation-statistics` enables the collection of activation statistics for each tensor. If set, the imatrix file size will double, but reported statistics will be more accurate.
For faster computation, make sure to use GPU offloading via the `-ngl | --n-gpu-layers` argument.
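As a sketch of how the optional flags above can be combined (filenames and values are illustrative, not recommendations):

```bash
# Process at most 100 chunks, skip the perplexity calculation, collect
# per-tensor activation statistics, and offload 99 layers to the GPU.
./llama-imatrix -m ggml-model-f16.gguf -f wiki.train.raw \
    --chunks 100 --no-ppl --activation-statistics -ngl 99
```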
@@ -69,14 +70,19 @@ Versions **b5942** and newer of `llama-imatrix` store data in GGUF format by default.
From version <bwxyz>, `--show-statistics` operates in two modes: for GGUF (preferred) imatrices, it reports direct and accurate activation statistics, and for legacy (binary) files, it reports the less precise average squared activations.
Beginning with version <bwxyz>, `--show-statistics` has two modes. If `--activation-statistics` was used at imatrix creation time and `--output-format` was set to `gguf`, it reports precise statistics. Otherwise, it reports less accurate, albeit still useful, metrics based on average squared activations.
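A possible two-step workflow under these assumptions (filenames are placeholders, and `--in-file` is assumed to be the option for reading an existing imatrix):

```bash
# 1) Create a GGUF imatrix with activation statistics enabled.
./llama-imatrix -m ggml-model-f16.gguf -f wiki.train.raw \
    --output-format gguf --activation-statistics -o imatrix.gguf
# 2) Report statistics from the resulting file.
./llama-imatrix --in-file imatrix.gguf --show-statistics
```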