Commit 3fb4161

Update quantize.cpp
The new imatrix GGUF format stores per-matrix token counts instead of per-tensor chunk counts, which makes it possible to fix NaNs for low bits-per-weight quants.
1 parent eef000f commit 3fb4161

1 file changed: +1 -1 lines changed

tools/quantize/quantize.cpp

Lines changed: 1 addition & 1 deletion
@@ -289,7 +289,7 @@ static int load_imatrix(const std::string & imatrix_file, std::vector<std::strin
             const float count = ((const float *) counts->data)[j];
             if (count > 0.0f) {
                 for (int64_t i = 0; i < ne0; ++i) {
-                    e[j*ne0 + i] = ((const float *) sums->data)[j*ne0 + i] / count;
+                    e[j*ne0 + i] = (((const float *) sums->data)[j*ne0 + i] + 1.0f) / (count + 1.0f);
                 }
             } else {
                 // Partial imatrix data, this tensor never got any input during calibration
