I was reading the implementation of the MXFP4/INT8 dot product for gpt-oss, and it occurred to me that since it's memory-bound, we could get extra precision essentially for free by accumulating in fp64, by using Kahan summation, or both, and it wouldn't be any slower. The same is probably true anywhere there's an accumulation.
Which does llama.cpp care about more: implementing the models faithfully, or picking up free extra precision for better outputs?
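For concreteness, here is a minimal sketch of the compensated-accumulation idea in plain C with float inputs. This is not llama.cpp's actual kernel (the real MXFP4/INT8 kernels work on quantized blocks, and the names here are made up for illustration); it just shows Kahan summation next to a naive loop and an fp64 reference accumulator.

```c
/* Sketch only: illustrates Kahan-compensated accumulation in a dot product.
 * Hypothetical code, not llama.cpp's MXFP4/INT8 kernel. */
#include <stdio.h>

/* Naive accumulation: rounding error grows with n. */
static float dot_naive(const float *x, const float *y, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Kahan summation: carry a compensation term that captures the
 * low-order bits lost in each addition. */
static float dot_kahan(const float *x, const float *y, int n) {
    float sum = 0.0f;
    float c   = 0.0f;               /* compensation for lost low-order bits */
    for (int i = 0; i < n; i++) {
        float term = x[i] * y[i] - c;
        float t    = sum + term;    /* low bits of term may be lost here */
        c   = (t - sum) - term;     /* recover what was lost */
        sum = t;
    }
    return sum;
}

int main(void) {
    enum { N = 1 << 20 };
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f / (float)(i + 1); y[i] = 1.0f; }

    /* fp64 accumulation as a reference for comparison. */
    double ref = 0.0;
    for (int i = 0; i < N; i++) ref += (double)x[i] * (double)y[i];

    printf("naive: %.8f  kahan: %.8f  fp64 ref: %.8f\n",
           dot_naive(x, y, N), dot_kahan(x, y, N), ref);
    return 0;
}
```

The extra compensation work is a few register-resident flops per element, which is the reason it could plausibly hide under the memory traffic in a memory-bound kernel.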