Does anyone know the correct way to define `fma` for `BFloat16`? Based on my understanding of the double-rounding theorems, I think it should be correct to simply cast to `Float32` and back:

    @inline Base.fma(x::BFloat16, y::BFloat16, z::BFloat16) =
        BFloat16(fma(Float32(x), Float32(y), Float32(z)))

But I've observed that this sometimes returns a different result from `BFloat16(Float32(x) * Float32(y) + Float32(z))`, which puzzles me. I think the two ought to be equivalent in round-to-nearest-even, provided that `Float32` has at least twice the precision of `BFloat16` plus two bits, which is true: `Float32` has 24 bits of precision and `BFloat16` has 8, so 24 ≥ 2·8 + 2 = 18.
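
For anyone who wants to reproduce the discrepancy, here is a minimal sketch. It assumes the BFloat16s.jl package is installed. The names `fma_via_f32`, `muladd_via_f32`, and `find_mismatches` are hypothetical helpers made up for this comparison, not part of any package. The hand-picked triple shows one way the two expressions can differ, which may not be the only one: `BFloat16` has the same exponent range as `Float32`, so the product `x * y` can overflow in `Float32` even when the fused result is finite.

```julia
using BFloat16s  # assumption: BFloat16s.jl is installed

# The two candidate expressions from the question, under hypothetical names.
fma_via_f32(x::BFloat16, y::BFloat16, z::BFloat16) =
    BFloat16(fma(Float32(x), Float32(y), Float32(z)))
muladd_via_f32(x::BFloat16, y::BFloat16, z::BFloat16) =
    BFloat16(Float32(x) * Float32(y) + Float32(z))

# Hand-picked case: x * y = 2^128 is out of Float32 range, but x * y + z = 2^127 is not.
x = BFloat16(ldexp(1f0, 64))    # 2^64, exactly representable
y = BFloat16(ldexp(1f0, 64))    # 2^64
z = BFloat16(-ldexp(1f0, 127))  # -2^127

println(fma_via_f32(x, y, z))     # fused: a single rounding of the exact value 2^127
println(muladd_via_f32(x, y, z))  # unfused: the product rounds to Inf before the add

# Hypothetical helper: sample random bit patterns and collect the triples
# where the two expressions disagree (results compared as Float32; NaNs count as equal).
function find_mismatches(n::Integer)
    found = NTuple{3,BFloat16}[]
    for _ in 1:n
        a = reinterpret(BFloat16, rand(UInt16))
        b = reinterpret(BFloat16, rand(UInt16))
        c = reinterpret(BFloat16, rand(UInt16))
        r1 = Float32(fma_via_f32(a, b, c))
        r2 = Float32(muladd_via_f32(a, b, c))
        isequal(r1, r2) || push!(found, (a, b, c))
    end
    return found
end
```

Calling `find_mismatches(10^6)` returns whatever disagreeing triples the random sample happens to hit, which can then be inspected by hand to see whether they involve overflow, underflow, or something else.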