
Conversation

O-J1 (Collaborator) commented Jan 5, 2026

We just need to decide on 'high' or 'medium'. Habishi recommended 'medium'. 'Highest' is exceedingly slow.

https://docs.pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html
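For reference, the setting in question is a single global call. A minimal sketch (the precision strings are the ones from the linked docs; the choice of "medium" here is just for illustration):

```python
import torch

# "medium"  -> allows bfloat16-based float32 matmuls (fastest, least precise)
# "high"    -> allows TF32 or the 2x-bfloat16 decomposition
# "highest" -> keeps full float32 matmuls (slowest)
torch.set_float32_matmul_precision("medium")

print(torch.get_float32_matmul_precision())  # -> "medium"
```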

dxqb (Collaborator) commented Jan 6, 2026

summary as discussed on Discord:

  • 'medium' is bfloat16 precision, which the user can already get by choosing bfloat16 as the train dtype. If they set float32 as the train dtype, float32 should actually be used, not a lower precision.

  • 'high' seems interesting and might be a precision we don't support yet:

“high”, float32 matrix multiplications either use the TensorFloat32 datatype (10 mantissa bits explicitly stored) or treat each float32 number as the sum of two bfloat16 numbers (approximately 16 mantissa bits with 14 bits explicitly stored), if the appropriate fast matrix multiplication algorithms are available. Otherwise float32 matrix multiplications are computed as if the precision is “highest”. See below for more information on the bfloat16 approach.

I'm not sure whether this is better than tfloat32, which we do support. If we want to support this 2x-bfloat16 algorithm, it should be a separate setting that the user chooses.
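If we follow that rule, the mapping could look roughly like the sketch below. This is just an illustration of the logic discussed above; `apply_matmul_precision`, `train_dtype`, and `allow_tf32` are hypothetical names, not existing config keys in the codebase:

```python
import torch

def apply_matmul_precision(train_dtype: torch.dtype, allow_tf32: bool) -> None:
    """Map the user's training choices to a float32 matmul precision.

    train_dtype and allow_tf32 stand in for whatever config options the
    trainer actually exposes; only the decision logic matters here.
    """
    if train_dtype == torch.float32 and not allow_tf32:
        # The user asked for full float32: do not silently lower precision.
        torch.set_float32_matmul_precision("highest")
    elif train_dtype == torch.float32 and allow_tf32:
        # Explicit opt-in to reduced-precision float32 matmuls
        # (TF32 or the 2x-bfloat16 decomposition, hardware permitting).
        torch.set_float32_matmul_precision("high")
    else:
        # Training already runs in a reduced dtype (e.g. bfloat16),
        # so "medium" loses nothing the user was relying on.
        torch.set_float32_matmul_precision("medium")
```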

dxqb added the followup label Jan 7, 2026

Labels: Effort: Low, followup
