🚀 The feature, motivation and pitch
As the kernels seem to be limited to the FP32 data type at the moment, it would be immensely helpful if the implementations also supported mixed-precision computation (FP16 and BF16). This would open the library up to a broader range of applications in NLP, not just graph neural nets.
How involved would enabling mixed-precision support be? Any pointers on where to start for a PR?
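In case it helps frame the discussion: if the kernels are registered through ATen's type-dispatch macros, the change might largely amount to switching from `AT_DISPATCH_FLOATING_TYPES` to the `*_AND2` variant so that `Half` and `BFloat16` instantiations get generated too. A minimal sketch below, assuming a CPU kernel and ATen dispatch; the kernel and function names are hypothetical placeholders, not this repo's actual API:

```cpp
#include <ATen/Dispatch.h>
#include <torch/extension.h>

// Hypothetical example kernel; the name is a placeholder.
template <typename scalar_t>
void scatter_sum_kernel(const scalar_t* src, const int64_t* index,
                        scalar_t* out, int64_t n) {
  for (int64_t i = 0; i < n; ++i)
    out[index[i]] += src[i];
}

torch::Tensor scatter_sum(torch::Tensor src, torch::Tensor index,
                          torch::Tensor out) {
  // Before: AT_DISPATCH_FLOATING_TYPES(...) instantiates only float/double.
  // After: the *_AND2 variant additionally instantiates Half and BFloat16.
  AT_DISPATCH_FLOATING_TYPES_AND2(
      at::ScalarType::Half, at::ScalarType::BFloat16,
      src.scalar_type(), "scatter_sum", [&] {
        scatter_sum_kernel<scalar_t>(
            src.data_ptr<scalar_t>(), index.data_ptr<int64_t>(),
            out.data_ptr<scalar_t>(), src.numel());
      });
  return out;
}
```

One caveat I'd expect to come up: reductions in FP16/BF16 can lose precision, so accumulating in FP32 (e.g., via `at::opmath_type<scalar_t>`) and casting back on write-out may be worth considering, and half-precision `atomicAdd` on CUDA has architecture constraints that PyTorch's atomic helpers paper over.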
Alternatives
No response
Additional context
No response