🚀 The feature, motivation and pitch
Currently, we have one all_reduce in TEP mode (TP for attention, TP+EP in MoE). Let's investigate the current optimal configuration in the PT manual backend to better understand whether our TEP performance is on par with the current PT backend.
Alternatives
No response
Additional context
No response