Conversation
|
To build: To test: |
|
Nice 👍🏽 This is great. |
|
@YangZhou1997 TCP/EFA support will need integrations into |
|
So far, we keep the engine.h interface, so hopefully uccl_engine.cc will be compatible |
|
NCCL performance is actually extremely high: |
|
I have been playing with TCP support over the break, and now my take is that it is gonna be hard to beat NCCL. There are two reasons:
Because of this, I think we should support TCP in a similar way to TCPX by just layering on top of NCCL. Maybe that would need helps from @DanielDanyang @derekwin Also cc @praveingk @zhongjiechen @MaoZiming |
|
@YangZhou1997 Broadly I Agree.
|
|
@YangZhou1997 I'd like to try implementing TCP based on NCCL, but this time it should be directly under the engine, rather than uccl_engine like TCPX before, correct? |
|
Hi @DanielDanyang , great! Yes, let's try to organically integrate into engine.h/cc. You can create a new branch in uccl and open a PR from there. I would like to keep this pr for the record.
|
|
Hi @praveingk
My understanding is that for TCP, NCCL's solution is already optimal. And it is also pretty challenging to implement efficiently, thus I think we should just use NCCL for TCP transfer. |
|
Thanks @YangZhou1997 for the clarification. |
Description
Please include a summary of the changes and the related issue.
Fixes # (issue)
Type of Change
How Has This Been Tested?
Include any tests here.
Checklist
format.sh.build_and_install.shto verify compilation.