Your question
I have a question about how the pp groups are created when context_parallel_size > 1 and encoder_tensor_parallel_size != tensor_parallel_size.
When context parallelism is enabled, the input is split symmetrically to balance the computation, so pairing ranks with zip(cycle(e_ranks), d_ranks) is not right.
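To make the concern concrete, here is a minimal sketch of what the round-robin pairing produces. The rank lists are made up for illustration, and `pair_ranks` and `symmetric_chunks` are hypothetical helpers, not functions from the library. The chunk layout (the sequence cut into 2 * cp_size chunks, with cp rank i holding chunks i and 2 * cp_size - 1 - i) is my assumption about what the symmetric split looks like.

```python
from itertools import cycle


def pair_ranks(e_ranks, d_ranks):
    """Pair encoder ranks with decoder ranks round-robin.

    Hypothetical helper: it only reproduces the zip(cycle(...), ...) pattern
    from the question, so the shorter encoder list is reused cyclically.
    """
    return list(zip(cycle(e_ranks), d_ranks))


def symmetric_chunks(cp_rank, cp_size):
    """Chunk indices held by one context-parallel rank.

    Assumption: the sequence is cut into 2 * cp_size chunks and rank i holds
    chunk i plus its mirror chunk 2 * cp_size - 1 - i.
    """
    return (cp_rank, 2 * cp_size - 1 - cp_rank)


# Made-up rank lists: 2 encoder ranks, 4 decoder ranks.
e_ranks = [0, 1]
d_ranks = [2, 3, 4, 5]

pairs = pair_ranks(e_ranks, d_ranks)
print(pairs)  # [(0, 2), (1, 3), (0, 4), (1, 5)]

# Chunks per cp rank under the assumed symmetric split, with cp_size = 2.
for cp_rank in range(2):
    print(cp_rank, symmetric_chunks(cp_rank, 2))
```

The pairing only depends on the position of each decoder rank in `d_ranks`, so whether an encoder rank ends up matched with decoder ranks that hold the same chunks depends entirely on how the cp and tp dimensions are ordered in those lists.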
# Map 1 encoder tp rank to several decoder tp ranks, because