Although Mooncake PG now supports scaling up, while reading through the code I found that Mooncake EP has significant limitations on scaling up. For example, max_qp_num (256) must be divisible by rank_num. When I launch 16 ranks and then try to add one or a few more ranks, it triggers an assertion failure. Are there any plans to relax this constraint in the future? I'm particularly interested in scaling up at single-rank or single-node granularity.
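To make the constraint concrete, here is a minimal sketch of how I understand it. This is not Mooncake code: the constant and function names are hypothetical, and it assumes the check is that max_qp_num (256) must be an integer multiple of the rank count, which is what the working 16-rank launch suggests.

```python
# Illustrative only: names are hypothetical, not taken from Mooncake EP.
MAX_QP_NUM = 256  # value quoted in this issue


def rank_count_allowed(rank_num: int, max_qp_num: int = MAX_QP_NUM) -> bool:
    """Assumed check: the QP budget must split evenly across all ranks."""
    return rank_num > 0 and max_qp_num % rank_num == 0


def next_allowed_rank_count(rank_num: int, max_qp_num: int = MAX_QP_NUM):
    """Smallest allowed rank count strictly greater than rank_num, or None."""
    for candidate in range(rank_num + 1, max_qp_num + 1):
        if rank_count_allowed(candidate, max_qp_num):
            return candidate
    return None


if __name__ == "__main__":
    for n in (16, 17, 18, 24, 32):
        print(n, rank_count_allowed(n))
    print("next allowed after 16:", next_allowed_rank_count(16))
```

Under that assumption, only power-of-two rank counts up to 256 pass, so growing from 16 ranks means jumping straight to 32 rather than adding a single rank or a single node.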
Looking forward to a response from the Mooncake EP or PG maintainers. Thanks!