Using DTensor to handle local num_heads change while TP is applied #1447