Using DTensor to handle local num_heads change while TP is applied (#… #25
