
[QUESTION] glu activation with tensor parallel in GroupedMLP #985

@Teng-xu

Description:

When training GroupedMLP with tensor parallelism (TP) enabled and gated_linear_unit activated, the activation function is applied directly to fc1_output. With a TP degree of 2, this intermediate output holds only the tensor values local to one TP rank, i.e. half of the full tensor. Applying the GLU activation to this partial output loses information, because only half of the tensor values participate in the activation.

Specifically, in the GLU function (https://github.com/NVIDIA/Megatron-LM/blob/core_v0.7.0/megatron/core/transformer/moe/experts.py#L48):
self.config.activation_func(x[0]) * x[1]

Both self.config.activation_func(x[0]) and x[1] contain only the local TP shard of the output tensor, so the result does not match training without TP.
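The mismatch can be sketched outside Megatron-LM with numpy. Assume a hypothetical ffn_hidden of 4, so the fused fc1 output has 2 * ffn_hidden = 8 columns laid out as [gate | up]; the shapes and the contiguous column split are illustrative assumptions, not Megatron-LM's exact sharding code.

```python
import numpy as np

# Hypothetical fused fc1 output: ffn_hidden = 4, so 2 * 4 = 8 columns,
# laid out as [gate | up] along the last dimension.
rng = np.random.default_rng(0)
fc1_output = rng.standard_normal((1, 8))

def silu(x):
    # SiLU as an example activation_func.
    return x / (1.0 + np.exp(-x))

# Reference (no TP): chunk the full fused output into gate and up halves,
# mirroring self.config.activation_func(x[0]) * x[1].
gate, up = np.split(fc1_output, 2, axis=-1)
reference = silu(gate) * up  # shape (1, 4)

# Naive TP=2: each rank holds a contiguous column slice of the fused output.
rank0, rank1 = np.split(fc1_output, 2, axis=-1)  # each (1, 4)

# If each rank chunks its local slice, rank 0's "gate" and "up" both come
# from the true gate half, and rank 1's both come from the true up half.
g0, u0 = np.split(rank0, 2, axis=-1)
g1, u1 = np.split(rank1, 2, axis=-1)
naive = np.concatenate([silu(g0) * u0, silu(g1) * u1], axis=-1)

# The per-rank GLU output diverges from the no-TP reference.
assert not np.allclose(reference, naive)
```

This is only meant to show why chunking a TP-sharded fused output mixes the gate and up halves; the actual layout depends on how the fc1 weight is partitioned across ranks.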

Steps to Reproduce:

  1. Enable gated_linear_unit in the GroupedMLP configuration.
  2. Train the model with Tensor Parallel (TP) enabled.
  3. Compare the intermediate outputs of the GLU activation function with and without TP enabled. (https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/transformer/moe/experts.py#L176)

Expected Behavior:

The activation function should correctly handle the tensor values across all TP ranks to prevent any loss of information, ensuring consistency with results obtained without TP.
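Continuing the hypothetical numpy sketch above, one way to get this consistency is GLU-aware sharding: give each TP rank a matching shard of the gate half and the up half, so the local activation operates on corresponding values. This is an illustration of the expected invariant, not Megatron-LM's implementation.

```python
import numpy as np

# Same hypothetical setup: ffn_hidden = 4, fused fc1 output of 8 columns
# laid out as [gate | up].
rng = np.random.default_rng(0)
fc1_output = rng.standard_normal((1, 8))

def silu(x):
    return x / (1.0 + np.exp(-x))

# No-TP reference GLU output.
gate, up = np.split(fc1_output, 2, axis=-1)
reference = silu(gate) * up

# TP=2 with GLU-aware sharding: rank r holds [gate shard r | up shard r],
# so the local GLU combines corresponding gate/up values.
shard = 4 // 2  # ffn_hidden / tp_degree
outputs = []
for r in range(2):
    g_r = gate[:, r * shard:(r + 1) * shard]
    u_r = up[:, r * shard:(r + 1) * shard]
    outputs.append(silu(g_r) * u_r)

# Concatenating the per-rank outputs recovers the no-TP result exactly.
assert np.allclose(np.concatenate(outputs, axis=-1), reference)
```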

Actual Behavior:

The GLU activation function is applied to tensor values that only represent half of the full tensor due to TP, leading to inconsistent results.
