deepseek-moe-16b-base fails on hellaswag dataset #972

@torsli

Description

Describe the bug

deepseek-moe-16b-base fails when I try to pre-train or fine-tune it on the hellaswag dataset. It works when I use llm.mock.build_unpacked_dataset, which does not automatically generate a loss_mask for the batch.

<class 'transformers_modules.deepseek_hyphen_ai.deepseek_hyphen_moe_hyphen_16b_hyphen_base.521d2bc4fb69a3f3ae565310fcc3b65f97af2580.modeling_deepseek.DeepseekForCausalLM'>
[rank0]: Traceback (most recent call last):
[rank0]:     trainer.run_train_validation_loop()
[rank0]:   File "/opt/Automodel/nemo_automodel/recipes/llm/train_ft.py", line 1135, in run_train_validation_loop
[rank0]:     train_log_data = self._run_train_optim_step(batches, self.max_grad_norm)
[rank0]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/Automodel/nemo_automodel/recipes/llm/train_ft.py", line 1285, in _run_train_optim_step
[rank0]:     self._forward_backward_step(
[rank0]:   File "/opt/Automodel/nemo_automodel/recipes/llm/train_ft.py", line 1243, in _forward_backward_step
[rank0]:     out = model(**batch)
[rank0]:           ^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/Automodel/nemo_automodel/_transformers/auto_model.py", line 91, in wrapper
[rank0]:     return func(self, *args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: TypeError: DeepseekForCausalLM.forward() got an unexpected keyword argument 'loss_mask'
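For what it's worth, the failure seems to boil down to the recipe unpacking the whole batch dict into the model's forward. A minimal, library-free sketch of the same TypeError (the names below are made up for illustration, not taken from the Automodel code):

```python
# Hypothetical illustration: forward() only accepts the usual CausalLM
# arguments, but the packed-dataset batch carries an extra loss_mask key.
def forward(input_ids, attention_mask=None, labels=None):
    return input_ids  # placeholder

batch = {
    "input_ids": [[1, 2, 3]],
    "labels": [[1, 2, 3]],
    "loss_mask": [[1, 1, 0]],  # extra key produced by the dataset/collator
}

forward(**batch)  # TypeError: forward() got an unexpected keyword argument 'loss_mask'
```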

Steps/Code to reproduce bug

Run TrainFinetuneRecipeForNextTokenPrediction recipe with attached config.
deepseek-moe-16b-base.yaml

Additional context

Deepseek-moe model code

The error above occurs when I use nemo_automodel.NeMoAutoModelForCausalLM.from_pretrained as the model __target__.
I've also tried using nemo_automodel.components.models.deepseek_v3.model.DeepseekV3ForCausalLM.from_config, but that fails because the deepseek-moe-16b-base config does not have an n_group attribute, which that class expects.
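As a stopgap I've been considering filtering the batch against the model's forward signature before the call. A rough sketch (assuming the batch is a plain dict and that simply dropping loss_mask is acceptable for this model; filter_batch_for_model is a hypothetical helper, not part of Automodel):

```python
import inspect

def filter_batch_for_model(model, batch):
    """Drop batch keys that the model's forward() does not accept (e.g. loss_mask)."""
    params = inspect.signature(model.forward).parameters
    # If forward() takes **kwargs, pass everything through unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return batch
    return {k: v for k, v in batch.items() if k in params}

# out = model(**filter_batch_for_model(model, batch))
```

This only silences the TypeError by discarding loss_mask, so the masked-loss behaviour would be lost; a proper fix presumably needs the recipe to apply the mask to the loss outside the model when forward() doesn't accept it.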

Labels: bug
