deepseek-moe-16b-base fails on hellaswag dataset #972

@torsli

Description

Describe the bug

deepseek-moe-16b-base fails when I try to pre-train or fine-tune it on the hellaswag dataset. It works when I use llm.mock.build_unpacked_dataset, which does not automatically generate a loss_mask for the batch.

<class 'transformers_modules.deepseek_hyphen_ai.deepseek_hyphen_moe_hyphen_16b_hyphen_base.521d2bc4fb69a3f3ae565310fcc3b65f97af2580.modeling_deepseek.DeepseekForCausalLM'>
[rank0]: Traceback (most recent call last):
[rank0]:     trainer.run_train_validation_loop()
[rank0]:   File "/opt/Automodel/nemo_automodel/recipes/llm/train_ft.py", line 1135, in run_train_validation_loop
[rank0]:     train_log_data = self._run_train_optim_step(batches, self.max_grad_norm)
[rank0]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/Automodel/nemo_automodel/recipes/llm/train_ft.py", line 1285, in _run_train_optim_step
[rank0]:     self._forward_backward_step(
[rank0]:   File "/opt/Automodel/nemo_automodel/recipes/llm/train_ft.py", line 1243, in _forward_backward_step
[rank0]:     out = model(**batch)
[rank0]:           ^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/Automodel/nemo_automodel/_transformers/auto_model.py", line 91, in wrapper
[rank0]:     return func(self, *args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: TypeError: DeepseekForCausalLM.forward() got an unexpected keyword argument 'loss_mask'
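For what it's worth, the failure seems to boil down to the recipe unpacking the whole batch dict into the model's forward. A minimal, library-free sketch of the same TypeError (the names below are made up for illustration, not taken from the Automodel code):

```python
# Hypothetical illustration: forward() only accepts the usual CausalLM
# arguments, but the packed-dataset batch carries an extra loss_mask key.
def forward(input_ids, attention_mask=None, labels=None):
    return input_ids  # placeholder

batch = {
    "input_ids": [[1, 2, 3]],
    "labels": [[1, 2, 3]],
    "loss_mask": [[1, 1, 0]],  # extra key produced by the dataset/collator
}

forward(**batch)  # TypeError: forward() got an unexpected keyword argument 'loss_mask'
```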

Steps/Code to reproduce bug

Run TrainFinetuneRecipeForNextTokenPrediction recipe with attached config.
deepseek-moe-16b-base.yaml

Additional context

Deepseek-moe model code

The error above occurs when I use nemo_automodel.NeMoAutoModelForCausalLM.from_pretrained as the model __target__.
I've also tried using nemo_automodel.components.models.deepseek_v3.model.DeepseekV3ForCausalLM.from_config, but that fails because the deepseek-moe-16b-base config does not have an n_group attribute, which that class expects.
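As a stopgap I've been considering filtering the batch against the model's forward signature before the call. A rough sketch (assuming the batch is a plain dict and that simply dropping loss_mask is acceptable for this model; filter_batch_for_model is a hypothetical helper, not part of Automodel):

```python
import inspect

def filter_batch_for_model(model, batch):
    """Drop batch keys that the model's forward() does not accept (e.g. loss_mask)."""
    params = inspect.signature(model.forward).parameters
    # If forward() takes **kwargs, pass everything through unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return batch
    return {k: v for k, v in batch.items() if k in params}

# out = model(**filter_batch_for_model(model, batch))
```

This only silences the TypeError by discarding loss_mask, so the masked-loss behaviour would be lost; a proper fix presumably needs the recipe to apply the mask to the loss outside the model when forward() doesn't accept it.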

Labels: bug
