Describe the bug
deepseek-moe-16b-base fails when I try to pre-train or finetune it on the hellaswag dataset. Training works when I use llm.mock.build_unpacked_dataset instead, which does not automatically add a loss_mask to the batch.
```
<class 'transformers_modules.deepseek-ai.deepseek-moe-16b-base.521d2bc4fb69a3f3ae565310fcc3b65f97af2580.modeling_deepseek.DeepseekForCausalLM'>
[rank0]: Traceback (most recent call last):
[rank0]:     trainer.run_train_validation_loop()
[rank0]:   File "/opt/Automodel/nemo_automodel/recipes/llm/train_ft.py", line 1135, in run_train_validation_loop
[rank0]:     train_log_data = self._run_train_optim_step(batches, self.max_grad_norm)
[rank0]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/Automodel/nemo_automodel/recipes/llm/train_ft.py", line 1285, in _run_train_optim_step
[rank0]:     self._forward_backward_step(
[rank0]:   File "/opt/Automodel/nemo_automodel/recipes/llm/train_ft.py", line 1243, in _forward_backward_step
[rank0]:     out = model(**batch)
[rank0]:           ^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/Automodel/nemo_automodel/_transformers/auto_model.py", line 91, in wrapper
[rank0]:     return func(self, *args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: TypeError: DeepseekForCausalLM.forward() got an unexpected keyword argument 'loss_mask'
```
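For what it's worth, the failure mode reproduces outside the recipe: any forward() that does not declare loss_mask rejects a batch dict containing that key when the batch is splatted in. A minimal sketch (the toy forward below just stands in for DeepseekForCausalLM.forward; it is not the real signature):

```python
import torch

# Stand-in for an HF causal-LM forward() that does not declare `loss_mask`.
def forward(input_ids=None, attention_mask=None, labels=None):
    pass

batch = {
    "input_ids": torch.randint(0, 100, (2, 8)),
    "labels": torch.randint(0, 100, (2, 8)),
    "loss_mask": torch.ones(2, 8, dtype=torch.bool),  # key added by the hellaswag dataloader
}
forward(**batch)  # TypeError: forward() got an unexpected keyword argument 'loss_mask'
```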
Steps/Code to reproduce bug
Run the TrainFinetuneRecipeForNextTokenPrediction recipe with the attached config.
deepseek-moe-16b-base.yaml
Additional context
The error above occurs when I use nemo_automodel.NeMoAutoModelForCausalLM.from_pretrained as the model __target__.
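Stripping the key before the forward call avoids the TypeError, but it also silently drops the loss masking the hellaswag dataset relies on, so it is only a diagnostic workaround, not a fix. A sketch (DropLossMask is my own illustrative wrapper, not an Automodel API):

```python
import torch.nn as nn

class DropLossMask(nn.Module):
    # Illustrative wrapper (not an Automodel API): removes `loss_mask`
    # before delegating to a forward() that does not accept it.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, **batch):
        batch.pop("loss_mask", None)  # discard the key the HF forward rejects
        return self.model(**batch)
```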
I've also tried using nemo_automodel.components.models.deepseek_v3.model.DeepseekV3ForCausalLM.from_config, but that fails because the deepseek-moe-16b-base config does not have an n_group attribute, which that class expects.
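The missing field is easy to confirm from the HF config (quick check, assuming trust_remote_code is acceptable):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(
    "deepseek-ai/deepseek-moe-16b-base", trust_remote_code=True
)
# n_group is a DeepSeek-V2/V3 grouped-routing field; this earlier MoE config lacks it.
print(hasattr(cfg, "n_group"))  # False
```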