Description
Describe the bug
Setting best_metric_key in the config raises the exception CheckpointingConfig.__init__() got an unexpected keyword argument 'best_metric_key'.
In https://github.com/NVIDIA-NeMo/Automodel/blob/main/nemo_automodel/recipes/llm/train_ft.py#L1044,
best_metric_key is read from configuration (e.g. to select a best metric when computing validation loss on multiple validation sets) like this:
self.best_metric_key = self.cfg.get("checkpoint.best_metric_key", "default")
However, this conflicts with the dataclass definition in https://github.com/NVIDIA-NeMo/Automodel/blob/main/nemo_automodel/components/checkpoint/checkpointing.py#L78, which does not support the best_metric_key key/parameter.
Adding the parameter to the config like this:
checkpoint:
  enabled: true
  ...
  best_metric_key: my_best_metric
leads to the exception CheckpointingConfig.__init__() got an unexpected keyword argument 'best_metric_key'.
Adding a best_metric_key field to the dataclass would be a quick fix, but it is unclear whether that is the desired solution.
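To illustrate the mismatch and the quick fix, here is a minimal sketch. It assumes the checkpoint section of the config is passed to the dataclass as keyword arguments, which is what the error message suggests. The two classes are hypothetical stand-ins, not the real CheckpointingConfig (which has more fields), and the "default" value is taken from the cfg.get call in train_ft.py.

```python
from dataclasses import dataclass


@dataclass
class CheckpointingConfigBefore:
    # Hypothetical stand-in for the current dataclass: no best_metric_key field.
    enabled: bool = True


@dataclass
class CheckpointingConfigAfter:
    # Hypothetical stand-in with the quick fix applied.
    enabled: bool = True
    best_metric_key: str = "default"  # same default the recipe passes to cfg.get


# The checkpoint section as it might look after the YAML is loaded.
checkpoint_section = {"enabled": True, "best_metric_key": "my_best_metric"}

# Without the field, the generated __init__ rejects the extra key.
try:
    CheckpointingConfigBefore(**checkpoint_section)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With the field, the same section is accepted.
fixed = CheckpointingConfigAfter(**checkpoint_section)

print(error_message)
print(fixed.best_metric_key)
```

An alternative would be to remove best_metric_key from the section before constructing the dataclass, since the recipe reads the key from the config separately. Which of the two is preferable depends on whether the checkpointing component itself needs the value.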
Steps/Code to reproduce bug
Try using best_metric_key in config like this:
checkpoint:
  enabled: true
  ...
  best_metric_key: my_best_metric