Issue
I would like to use fms-hf-tuning to collect system-level metrics while finetuning models; these include model load time as well as device-related metrics (like those that AIM collects).
One way would be to rely on just the measurements that AIM collects, by pointing fms-hf-tuning to an AIM server and then querying AIM to retrieve the data. However, this is restrictive: we cannot collect custom data, cannot collect system metrics at a period other than the one AIM uses (30 seconds), and cannot collect any data at all if we don't spin up an AIM server.
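For illustration, a custom transformers TrainerCallback could sample device metrics at whatever interval the caller wants, with no AIM server involved. This is only a sketch; the SystemMetricsCallback name, the 10-second interval, and the chosen metric are hypothetical, not part of fms-hf-tuning:

```python
# Hypothetical sketch: a callback that samples GPU memory at a custom interval.
import time

import torch
from transformers import TrainerCallback


class SystemMetricsCallback(TrainerCallback):
    def __init__(self, interval_seconds: float = 10.0):
        self.interval_seconds = interval_seconds
        self._last_sample = 0.0
        self.samples = []

    def on_step_end(self, args, state, control, **kwargs):
        now = time.time()
        if now - self._last_sample >= self.interval_seconds:
            self._last_sample = now
            if torch.cuda.is_available():
                self.samples.append(
                    {
                        "step": state.global_step,
                        "gpu_mem_allocated_bytes": torch.cuda.memory_allocated(),
                    }
                )
        return control
```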
A more convenient solution would be to allow passing an optional parameter to train() that can contain a list of callbacks:
fms-hf-tuning/tuning/sft_trainer.py, lines 29 to 34 at fc07060:

```python
def train(
    model_args: configs.ModelArguments,
    data_args: configs.DataArguments,
    train_args: configs.TrainingArguments,
    peft_config: Optional[Union[peft_config.LoraConfig, peft_config.PromptTuningConfig]] = None,
):
```
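A minimal sketch of what this could look like, assuming the new argument is simply forwarded to the underlying trainer; the parameter name additional_callbacks is hypothetical, and the existing annotations (configs, peft_config) come from the imports already in sft_trainer.py:

```python
# Hypothetical sketch of the proposed signature; the new parameter name is illustrative.
from typing import List, Optional, Union

from transformers import TrainerCallback


def train(
    model_args: configs.ModelArguments,
    data_args: configs.DataArguments,
    train_args: configs.TrainingArguments,
    peft_config: Optional[Union[peft_config.LoraConfig, peft_config.PromptTuningConfig]] = None,
    additional_callbacks: Optional[List[TrainerCallback]] = None,
):
    ...
    # The list would be forwarded to the trainer constructed inside train(),
    # e.g. trainer = SFTTrainer(..., callbacks=additional_callbacks)
```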
In the same spirit, I'd like access to the TrainOutput object that trainer.train() returns inside sft_trainer.train() (input_tokens_per_second, train_runtime, etc.):
fms-hf-tuning/tuning/sft_trainer.py, line 168 at fc07060:

```python
trainer.train()
```
(just by returning the output of trainer.train() as the output of train()).
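Concretely, this could be as small as propagating the return value. The sketch below assumes the current structure of sft_trainer.train(); the caller-side variable names are illustrative, and the exact metric keys depend on the transformers version:

```python
# Inside tuning/sft_trainer.py's train() (sketch): propagate the result.
train_output = trainer.train()
...
return train_output
```

A caller could then read timing figures from the returned object; transformers' Trainer.train() returns a TrainOutput named tuple whose metrics dict carries them:

```python
# Illustrative caller-side usage.
output = train(model_args, data_args, train_args, peft_config=None)
print(output.metrics.get("train_runtime"))
print(output.metrics)  # remaining keys (samples/tokens per second, loss, ...) vary by version
```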
Done when
- support collecting custom metrics via custom callbacks
- return the output of trainer.train() to the caller of train()