Add old stack logging support to new stack #889
quic-abhamidi wants to merge 1 commit into quic:ft_experimental from
Conversation
    # Compute perplexity safely
    train_metric = None
    if train_loss is not None:
        train_metric = math.exp(train_loss)
Verify the train_metric values and check whether they match step-wise with the old FT stack. Use the same SDK, seed, and data_seed on both stacks for reproducibility.
Also wrap this in a try block to handle the case where the metric value overflows.
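A minimal sketch of the overflow guard the reviewer suggests (the helper name `safe_perplexity` is illustrative, not part of the PR):

```python
import math


def safe_perplexity(loss):
    """Return exp(loss), guarding against overflow for very large loss values."""
    if loss is None:
        return None
    try:
        return math.exp(loss)
    except OverflowError:
        # math.exp overflows for loss values above ~709; report infinity
        # rather than crashing the training loop
        return float("inf")
```

Returning `float("inf")` keeps the logged metric well-defined even when the loss is too large for `exp`.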
    if self.rank != 0:
        return
    logger.log_rank_zero(text)
    with open(self.log_file, "a") as f:
It would be better to put this inside a try block to catch any write errors.
    from QEfficient.finetune.experimental.core.utils.training_config_utils import prepare_training_config

    logger = Logger(__name__)
    train_logger = TrainingLogger(rank=0)
In the DDP case this will fail, I think. Please check; I believe we can't hardcode 0 here.
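One way to avoid hardcoding the rank, as a sketch: query `torch.distributed` when a process group is initialized, and fall back to the `RANK` environment variable that `torchrun` sets (the helper name `get_rank` is illustrative):

```python
import os


def get_rank():
    """Determine this process's rank without hardcoding 0.

    Prefers torch.distributed when a process group is initialized,
    then the RANK environment variable set by torchrun, then 0 for
    plain single-process runs.
    """
    try:
        import torch.distributed as dist
        if dist.is_available() and dist.is_initialized():
            return dist.get_rank()
    except ImportError:
        # torch not installed; treat as single-process
        pass
    return int(os.environ.get("RANK", 0))
```

The call site would then become `train_logger = TrainingLogger(rank=get_rank())`, which stays correct both in DDP and single-process runs.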
    # ----------------------------------------------------
    # Safe write to log (only rank 0)
    # ----------------------------------------------------
    def _write(self, text):
Usually a single leading underscore marks a private method, but _write is called from outside the class in finetune_experimental. Please check.
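A minimal sketch of the convention the reviewer is pointing at: keep `_write` as the internal implementation and expose a public `write` for external callers such as finetune_experimental (method names here are illustrative):

```python
class TrainingLogger:
    """Sketch: a public `write` entry point delegating to the private _write."""

    def __init__(self):
        self.lines = []

    def write(self, text):
        """Public API for callers outside the class."""
        self._write(text)

    def _write(self, text):
        # Private implementation detail; only invoked from within the class
        self.lines.append(text)
```

This keeps the underscore convention honest: external code depends only on `write`, so `_write` can change freely.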
Added the following support for easy visualization of training and validation statistics:
1. A train_logger callback that captures the per-epoch time, per-epoch loss metric, and per-epoch perplexity.
2. It also captures the number of trainable parameters and the number of samples in the training and eval datasets.
3. All of these are logged to a log file whose path the user can provide via the --log_file_path flag in the input .yaml config file.
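For illustration, the config entry might look like this (a hypothetical snippet; the exact key name and schema of the input .yaml are assumed from the flag described above, and the path is a placeholder):

```yaml
# Hypothetical config fragment: where to write the training/validation statistics
log_file_path: /path/to/train_stats.log
```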
Signed-off-by: Anusha Bhamidipati <abhamidi@qti.qualcomm.com>