
Add old stack logging support to new stack#889

Open
quic-abhamidi wants to merge 1 commit into quic:ft_experimental from quic-abhamidi:loggerv2

Conversation

@quic-abhamidi

Added the following support for easy visualization of training and validation statistics:
1. A train_logger callback that captures the per-epoch time, per-epoch loss metric, and per-epoch perplexity.
2. This function also captures the number of trainable parameters and the number of samples in the training and eval datasets.
3. All of these are logged to a log file whose path the user can provide via the --log_file_path flag in the input config .yaml file.

# Compute perplexity safely
train_metric = None
if train_loss is not None:
    train_metric = math.exp(train_loss)
Contributor


Verify the train_metric values and check whether there is a step-wise match with the old FT stack. Use the same SDK, and the same seed and data_seed on both stacks, for reproducibility.

Contributor


Also use a try block to handle the case where the metric value overflows.
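A minimal sketch of the suggested guard (the helper name `compute_perplexity` is hypothetical; the PR computes `train_metric` inline): `math.exp` raises `OverflowError` once the loss exceeds roughly 709, so the fallback returns infinity instead of crashing the epoch.

```python
import math

def compute_perplexity(loss):
    """Return exp(loss), guarded against overflow (illustrative helper)."""
    if loss is None:
        return None
    try:
        return math.exp(loss)
    except OverflowError:
        # math.exp overflows for loss values above ~709.78
        return float("inf")
```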

Author


Yes, will add this check.

Contributor

@quic-akuruvil left a comment


Also check that the PP, DDP, and multi-node DDP APIs are logging properly with the new change.

if self.rank != 0:
    return
logger.log_rank_zero(text)
with open(self.log_file, "a") as f:
Contributor


It would be better to put this inside a try block, to catch any write errors.

from QEfficient.finetune.experimental.core.utils.training_config_utils import prepare_training_config

logger = Logger(__name__)
train_logger = TrainingLogger(rank=0)
Contributor


In the DDP case, this will fail, I think. Please check; I believe we can't hardcode 0 here.
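One way to avoid the hardcoded rank is to read it from the launcher environment: torchrun (and torch.distributed.launch) export `RANK` for every process, and defaulting to 0 keeps plain single-process runs working. A sketch (the `get_rank` helper is hypothetical; `TrainingLogger` is the class added in this PR):

```python
import os

def get_rank():
    """Process rank from the launcher environment; torchrun sets RANK,
    so fall back to 0 for single-process runs."""
    return int(os.environ.get("RANK", 0))

# Hypothetical usage in place of the hardcoded value:
# train_logger = TrainingLogger(rank=get_rank())
```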

Author


Will change this.

# ----------------------------------------------------
# Safe write to log (only rank 0)
# ----------------------------------------------------
def _write(self, text):
Contributor


Usually a single leading underscore marks a private method, but _write is called from outside the class in finetune_experimental. Please check.
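One way to resolve the naming concern is to expose a public method and keep `_write` internal; outside callers such as finetune_experimental would then use the public name. A sketch (the class body here is illustrative, not the PR's actual implementation):

```python
class TrainingLogger:
    """Illustrative shape only, to show the public/private split."""

    def __init__(self, rank=0):
        self.rank = rank
        self.lines = []  # stand-in sink for the real log file

    def write(self, text):
        """Public entry point for callers outside the class."""
        self._write(text)

    def _write(self, text):
        # Internal detail: rank filtering and the actual sink live here.
        if self.rank != 0:
            return
        self.lines.append(text)
```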

 Added the following support for easy visualization of training and validation statistics:
    1. A train_logger callback that captures the per-epoch time, per-epoch loss metric, and per-epoch perplexity.
    2. This function also captures the number of trainable parameters and the number of samples in the training and eval datasets.
    3. All of these are logged to a log file whose path the user can provide via the --log_file_path flag in the input config .yaml file.

Signed-off-by: Anusha Bhamidipati <abhamidi@qti.qualcomm.com>