One condition under which this timeout occurs is monitoring model weights and biases with:

import smdebug.pytorch as smd
hook = smd.get_hook(create_if_not_exists=True)
hook.register_module(model)

These are getting saved to S3 as TensorBoard metrics. I wonder if the network activity is somehow blocking the retrieval of the MONAI metric.

Update: reducing the rate at which I'm logging data during the validation step appears to resolve this.
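
For anyone hitting the same thing, here is a rough sketch of what "reducing the logging rate" could look like. It is not my actual training code and it does not use the smdebug or MONAI APIs. `EveryN` and `run_validation` are hypothetical names made up for illustration, and the `logged.append(...)` line stands in for whatever logging call you really make.

```python
class EveryN:
    """Hypothetical helper: allow one log call every `interval` steps."""

    def __init__(self, interval):
        if interval < 1:
            raise ValueError("interval must be >= 1")
        self.interval = interval
        self.calls = 0

    def ready(self):
        # True on call 0, interval, 2 * interval, ...; False otherwise.
        fire = self.calls % self.interval == 0
        self.calls += 1
        return fire


def run_validation(num_batches, log_every):
    """Stand-in validation loop; returns the batch indices that were logged."""
    throttle = EveryN(log_every)
    logged = []
    for batch_idx in range(num_batches):
        # ... forward pass and metric update would go here ...
        if throttle.ready():
            logged.append(batch_idx)  # placeholder for the real logging call
    return logged
```

With `log_every=1` every batch is logged, as before. Raising it cuts the number of writes during validation proportionally, which is the change that seemed to make the timeout go away for me.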

@csheaff
Answer selected by wyli