Saving hyperparameters vs checkpointing #15864
Unanswered
mfoglio
asked this question in
Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 0 comments
I am building an image classifier with multiple heads. Each head predicts a categorical value.

Let's assume that we want to classify fruits and their color. We will have one head that predicts the name of the fruit (`apple`, `pear`, `grapes`) and another head that predicts the color (`red`, `green`, `yellow`).

The labels of both attributes (fruit name and fruit color) are stored in a configuration file. At runtime, a `MultiOutputClassifier` (see below) is initialized within `MultiOutputClassifierModule`, reflecting the number of attributes and their labels. In other words, the output of the network is not hardcoded, but defined according to a configuration file. The class `MultiOutputClassifier` (not an instance) is passed to the `LightningModule` during its initialization:
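(Simplified sketch of the two classes; the placeholder backbone, the `labels` argument, and the loss below are just for illustration, not my real code.)

```python
from typing import Dict, List, Type

import torch
from torch import nn
import pytorch_lightning as pl


class MultiOutputClassifier(nn.Module):
    """One backbone plus one classification head per attribute.

    `labels` comes from the config file, e.g.
    {"fruit": ["apple", "pear", "grapes"], "color": ["red", "green", "yellow"]}.
    """

    def __init__(self, labels: Dict[str, List[str]]):
        super().__init__()
        # Placeholder backbone; the real architecture is irrelevant to the question.
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(128), nn.ReLU())
        # One linear head per attribute, sized by the number of labels for that attribute.
        self.heads = nn.ModuleDict(
            {attr: nn.Linear(128, len(classes)) for attr, classes in labels.items()}
        )

    def forward(self, x):
        features = self.backbone(x)
        return {attr: head(features) for attr, head in self.heads.items()}


class MultiOutputClassifierModule(pl.LightningModule):
    def __init__(self, net: Type[nn.Module], labels: Dict[str, List[str]], lr: float = 1e-3):
        super().__init__()
        # `net` is the *class*; the instance is built here from the config-driven labels.
        self.save_hyperparameters(logger=False)
        self.net = net(labels)
        self.criterion = nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx):
        x, targets = batch  # targets is a dict, e.g. {"fruit": ..., "color": ...}
        outputs = self.net(x)
        loss = sum(self.criterion(outputs[attr], targets[attr]) for attr in outputs)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)
```

The important part is that `MultiOutputClassifierModule` receives the class as the `net` argument and builds the instance `self.net` itself from the configuration.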
I am doing this to speed up experimentation: I can define a different model architecture `MultiOutputClassifierV2` and pass it to the same `MultiOutputClassifierModule`. In brief, this structure allows me to try multiple models by simply passing them to the Lightning module.

Everything works great, but I have trouble understanding which hyperparameters I should save.
If I simply use `self.save_hyperparameters(logger=False)`, everything seems to work great: the checkpoint is saved, and I can load it with `MultiOutputClassifierModule.load_from_checkpoint(checkpoint_file_path)`; however, I get a warning (see question 3 below).
However, if I use `self.save_hyperparameters(ignore=['net'])`, then I cannot call `MultiOutputClassifierModule.load_from_checkpoint(checkpoint_file_path)` as-is, because it will expect the parameter `net` as well. It doesn't make sense to me to pass `net` separately, since a specific network was already initialized during the previous training based on a specific configuration file.
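To make the two situations concrete, this is roughly what loading looks like in each case (the checkpoint path is a placeholder):

```python
# Case 1: training used self.save_hyperparameters(logger=False).
# The checkpoint stores every __init__ argument (including the class `net`),
# so loading needs nothing else:
model = MultiOutputClassifierModule.load_from_checkpoint("last.ckpt")

# Case 2: training used self.save_hyperparameters(ignore=['net']).
# `net` is not among the saved hyperparameters, so it must be passed again:
model = MultiOutputClassifierModule.load_from_checkpoint(
    "last.ckpt",
    net=MultiOutputClassifier,  # has to be the same class used during training
)
```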
My questions are:

1. What does `self.save_hyperparameters()` do exactly? Which Python attributes does it save, and are they saved just at initialization or every time a checkpoint is written? I am assuming it saves the kwargs received by the Lightning module; let me know if this is correct.
2. `self.save_hyperparameters()` seems to work better for me, but I don't understand why Lightning suggests the other option (i.e. `self.save_hyperparameters(ignore=['net'])`).
3. The warning says that an instance of `nn.Module` "is already saved during checkpointing". If I understand correctly, when doing `self.save_hyperparameters()` I am saving the initial (untrained) `net` as a hyperparameter, and then saving its trained weights again inside the `state_dict`. Is this correct?

Thank you
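For reference, this is how I have been inspecting what actually ends up in the checkpoint (standard Lightning checkpoint keys; the path is a placeholder):

```python
import torch

# On recent PyTorch you may need weights_only=False here, because the saved
# hyperparameters contain a pickled class (`net`), not just tensors.
ckpt = torch.load("last.ckpt", map_location="cpu")

# Everything captured by self.save_hyperparameters() ends up under this key.
print(ckpt["hyper_parameters"])

# The trained weights (including self.net's parameters) live in the state_dict,
# regardless of what save_hyperparameters() recorded.
print(list(ckpt["state_dict"])[:5])
```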
EDIT: is it possible that my problem arises from the fact that I have a kwarg `net` which should be a class, and then an attribute called `net` (same name) which is instead an instance of that class? I am wondering if this could be the reason Lightning raises the warning.