The problem seems to be related to "parameters" getting added to the model after the first iteration, e.g. qt_vals in ConvVQ.

In general, it looks like you keep a lot of state inside the modules that should not be treated as parameters. You can prefix those attribute names with _ so that they are not picked up in the Module's parameters: for example, use _qt_vals instead of qt_vals. Likewise, when you keep track of the loss values in the modules, prefix those names with _ so they are not treated as parameters either.
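A minimal sketch of the naming convention, in plain Python. The Module base class here is a hypothetical stand-in that only mimics the "skip names starting with _" rule described above; it is not the library's actual implementation, and the body of ConvVQ is invented for illustration.

```python
class Module:
    """Hypothetical stand-in for a framework Module (assumption, not a real API)."""

    def parameters(self):
        # Collect every public attribute; names starting with "_" are skipped.
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}


class ConvVQ(Module):
    def __init__(self):
        self.weight = [0.1, 0.2]  # a genuine trainable parameter
        self._qt_vals = None      # bookkeeping state, hidden from parameters()

    def __call__(self, x):
        # State assigned during the forward pass stays out of parameters()
        # because of the leading underscore.
        self._qt_vals = [round(v) for v in x]
        return self._qt_vals


model = ConvVQ()
before = set(model.parameters())
model([0.4, 1.6])
after = set(model.parameters())
```

With the underscore prefix, the set of parameter names is the same before and after the forward pass, so nothing new shows up once training has started.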

Slightly more detailed explanation:

  • When you take the first optimizer update, gradient state is initialized for all of the model's parameters.
  • After this, the optimizer is considered initialized.
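To illustrate the two points above, here is a toy optimizer that creates its state lazily on the first update. Everything in it (the class name LazySGDMomentum, the dict-based parameters, the KeyError) is an assumption made for the sketch; the real optimizer may fail differently, but the mismatch between initialized state and later-added parameters is the same idea.

```python
class LazySGDMomentum:
    """Hypothetical optimizer, for illustration only."""

    def __init__(self, lr=0.1, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.state = {}
        self.initialized = False

    def update(self, params, grads):
        if not self.initialized:
            # First update: create one state slot per parameter seen now.
            self.state = {name: 0.0 for name in params}
            self.initialized = True
        for name, g in grads.items():
            # A parameter that appeared after initialization has no slot here.
            v = self.momentum * self.state[name] + g
            self.state[name] = v
            params[name] -= self.lr * v


opt = LazySGDMomentum()
params = {"weight": 1.0}
opt.update(params, {"weight": 0.5})  # initializes state for "weight" only

params["qt_vals"] = 0.0              # "parameter" added after the first step
try:
    opt.update(params, {"weight": 0.5, "qt_vals": 0.1})
    failed = False
except KeyError:
    failed = True
```

In this sketch the second update fails because qt_vals was not present when the state was created, which is why keeping such values out of the parameters avoids the problem.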

Answer selected by credwood