Skip to content

[WIP] Managing initilizations via a role scheme dictionary#1030

Open
memimo wants to merge 7 commits intomila-iqia:masterfrom
memimo:init
Open

[WIP] Managing initilizations via a role scheme dictionary#1030
memimo wants to merge 7 commits intomila-iqia:masterfrom
memimo:init

Conversation

@memimo
Copy link
Contributor

@memimo memimo commented Mar 15, 2016

fixes #970
Re-attempt of #1021
So far it only changes simple.py will propagate changes to other bricks when the format is approved.

if key in self.initialization_schemes:
raise ValueError("All initializations are accepted either"
"through initialization_schemes or "
"correspodong attribute but not both")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also s/attribute/keyword argument/

@dwf
Copy link
Contributor

dwf commented Mar 15, 2016

Technically this is @rizar's review but hopefully my comments from the peanut gallery are a little helpful to at least one of you 😛

@memimo
Copy link
Contributor Author

memimo commented Apr 5, 2016

@rizar could you please take a look at the high level structure of the PR. I'll take care of the comments of dwf when the high-level is approved.

@rizar
Copy link
Contributor

rizar commented Apr 5, 2016

Sure! Will work on it today.

@rizar
Copy link
Contributor

rizar commented Apr 6, 2016

Sorry for the delay, I am struggling with the cluster and my course project. I will definitely review it tomorrow.

@rizar
Copy link
Contributor

rizar commented Apr 7, 2016

I have three suggestions:

  • use the roles as the keys in initialization_schemes
  • initialize parameters using the scheme given for their most specific role
  • check that the given initialization schemes match the parameter roles of the brick in two places: in the end of constructor and in _push_initialization_config. This will allow to detect errors as early as possible.

@memimo
Copy link
Contributor Author

memimo commented Apr 12, 2016

I reorganized the PR a little bit. Again et's discuss the generality of the PR first, I will fix formatting stuff later.

Some notes:

1-The problem with having the roles as keys is that, roles are objects, and different objects of same class will be hashed to different values. So instead I use the name of the roles as the keys.
2-One major API change is that, if someone wants to manually change the schemes self.weights_init and self.biases_init is not valid anymore and one has to use self.initializations_schems
3-There is a tiny bit difference in the GRU weight initialization values now. I don't believe that will have any practical impact on performance, unless someone has an evidence against it.

@memimo memimo force-pushed the init branch 2 times, most recently from 22b5a09 to faf0d27 Compare April 12, 2016 19:27
@rizar
Copy link
Contributor

rizar commented Apr 13, 2016

  1. I think you gave up the idea of a role-indexed dictionary too early. Check https://docs.python.org/2/reference/datamodel.html#object.__hash__ , you should be able to overload this method.
  2. I thought we would support weights_init and biases_init through __getattr__, why not?

@dwf
Copy link
Contributor

dwf commented Apr 13, 2016

Indeed,

def __hash__(self):
    return hash(str(self))

should work in the case of roles.

if self.initialization_schemes is None:
self.initialization_schemes = {}

initialization_to_role = {"weights_init": WEIGHT, 'biases_init': BIAS,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, it looks like you have chosen to only handle a handful of <role>_init keyword arguments, not all of them. This is probably good, because this clearly makes initialization_schemes the primary way of defining the initialization schemes.

But the code clearly lacks a check if key not in initialization_to_role. Since now we have a clearly defined white list, I would probably raise a warning, that calling the arguments smth_init is recommended, but still pass this argument to the super().__init__ call.

@rizar
Copy link
Contributor

rizar commented Apr 25, 2016

I think the major design decision that I disagree with is having both initializable_roles and parameter_roles. Neither of the two is fully suitable for controlling how the initialization schemes are spread down the hierarchy. The former is hardcoded, and the latter is known too late, after the parameters are created.

Here is a different design which seems better to me

  • each Initilizable brick has an attribute parameter_roles that contains all the roles that the brick's parameters will have when constructed
  • the value of this attribute is defined as follows:
    • first, Initializable sets it to [WEIGHT, BIAS] (only [WEIGHT] is not use_bias)
    • ... unless the user provides parameter_roles as a constructor argument
    • then, after super(Initializable, self).__init__, Initializable._collect_roles is called to add
      the roles of the children to parameter_roles
  • we are still in Initializable.__init__, but we can already check that initialization_schemes does not contain schemes for the roles that are not in parameter_roles
  • furthermore, when the parameters are created (that is in allocate), we can check that all the parameters have roles that are present in parameter_roles
  • when initialization configuration is pushed (Initializable._push_initialization_config), only the schemes for the relevant roles should be pushed to the children. One should be careful here to check that sometimes initialization schemes for roles that are parents of the precise parameter roles should also be pushed.
  • in initialize, we can repeat the sanity check that we did in constructor, to make sure that user did not provide useless initialization schemes

What do you think?

self.parameter_roles.update(set([role]))

def _initialize(self):
for param in self.parameters:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a dangerous code. First, if a parameter has more than one role, it is not clear at all how it should be initialized. I would rather raise an exception upon encountering such a situation. Also, we should choose the most specific initialization scheme (e.g. the one for RECURRENT_WEIGHT and not for just WEIGHT if the parameter is RECURRENT_WEIGHT).

@rizar
Copy link
Contributor

rizar commented May 23, 2016

@memimo , we should resume working on this, and I really hope we can finish it until I leave Montreal on June 8.

@memimo memimo force-pushed the init branch 3 times, most recently from 893a301 to 45d4456 Compare June 1, 2016 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Metaissue: Improve initialization

3 participants