
Default learning rate and other hyperparameters #6


Description

@msultan

Based on some testing, I am starting to think that the default learning rate of 1e-4 is probably too low for our applications, and that it might be better to bump it up to 5e-3 or even 1e-2. This is mostly based on empirical observations that the higher learning rates tend to yield "similar"-looking models even across differing architectures, batch sizes, and numbers of epochs. It also helps that we use the Adam optimizer, which adapts the effective per-parameter step sizes as training progresses, softening the risk of a higher base rate.
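For context, a minimal sketch of what the proposed default would look like in a PyTorch-style setup (the model here is a hypothetical stand-in, not this repo's actual architecture or API):

```python
import torch

# Hypothetical placeholder model; stands in for whatever
# architecture is actually being trained.
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
)

# Proposed default: 5e-3 (or 1e-2) instead of the current 1e-4.
# Adam's adaptive per-parameter step sizes temper the effect of
# the higher base rate as training progresses.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-3)
```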
