
Default learning rate and other hyperparameters #6


Description

@msultan

Based on some testing, I am starting to think that the default learning rate of 1e-4 is probably too low for our applications, and that it might be better to bump it up to 5e-3 or even 1e-2. This is mostly based on empirical observations that the higher learning rates tend to yield "similar"-looking models even across differing architectures, batch sizes, and numbers of epochs. It also helps that we use the Adam optimizer, which adapts the effective per-parameter step sizes as training progresses, softening the risk of a higher base rate.
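For context, a minimal sketch of what the proposed default would look like in a PyTorch-style setup (the model here is a hypothetical stand-in, not this repo's actual architecture or API):

```python
import torch

# Hypothetical placeholder model; stands in for whatever
# architecture is actually being trained.
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
)

# Proposed default: 5e-3 (or 1e-2) instead of the current 1e-4.
# Adam's adaptive per-parameter step sizes temper the effect of
# the higher base rate as training progresses.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-3)
```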
