PSGD-Kron-Pro(crustes) optimizer implementation #60
…contraction Signed-off-by: mikail <[email protected]>
Signed-off-by: mikail <[email protected]>
Signed-off-by: mikail <[email protected]>
Signed-off-by: mikail <[email protected]>
/ok to test 780e3d7
skyw left a comment
Looks good overall. The comments are mostly about style, but they must be fixed.
Signed-off-by: mikail <[email protected]>
Signed-off-by: mikail <[email protected]>
Signed-off-by: mikail <[email protected]>
Signed-off-by: mikail <[email protected]>
@lixilinx does this look good to you?
Thanks, @mkhona-nvidia, for the PSGD code. It looks good and well organized to me! I previously verified the correctness of psgd_kron_contractions by comparing it with einsum. In norm_lower_bound_spd, we will set the default subspace dim to 32 for float32 (based on my test).
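As a rough illustration of the kind of einsum comparison mentioned above, here is a minimal sketch. It does not use the PR's psgd_kron_contractions code: `apply_kron_factors` is a hypothetical stand-in that applies two Kronecker factors to a matrix gradient, and the shapes and dtype are assumptions chosen for the example.

```python
import torch


def apply_kron_factors(q_left: torch.Tensor, q_right: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper (not from the PR): compute q_left @ grad @ q_right^T."""
    return q_left @ grad @ q_right.T


def matches_einsum(rows: int = 5, cols: int = 7, seed: int = 0) -> bool:
    """Check the matmul-based helper against an explicit einsum reference."""
    gen = torch.Generator().manual_seed(seed)
    # float64 keeps rounding differences between the two paths negligible
    q_left = torch.randn(rows, rows, generator=gen, dtype=torch.float64)
    q_right = torch.randn(cols, cols, generator=gen, dtype=torch.float64)
    grad = torch.randn(rows, cols, generator=gen, dtype=torch.float64)

    fast = apply_kron_factors(q_left, q_right, grad)
    # Reference: sum over j, k of q_left[i, j] * grad[j, k] * q_right[l, k]
    reference = torch.einsum("ij,jk,lk->il", q_left, grad, q_right)
    return torch.allclose(fast, reference)
```

Writing the reference as an explicit index expression makes it easy to audit by eye, which is why einsum is a convenient baseline for a hand-optimized contraction.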
Signed-off-by: mikail <[email protected]>
… fp32 Signed-off-by: mikail <[email protected]>
Signed-off-by: mikail <[email protected]>
The momentum dampening has also been changed. Dampened momentum calculation:
dampened_momentum = exp_avg + (
    damping_noise_scale + torch.finfo(exp_avg.dtype).eps * exp_avg.abs()
) * torch.randn_like(exp_avg)
with a
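The dampened momentum expression above can be wrapped in a small function to see how it behaves. This is only a sketch: the function name and the default value of `damping_noise_scale` are assumptions for illustration, not values taken from the PR.

```python
import torch


def dampened_momentum(exp_avg: torch.Tensor, damping_noise_scale: float = 1e-9) -> torch.Tensor:
    """Add small Gaussian noise to the momentum buffer, following the formula above.

    The default damping_noise_scale is an assumed placeholder, not the PR's value.
    """
    # Per-element noise standard deviation: an absolute floor plus a relative
    # term at the machine-epsilon scale of the buffer's dtype.
    noise_std = damping_noise_scale + torch.finfo(exp_avg.dtype).eps * exp_avg.abs()
    return exp_avg + noise_std * torch.randn_like(exp_avg)
```

With an all-zero buffer the relative term vanishes, so the output is pure noise at the scale of `damping_noise_scale`. For large entries, the relative term keeps the perturbation proportional to the dtype's precision.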
/ok to test f9f12bd
Signed-off-by: mikail <[email protected]>
/ok to test 54220e2
This builds on the previous PRs for PSGD's helper functions to implement the PSGD-Kron-Pro optimizer.