1 parent e3398c7 commit 3b56b70
docs/changelogs/v3.0.1.md
@@ -8,6 +8,8 @@
* support not-using-first-momentum when beta1 is not given
* default dtype for first momentum to `bfloat16`
* clip second momentum to 0.999
+* Implement `GrokFast` optimizer. (#244, #245)
+ * [Accelerated Grokking by Amplifying Slow Gradients](https://arxiv.org/abs/2405.20233)

### Bug