1 parent d9212c2 commit 7f7cbe2
README.md
@@ -26,6 +26,7 @@ This improvement in training speed has been brought about by the following techn
* Cautious Weight Decay w/ schedule tied to LR
* Exponential decay of residual stream
* Batch size schedule
+* Max seq length schedule
* Partial Key Offset
* Multi token prediction
* Untie embed and lm_head at 2/3 of training