1 parent 05e54a6 commit 4104842
emerging_optimizers/utils/modules.py
@@ -35,6 +35,11 @@ class Conv1dFlatWeights(nn.Conv1d):
 
     Arguments are the same as :class:`torch.nn.Conv1d`.
 
+    Note:
+        This implementation potentially introduces a small overhead from splitting the weights and combining
+        their gradients. This should be trivial compared to the computational cost of LLM training. If it
+        becomes a concern, a kernel can be developed to eliminate the overhead.
+
     Note:
         Similar flattening logic can be applied to N-D convolution, but since we don't yet have use cases for it in
         LLMs, it is not supported, even though the __init__() function is generalized enough to support N-D convolution.
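
For illustration, here is a minimal sketch of how such weight flattening could look: the 3-D Conv1d weight is stored as a single 2-D parameter and reshaped back inside forward(), which is where the small reshape/gradient-combining overhead mentioned in the note would come from. The class below (FlatWeightConv1dSketch) is hypothetical and is not the actual implementation in emerging_optimizers/utils/modules.py.

```python
# Hypothetical sketch only; not the implementation in
# emerging_optimizers/utils/modules.py.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FlatWeightConv1dSketch(nn.Conv1d):
    """Conv1d whose weight is stored as a flattened 2-D parameter.

    Storing the weight as (out_channels, in_channels // groups * kernel_size)
    lets matrix-shaped optimizers treat the convolution like a linear layer.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Replace the 3-D conv weight with a flattened 2-D copy of the same data.
        flat = self.weight.detach().reshape(self.out_channels, -1).clone()
        del self.weight  # unregister the original nn.Conv1d parameter
        self.weight = nn.Parameter(flat)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # View the flat parameter back as (out_channels, in_channels // groups,
        # kernel_size) for the convolution; this reshape (and the corresponding
        # gradient flow back to the flat parameter) is the small overhead.
        w = self.weight.view(
            self.out_channels, self.in_channels // self.groups, *self.kernel_size
        )
        return F.conv1d(
            x, w, self.bias, self.stride, self.padding, self.dilation, self.groups
        )


# Usage: gradients accumulate on the flat 2-D parameter.
conv = FlatWeightConv1dSketch(4, 8, kernel_size=3, padding=1)
y = conv(torch.randn(2, 4, 16))
y.sum().backward()
print(conv.weight.shape, conv.weight.grad.shape)  # torch.Size([8, 12]) for both
```

The design choice in this sketch is to keep the 2-D tensor as the registered parameter so that the optimizer only ever sees a matrix, while forward() pays a cheap view() to recover the convolution-shaped weight.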