1 parent 7f5c0b1 commit dc90ba8
_posts/2025-11-25-d2l_optimization.md
@@ -408,7 +408,7 @@ Adadelta is a variant of AdaGrad that reduces how aggressively the learning rate adapts per coordinate
 - Adjust the gradient:
 
 $$
-\mathbf{g}_t' = \frac{\sqrt{\Delta\mathbf{x}_{t-1} + \epsilon}}{\sqrt{{\mathbf{s}_t + \epsilon}}} \odot \mathbf{g}_t
+\mathbf{g}_t' = \frac{\sqrt{\Delta\mathbf{x}_{t-1} + \epsilon}}{\sqrt{\mathbf{s}_t + \epsilon}} \odot \mathbf{g}_t
 $$
 
 ($\epsilon$ is a small constant, e.g. 1e-5, for numerical stability)
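The rescaled gradient in the diff is one piece of the full Adadelta step (accumulate squared gradients into $\mathbf{s}_t$, rescale, then accumulate squared updates into $\Delta\mathbf{x}_t$). A minimal NumPy sketch of that step; the function name, `rho`, and the toy objective are illustrative, not from the post:

```python
import numpy as np

def adadelta_step(x, g, s, delta, rho=0.9, eps=1e-5):
    """One Adadelta update (illustrative sketch).

    s     -- running average of squared gradients (s_t)
    delta -- running average of squared rescaled updates (Delta x_t)
    """
    # Accumulate squared gradients: s_t = rho * s_{t-1} + (1 - rho) * g_t^2
    s = rho * s + (1 - rho) * g ** 2
    # Rescale: g'_t = sqrt(Delta x_{t-1} + eps) / sqrt(s_t + eps) * g_t
    g_prime = np.sqrt(delta + eps) / np.sqrt(s + eps) * g
    # Apply the update, then accumulate squared updates
    x = x - g_prime
    delta = rho * delta + (1 - rho) * g_prime ** 2
    return x, s, delta

# Toy usage on f(x) = x^2, whose gradient is 2x.
x = np.array([1.0])
s = np.zeros_like(x)
delta = np.zeros_like(x)
for _ in range(100):
    g = 2 * x
    x, s, delta = adadelta_step(x, g, s, delta)
```

Note that no learning rate appears: the `sqrt(delta + eps)` numerator plays that role, which is the point of Adadelta's departure from AdaGrad.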