1 parent 2efb2cc commit 7f5c0b1
_posts/2025-11-25-d2l_optimization.md
@@ -405,7 +405,13 @@ Adadelta is a variant of AdaGrad that reduces the per-coordinate adaptivity of the learning rate
 - $\mathbf{s}_t = \rho \mathbf{s}_{t-1} + (1 - \rho) \mathbf{g}_t^2$ ($\rho$ is a hyperparameter)
 
-- Rescaled gradient: $\mathbf{g}_t' = \frac{\sqrt{\Delta\mathbf{x}_{t-1} + \epsilon}}{\sqrt{\mathbf{s}_t + \epsilon}} \odot \mathbf{g}_t$ ($\epsilon$ is a small value, e.g. 1e-5, for numerical stability)
+- Rescaled gradient:
+
+  $$
+  \mathbf{g}_t' = \frac{\sqrt{\Delta\mathbf{x}_{t-1} + \epsilon}}{\sqrt{\mathbf{s}_t + \epsilon}} \odot \mathbf{g}_t
+  $$
+
+  ($\epsilon$ is a small value, e.g. 1e-5, for numerical stability)
 
 - Parameter update: $\mathbf{x}_t = \mathbf{x}_{t-1} - \mathbf{g}_t'$
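The update rule in the diff above can be sketched as a small NumPy function. This is a minimal illustration, not code from the post: the function name `adadelta_step` and the state names `s` and `delta` are made up here, and the final `delta` accumulator update follows the standard Adadelta rule (a leaky average of squared rescaled updates), which the hunk does not show.

```python
import numpy as np

def adadelta_step(x, g, s, delta, rho=0.9, eps=1e-5):
    """One Adadelta step; returns updated parameters and both state vectors."""
    # s_t = rho * s_{t-1} + (1 - rho) * g_t^2  (leaky average of squared gradients)
    s = rho * s + (1 - rho) * g**2
    # g'_t = sqrt(Delta x_{t-1} + eps) / sqrt(s_t + eps) * g_t  (rescaled gradient)
    g_prime = np.sqrt(delta + eps) / np.sqrt(s + eps) * g
    # x_t = x_{t-1} - g'_t  (parameter update)
    x = x - g_prime
    # Standard Adadelta: leaky average of squared rescaled updates (not shown in the hunk)
    delta = rho * delta + (1 - rho) * g_prime**2
    return x, s, delta
```

Note that, unlike AdaGrad, no global learning rate appears: the ratio of the two running averages sets the effective step size per coordinate.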