@@ -151,8 +151,12 @@ class ScalableShampoo(Optimizer, BaseOptimizer):
151151 :param inverse_exponent_override: int. fixed exponent for pre-conditioner, if > 0.
152152 :param start_preconditioning_step: int.
153153 :param preconditioning_compute_steps: int. performance tuning params for controlling memory and compute
154- requirements. How often to compute pre-conditioner.
155- :param statistics_compute_steps: int. How often to compute statistics.
154+ requirements. How often to compute the pre-conditioner. Ideally, 1 is best. However, the current
155+ implementation does not work in distributed environments (statistics and pre-conditioners are not
156+ synced across replicas), computes on the GPU (not the CPU), and uses fp32 precision (not fp64).
157+ Also, according to the paper, `preconditioning_compute_steps` does not have a significant effect on
158+ performance. So, if speed is a problem, try setting this parameter larger (e.g. 1000).
159+ :param statistics_compute_steps: int. How often to compute statistics. Usually set to 1 (or 10).
156160 :param block_size: int. Block size for large layers (if > 0).
157161 Block size = 1 ==> Adagrad (Don't do this, extremely inefficient!)
158162 Block size should be as large as feasible under memory/time constraints.
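The interplay between `statistics_compute_steps`, `start_preconditioning_step`, and `preconditioning_compute_steps` described above can be sketched as follows. This is a minimal illustration of the scheduling logic only, not the library's code; the function name and the returned counts are hypothetical:

```python
def shampoo_schedule(num_steps, start_preconditioning_step=25,
                     statistics_compute_steps=1,
                     preconditioning_compute_steps=1000):
    """Count how often each phase runs over `num_steps` optimizer steps."""
    stats, precond = 0, 0
    for step in range(1, num_steps + 1):
        if step % statistics_compute_steps == 0:
            stats += 1    # accumulate statistics, e.g. L += G G^T (cheap matmuls)
        if step >= start_preconditioning_step and step % preconditioning_compute_steps == 0:
            precond += 1  # recompute pre-conditioner roots (expensive matrix root)
    return stats, precond

# With the defaults, 10k steps perform 10k cheap statistics updates
# but only 10 expensive pre-conditioner recomputations.
```

Raising `preconditioning_compute_steps` amortizes the expensive matrix-root computation over many steps, which is why it is the knob to turn when training is slow.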
@@ -166,8 +170,8 @@ class ScalableShampoo(Optimizer, BaseOptimizer):
166170 :param diagonal_eps: float. term added to the denominator to improve numerical stability.
167171 :param matrix_eps: float. term added to the denominator to improve numerical stability.
168172 :param use_svd: bool. use SVD instead of Schur-Newton method to calculate M^{-1/p}.
169- Theoretically, Schur-Newton method is faster than SVD method to calculate M^{-1/p}.
170- However, the inefficiency of the loop code , SVD is much faster than that .
173+ Theoretically, the Schur-Newton method is faster than SVD for calculating M^{-1/p}. However, due to the
174+ inefficiency of the loop code and the well-optimized SVD kernel, SVD is much faster in some cases (usually for small models).
171175 see https://github.com/kozistr/pytorch_optimizer/pull/103
172176 """
173177
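As an illustration of the `use_svd` path, M^{-1/p} for a symmetric PSD statistics matrix can be computed directly from an SVD. This is a NumPy sketch under the assumption that M is symmetric PSD (as Shampoo statistics matrices are); it is not the optimizer's actual kernel:

```python
import numpy as np

def matrix_inverse_root_svd(m: np.ndarray, p: int, eps: float = 1e-6) -> np.ndarray:
    """Compute M^{-1/p} for a symmetric PSD matrix via SVD.

    For symmetric PSD M, the SVD coincides with the eigendecomposition,
    so M^{-1/p} = U diag(s^{-1/p}) V^T. `eps` regularizes tiny singular
    values, analogous to the optimizer's `matrix_eps`.
    """
    n = m.shape[0]
    u, s, vt = np.linalg.svd(m + eps * np.eye(n))
    return u @ np.diag(s ** (-1.0 / p)) @ vt
```

The Schur-Newton iteration avoids the full decomposition and is asymptotically cheaper on large matrices, but on the small blocks typical of small models a single call into a tuned dense SVD kernel often wins, which matches the trade-off described in the docstring.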