We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
There was an error while loading. Please reload this page.
1 parent 8e9cf6d commit 262746fCopy full SHA for 262746f
docs/source-pytorch/accelerators/gpu_faq.rst
@@ -55,7 +55,7 @@ In general, there are two common scaling rules:
55
56
1. **Linear scaling**: Increase the learning rate linearly with the number of devices.
57
58
- .. testcode::
+ .. code-block:: python
59
60
# Example: Linear scaling
61
base_lr = 1e-3
@@ -64,7 +64,7 @@ In general, there are two common scaling rules:
64
65
2. **Square root scaling**: Increase the learning rate by the square root of the number of devices.
66
67
68
69
# Example: Square root scaling
70
0 commit comments