@@ -8,12 +8,13 @@ This approach offers a trade-off between the precision of the aggregation decisi
 computational cost associated with computing the Gramian of the full Jacobian. For a complete,
 non-partial version, see the :doc:`IWRM <iwrm>` example.

-In this example, our model consists of three `Linear` layers separated by `ReLU` layers. We
-perform the partial descent by considering only the parameters of the last two `Linear` layers. By
+In this example, our model consists of three ``Linear`` layers separated by ``ReLU`` layers. We
+perform the partial descent by considering only the parameters of the last two ``Linear`` layers. By
 doing this, we avoid computing the Jacobian and its Gramian with respect to the parameters of the
-first `Linear` layer, thereby reducing memory usage and computation time.
+first ``Linear`` layer, thereby reducing memory usage and computation time.

 .. code-block:: python
+   :emphasize-lines: 16-18

    import torch
    from torch.nn import Linear, MSELoss, ReLU, Sequential
@@ -30,8 +31,8 @@ first `Linear` layer, thereby reducing memory usage and computation time.

    weighting = UPGradWeighting()

-   # Create the autogram engine that will compute the Gramian of the Jacobian with respect to the
-   # two last Linear layers' parameters.
+   # Create the autogram engine that will compute the Gramian of the
+   # Jacobian with respect to the last two Linear layers' parameters.
    engine = Engine(model[2:].modules())

    params = model.parameters()
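
Putting the pieces together, a complete version of the snippet being edited could look like the sketch below. It is a minimal illustration rather than the exact example from the docs: the data shapes, the learning rate, and the use of ``engine.compute_gramian`` followed by a call to the weighting to turn the Gramian into loss weights are assumptions based on the surrounding code and the full IWRM example.

.. code-block:: python

   import torch
   from torch.nn import Linear, MSELoss, ReLU, Sequential
   from torch.optim import SGD

   from torchjd.aggregation import UPGradWeighting
   from torchjd.autogram import Engine

   # Dummy data: 8 batches of 16 examples with 10 features and scalar targets.
   X = torch.randn(8, 16, 10)
   Y = torch.randn(8, 16)

   model = Sequential(Linear(10, 5), ReLU(), Linear(5, 3), ReLU(), Linear(3, 1))
   loss_fn = MSELoss(reduction="none")  # keep one loss per example

   weighting = UPGradWeighting()

   # Create the autogram engine that will compute the Gramian of the
   # Jacobian with respect to the last two Linear layers' parameters.
   engine = Engine(model[2:].modules())

   params = model.parameters()
   optimizer = SGD(params, lr=0.1)

   for x, y in zip(X, Y):
       y_hat = model(x).squeeze(dim=1)  # one prediction per example
       losses = loss_fn(y_hat, y)       # one loss per example
       optimizer.zero_grad()
       # Assumed API: the engine computes the (partial) Gramian of the
       # Jacobian of the losses, and the weighting maps it to loss weights.
       gramian = engine.compute_gramian(losses)
       weights = weighting(gramian)
       # Backpropagate the weighted sum of the losses and update all
       # parameters, even though the Gramian only covers the last two layers.
       losses.backward(weights)
       optimizer.step()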