Commit f06507c

redirect to duplicate

1 parent 18abd43 commit f06507c

1 file changed: 7 additions & 316 deletions
@@ -1,324 +1,15 @@
 # -*- coding: utf-8 -*-
 """
-A Gentle Introduction to ``torch.autograd``
-===========================================
-
-``torch.autograd`` is PyTorch’s automatic differentiation engine that powers
-neural network training. In this section, you will get a conceptual
-understanding of how autograd helps a neural network train.
-
-Background
-~~~~~~~~~~
-Neural networks (NNs) are a collection of nested functions that are
-executed on some input data. These functions are defined by *parameters*
-(consisting of weights and biases), which in PyTorch are stored in
-tensors.
-
-Training a NN happens in two steps:
-
-**Forward Propagation**: In forward prop, the NN makes its best guess
-about the correct output. It runs the input data through each of its
-functions to make this guess.
-
-**Backward Propagation**: In backprop, the NN adjusts its parameters
-proportionate to the error in its guess. It does this by traversing
-backwards from the output, collecting the derivatives of the error with
-respect to the parameters of the functions (*gradients*), and optimizing
-the parameters using gradient descent. For a more detailed walkthrough
-of backprop, check out this `video from
-3Blue1Brown <https://www.youtube.com/watch?v=tIeHLnjs5U8>`__.
-
+:orphan:
 
+A Gentle Introduction to ``torch.autograd``
+==============================================
 
+This tutorial has been deprecated because there is an identical basics tutorial.
 
-Usage in PyTorch
-~~~~~~~~~~~~~~~~
-Let's take a look at a single training step.
-For this example, we load a pretrained resnet18 model from ``torchvision``.
-We create a random data tensor to represent a single image with 3 channels, and height & width of 64,
-and its corresponding ``label`` initialized to some random values. Label in pretrained models has
-shape (1,1000).
+Redirecting in 3 seconds...
 
-.. note::
-   This tutorial works only on the CPU and will not work on GPU devices (even if tensors are moved to CUDA).
+.. raw:: html
 
+   <meta http-equiv="Refresh" content="3; url='https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html'" />
 """
-import torch
-from torchvision.models import resnet18, ResNet18_Weights
-model = resnet18(weights=ResNet18_Weights.DEFAULT)
-data = torch.rand(1, 3, 64, 64)
-labels = torch.rand(1, 1000)
-
-############################################################
-# Next, we run the input data through the model through each of its layers to make a prediction.
-# This is the **forward pass**.
-#
-
-prediction = model(data) # forward pass
-
-############################################################
-# We use the model's prediction and the corresponding label to calculate the error (``loss``).
-# The next step is to backpropagate this error through the network.
-# Backward propagation is kicked off when we call ``.backward()`` on the error tensor.
-# Autograd then calculates and stores the gradients for each model parameter in the parameter's ``.grad`` attribute.
-#
-
-loss = (prediction - labels).sum()
-loss.backward() # backward pass
-
-############################################################
-# Next, we load an optimizer, in this case SGD with a learning rate of 0.01 and `momentum <https://towardsdatascience.com/stochastic-gradient-descent-with-momentum-a84097641a5d>`__ of 0.9.
-# We register all the parameters of the model in the optimizer.
-#
-
-optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
-
-######################################################################
-# Finally, we call ``.step()`` to initiate gradient descent. The optimizer adjusts each parameter by its gradient stored in ``.grad``.
-#
-
-optim.step() #gradient descent
-
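As a side note to the training-step walkthrough removed above: gradients accumulate in each parameter's ``.grad`` buffer across ``backward()`` calls, so a real loop clears them with ``zero_grad()``. A minimal sketch of how the single step extends to a loop (illustrative only, not part of the removed file):

# A minimal, illustrative training loop built from the same pieces as the
# removed walkthrough; the zero_grad() call is the one addition.
import torch
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT)
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

for _ in range(3):                      # a few illustrative iterations
    data = torch.rand(1, 3, 64, 64)     # random stand-in for an image batch
    labels = torch.rand(1, 1000)        # random stand-in for targets
    optim.zero_grad()                   # clear gradients accumulated so far
    prediction = model(data)            # forward pass
    loss = (prediction - labels).sum()  # same toy loss as in the tutorial
    loss.backward()                     # backward pass populates .grad
    optim.step()                        # gradient descent update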
-######################################################################
-# At this point, you have everything you need to train your neural network.
-# The below sections detail the workings of autograd - feel free to skip them.
-#
-
-
-######################################################################
-# --------------
-#
-
-
-######################################################################
-# Differentiation in Autograd
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~
-# Let's take a look at how ``autograd`` collects gradients. We create two tensors ``a`` and ``b`` with
-# ``requires_grad=True``. This signals to ``autograd`` that every operation on them should be tracked.
-#
-
-import torch
-
-a = torch.tensor([2., 3.], requires_grad=True)
-b = torch.tensor([6., 4.], requires_grad=True)
-
-######################################################################
-# We create another tensor ``Q`` from ``a`` and ``b``.
-#
-# .. math::
-#    Q = 3a^3 - b^2
-
-Q = 3*a**3 - b**2
-
-
-######################################################################
-# Let's assume ``a`` and ``b`` to be parameters of an NN, and ``Q``
-# to be the error. In NN training, we want gradients of the error
-# w.r.t. parameters, i.e.
-#
-# .. math::
-#    \frac{\partial Q}{\partial a} = 9a^2
-#
-# .. math::
-#    \frac{\partial Q}{\partial b} = -2b
-#
-#
-# When we call ``.backward()`` on ``Q``, autograd calculates these gradients
-# and stores them in the respective tensors' ``.grad`` attribute.
-#
-# We need to explicitly pass a ``gradient`` argument in ``Q.backward()`` because it is a vector.
-# ``gradient`` is a tensor of the same shape as ``Q``, and it represents the
-# gradient of Q w.r.t. itself, i.e.
-#
-# .. math::
-#    \frac{dQ}{dQ} = 1
-#
-# Equivalently, we can also aggregate Q into a scalar and call backward implicitly, like ``Q.sum().backward()``.
-#
-external_grad = torch.tensor([1., 1.])
-Q.backward(gradient=external_grad)
-
-
-#######################################################################
-# Gradients are now deposited in ``a.grad`` and ``b.grad``
-
-# check if collected gradients are correct
-print(9*a**2 == a.grad)
-print(-2*b == b.grad)
-
-
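The removed text above notes that ``Q.sum().backward()`` is equivalent to passing an explicit ``gradient`` argument. A short sketch of that equivalence, plus ``torch.autograd.grad`` as a functional alternative that returns gradients instead of accumulating them into ``.grad`` (illustrative, not part of the removed file):

# Two equivalent ways to obtain the same gradients as in the removed example.
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 3*a**3 - b**2

# Aggregating Q to a scalar lets backward() use the implicit gradient of 1.
Q.sum().backward()
print(a.grad)   # tensor([36., 81.]) == 9*a**2
print(b.grad)   # tensor([-12.,  -8.]) == -2*b

# torch.autograd.grad returns the gradients directly; grad_outputs plays the
# role of the explicit `gradient` argument from the removed example.
a2 = torch.tensor([2., 3.], requires_grad=True)
b2 = torch.tensor([6., 4.], requires_grad=True)
Q2 = 3*a2**3 - b2**2
da, db = torch.autograd.grad(Q2, (a2, b2), grad_outputs=torch.ones_like(Q2))
print(da, db)   # same values as above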
-######################################################################
-# Optional Reading - Vector Calculus using ``autograd``
-# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-#
-# Mathematically, if you have a vector valued function
-# :math:`\vec{y}=f(\vec{x})`, then the gradient of :math:`\vec{y}` with
-# respect to :math:`\vec{x}` is a Jacobian matrix :math:`J`:
-#
-# .. math::
-#
-#
-#     J
-#      =
-#       \left(\begin{array}{cc}
-#        \frac{\partial \bf{y}}{\partial x_{1}} &
-#        ... &
-#        \frac{\partial \bf{y}}{\partial x_{n}}
-#        \end{array}\right)
-#      =
-#      \left(\begin{array}{ccc}
-#       \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\
-#       \vdots & \ddots & \vdots\\
-#       \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
-#       \end{array}\right)
-#
-# Generally speaking, ``torch.autograd`` is an engine for computing
-# vector-Jacobian product. That is, given any vector :math:`\vec{v}`, compute the product
-# :math:`J^{T}\cdot \vec{v}`
-#
-# If :math:`\vec{v}` happens to be the gradient of a scalar function :math:`l=g\left(\vec{y}\right)`:
-#
-# .. math::
-#
-#
-#   \vec{v}
-#    =
-#    \left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}
-#
-# then by the chain rule, the vector-Jacobian product would be the
-# gradient of :math:`l` with respect to :math:`\vec{x}`:
-#
-# .. math::
-#
-#
-#   J^{T}\cdot \vec{v}=\left(\begin{array}{ccc}
-#    \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
-#    \vdots & \ddots & \vdots\\
-#    \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
-#    \end{array}\right)\left(\begin{array}{c}
-#    \frac{\partial l}{\partial y_{1}}\\
-#    \vdots\\
-#    \frac{\partial l}{\partial y_{m}}
-#    \end{array}\right)=\left(\begin{array}{c}
-#    \frac{\partial l}{\partial x_{1}}\\
-#    \vdots\\
-#    \frac{\partial l}{\partial x_{n}}
-#    \end{array}\right)
-#
-# This characteristic of vector-Jacobian product is what we use in the above example;
-# ``external_grad`` represents :math:`\vec{v}`.
-#
-
-
-
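The vector-Jacobian product described above can also be computed explicitly. A small sketch (not from the removed tutorial) using ``torch.autograd.functional.vjp`` on a toy elementwise function, checked against ``backward()``:

# Making J^T @ v explicit for y = f(x) with f(x) = 3*x**3 (so J is diagonal
# with entries 9*x**2), and comparing with backward().
import torch
from torch.autograd.functional import vjp

def f(x):
    return 3 * x**3            # vector-valued: y_i = 3 * x_i**3

x = torch.tensor([2., 3.])
v = torch.tensor([1., 1.])     # the vector v (same role as external_grad)

y, jt_v = vjp(f, x, v)         # returns f(x) and J^T @ v
print(jt_v)                    # tensor([36., 81.]) == 9*x**2

# The same product via backward():
x_req = x.clone().requires_grad_(True)
f(x_req).backward(gradient=v)
print(x_req.grad)              # tensor([36., 81.])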
-######################################################################
-# Computational Graph
-# ~~~~~~~~~~~~~~~~~~~
-#
-# Conceptually, autograd keeps a record of data (tensors) & all executed
-# operations (along with the resulting new tensors) in a directed acyclic
-# graph (DAG) consisting of
-# `Function <https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function>`__
-# objects. In this DAG, leaves are the input tensors, roots are the output
-# tensors. By tracing this graph from roots to leaves, you can
-# automatically compute the gradients using the chain rule.
-#
-# In a forward pass, autograd does two things simultaneously:
-#
-# - run the requested operation to compute a resulting tensor, and
-# - maintain the operation’s *gradient function* in the DAG.
-#
-# The backward pass kicks off when ``.backward()`` is called on the DAG
-# root. ``autograd`` then:
-#
-# - computes the gradients from each ``.grad_fn``,
-# - accumulates them in the respective tensor’s ``.grad`` attribute, and
-# - using the chain rule, propagates all the way to the leaf tensors.
-#
-# Below is a visual representation of the DAG in our example. In the graph,
-# the arrows are in the direction of the forward pass. The nodes represent the backward functions
-# of each operation in the forward pass. The leaf nodes in blue represent our leaf tensors ``a`` and ``b``.
-#
-# .. figure:: /_static/img/dag_autograd.png
-#
-# .. note::
-#   **DAGs are dynamic in PyTorch**
-#   An important thing to note is that the graph is recreated from scratch; after each
-#   ``.backward()`` call, autograd starts populating a new graph. This is
-#   exactly what allows you to use control flow statements in your model;
-#   you can change the shape, size and operations at every iteration if
-#   needed.
-#
-# Exclusion from the DAG
-# ^^^^^^^^^^^^^^^^^^^^^^
-#
-# ``torch.autograd`` tracks operations on all tensors which have their
-# ``requires_grad`` flag set to ``True``. For tensors that don’t require
-# gradients, setting this attribute to ``False`` excludes it from the
-# gradient computation DAG.
-#
-# The output tensor of an operation will require gradients even if only a
-# single input tensor has ``requires_grad=True``.
-#
-
-x = torch.rand(5, 5)
-y = torch.rand(5, 5)
-z = torch.rand((5, 5), requires_grad=True)
-
-a = x + y
-print(f"Does `a` require gradients?: {a.requires_grad}")
-b = x + z
-print(f"Does `b` require gradients?: {b.requires_grad}")
-
-
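The Computational Graph passage above describes the DAG of ``Function`` objects built during the forward pass. A brief sketch (not part of the removed file) of how that graph can be inspected through ``.grad_fn`` on result tensors:

# Peeking at the recorded backward graph via .grad_fn.
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 3*a**3 - b**2

print(Q.grad_fn)                  # SubBackward0: the root node of the backward graph
print(Q.grad_fn.next_functions)   # upstream Function nodes feeding the subtraction
print(a.grad_fn)                  # None: `a` is a leaf tensor created by the user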
-######################################################################
-# In a NN, parameters that don't compute gradients are usually called **frozen parameters**.
-# It is useful to "freeze" part of your model if you know in advance that you won't need the gradients of those parameters
-# (this offers some performance benefits by reducing autograd computations).
-#
-# In finetuning, we freeze most of the model and typically only modify the classifier layers to make predictions on new labels.
-# Let's walk through a small example to demonstrate this. As before, we load a pretrained resnet18 model, and freeze all the parameters.
-
-from torch import nn, optim
-
-model = resnet18(weights=ResNet18_Weights.DEFAULT)
-
-# Freeze all the parameters in the network
-for param in model.parameters():
-    param.requires_grad = False
-
-######################################################################
-# Let's say we want to finetune the model on a new dataset with 10 labels.
-# In resnet, the classifier is the last linear layer ``model.fc``.
-# We can simply replace it with a new linear layer (unfrozen by default)
-# that acts as our classifier.
-
-model.fc = nn.Linear(512, 10)
-
-######################################################################
-# Now all parameters in the model, except the parameters of ``model.fc``, are frozen.
-# The only parameters that compute gradients are the weights and bias of ``model.fc``.
-
-# Optimize only the classifier
-optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
-
-##########################################################################
-# Notice although we register all the parameters in the optimizer,
-# the only parameters that are computing gradients (and hence updated in gradient descent)
-# are the weights and bias of the classifier.
-#
-# The same exclusionary functionality is available as a context manager in
-# `torch.no_grad() <https://pytorch.org/docs/stable/generated/torch.no_grad.html>`__
-#
-
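For the ``torch.no_grad()`` context manager mentioned just above, a minimal sketch (not part of the removed file) of how it excludes operations from gradient tracking, which is the usual pattern for inference:

# Operations run under torch.no_grad() are not recorded in the DAG.
import torch

x = torch.rand(5, requires_grad=True)

with torch.no_grad():
    y = x * 2
print(y.requires_grad)   # False: the multiply was not tracked

z = x * 2
print(z.requires_grad)   # True: tracked as usual outside the context manager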
-######################################################################
-# --------------
-#
-
-######################################################################
-# Further readings:
-# ~~~~~~~~~~~~~~~~~~~
-#
-# - `In-place operations & Multithreaded Autograd <https://pytorch.org/docs/stable/notes/autograd.html>`__
-# - `Example implementation of reverse-mode autodiff <https://colab.research.google.com/drive/1VpeE6UvEPRz9HmsHh1KS0XxXjYu533EC>`__
-# - `Video: PyTorch Autograd Explained - In-depth Tutorial <https://www.youtube.com/watch?v=MswxJw-8PvE>`__
