PyTorch Workflow: How does the loss function know what is going on with the model parameters? #714
Replies: 2 comments
-
There is no short answer to your question. PyTorch builds a computational graph for each operation you perform in the model's forward pass. The loss function takes in the output tensor from the forward pass, and that tensor carries a reference to the computational graph. `loss.backward()` then moves backwards from the loss through the graph, using the chain rule to compute a gradient for each parameter. The optimizer updates the weights with those gradients when you call `optimizer.step()`.
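A minimal sketch of one training step to make that chain concrete. The toy `nn.Linear` model, the SGD optimizer, and the random data are illustrative assumptions, not from the thread; only `loss_fn`, `y_pred`, and `y_train` come from the question:

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)                  # parameters are created with requires_grad=True
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.L1Loss()

X_train = torch.randn(8, 1)              # made-up data for the sketch
y_train = torch.randn(8, 1)

y_pred = model(X_train)                  # forward pass builds the computational graph
loss = loss_fn(y_pred, y_train)          # loss tensor references that graph

optimizer.zero_grad()                    # clear gradients from the previous step
loss.backward()                          # chain rule back through the graph fills each .grad
optimizer.step()                         # update each parameter using its .grad

print(model.weight.grad)                 # gradient populated by backward()
```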
-
The loss tensor returned by `loss_fn(y_pred, y_train)` keeps a reference to the computational graph that produced `y_pred`, and that graph reaches back to the model's parameters. When you call `loss.backward()`, PyTorch traverses this graph and writes a gradient into each parameter's `.grad` attribute. The optimizer does not need to know the loss function explicitly. Instead, it uses the gradients that are computed by the `backward()` call; it can find them because it was handed the model's parameters when it was created (e.g. `torch.optim.SGD(model.parameters(), lr=0.01)`). I hope this explanation helps.
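A minimal sketch of that handoff, again with an illustrative toy model: the optimizer receives references to the parameter tensors at construction, `backward()` fills each parameter's `.grad`, and `step()` reads those gradients without ever touching the loss object:

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 1)
# The optimizer is given references to the parameter tensors up front...
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = nn.MSELoss()(model(torch.randn(4, 2)), torch.randn(4, 1))
loss.backward()                          # writes gradients into each param's .grad

for p in model.parameters():
    print(p.grad)                        # ...and step() reads exactly these values

optimizer.step()                         # no loss object involved at this point
```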
-
I'm trying to understand the training loop. Everything makes sense to me except for the loss function. I understand that we calculate the loss in a given epoch with `loss = loss_fn(y_pred, y_train)`. I'm confused about two things:

1. How does `loss.backward()` know what the model parameters are? We don't connect it to the model. We define `loss_fn = nn.L1Loss()` and then we put two tensors into the loss function with `loss = loss_fn(y_pred, y_train)`. Does it know the parameters from `y_pred` and `y_train`? I'm used to scikit-learn, where these are just vectors, but maybe the tensors have more information attached to them? (A quick check of this appears in the sketch below.)
2. How does the optimizer know what the loss is? We never pass the loss function into the optimizer.
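A quick way to check the hunch in point 1, sketched with a throwaway `nn.Linear` model (names illustrative): the model's output tensor carries a `grad_fn` attribute linking it back to the computational graph and the parameters, while a plain data tensor does not:

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
X_train = torch.randn(5, 1)
y_train = torch.randn(5, 1)

y_pred = model(X_train)

print(y_train.grad_fn)   # None: a plain data tensor, no graph attached
print(y_pred.grad_fn)    # e.g. <AddmmBackward0 ...>: links back to the model's parameters
```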