A model is not getting trained #642

celdorwow · 2023-09-23T12:53:40Z

Code Based on the video: 01_pytorch_workflow.ipynb

My code

I tried to build a simple regression model, similar to one in the video. Here's the code:

# Imports
import torch
from torch import nn      ## nn contains all of PyTorch's building blocks
from torch import optim   ## optim contains optimisers used by PyTorch


# Create model class
class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_layer = nn.Linear(in_features=1, out_features=1)
        
    # Forward method to define the computation in the model
    def forward(self, X: torch.Tensor) -> torch.Tensor:
        return self.linear_layer(X)

    
# Set manual seed
torch.manual_seed(42)

# Create data
weight, bias = 0.7, 0.3
start, end, step = 0, 1, 0.02
X = torch.arange(start, end, step).unsqueeze(dim=1)
y = weight*X + bias + 0.035*torch.randn_like(X) 

# Create train/test split
train_split = int(0.8*len(X))
X_train, y_train = X[:train_split], y[:train_split]
X_test,  y_test  = X[train_split:], y[train_split:]

# Create model
model = LinearRegressionModel()
print(model, end="\n\n")
print(model.state_dict(), end="\n\n")

# Create loss function
loss_fn = nn.L1Loss()

# Create optimiser
optimiser = optim.SGD(params=model.parameters(), lr=1e2)

# Training loop 
epochs = 200
for epoch in range(1, epochs+1):
    # Set model to training mode
    model.train()
    
    # Forward pass
    y_pred = model(X_train)
    
    # Calculate loss
    loss = loss_fn(y_pred, y_train)
    
    # Zero gradients in optimiser
    optimiser.zero_grad()
    
    # Backpropagate
    loss.backward()
    
    # Step the model's parameters
    optimiser.step()
    
    ### Evaluate the current state
    model.eval()
    with torch.inference_mode():
        test_pred = model(X_test)
        test_loss = loss_fn(test_pred, y_test)
    
    # Print the current state
    if epoch == 1 or epoch % 10 == 0:
        print("Epoch: {:3} | Loss: {:.2f} | Test loss: {:.2f}".format(epoch,loss,test_loss))

print();print(model.state_dict())

What I've tried so far

To my understanding, I did everything according to what I learned in the video. I tried to build a regression model from scratch (OK, with a little help from the video) and tried to train the model's parameters.

For some reason the model is not getting trained, and I get really odd loss and test_loss values. I can't figure out what is wrong here. Here's the output:

LinearRegressionModel(
  (linear_layer): Linear(in_features=1, out_features=1, bias=True)
)

OrderedDict([('linear_layer.weight', tensor([[0.8294]])), ('linear_layer.bias', tensor([-0.5927]))])

Epoch:   1 | Loss: 0.85 | Test loss: 133.93
Epoch:  10 | Loss: 114.36 | Test loss: 0.78
Epoch:  20 | Loss: 114.36 | Test loss: 0.78
Epoch:  30 | Loss: 114.36 | Test loss: 0.78
Epoch:  40 | Loss: 114.36 | Test loss: 0.78
Epoch:  50 | Loss: 114.36 | Test loss: 0.78
Epoch:  60 | Loss: 114.36 | Test loss: 0.78
Epoch:  70 | Loss: 114.36 | Test loss: 0.78
Epoch:  80 | Loss: 114.36 | Test loss: 0.78
Epoch:  90 | Loss: 114.36 | Test loss: 0.78
Epoch: 100 | Loss: 114.36 | Test loss: 0.78
Epoch: 110 | Loss: 114.36 | Test loss: 0.78
Epoch: 120 | Loss: 114.36 | Test loss: 0.78
Epoch: 130 | Loss: 114.36 | Test loss: 0.78
Epoch: 140 | Loss: 114.36 | Test loss: 0.78
Epoch: 150 | Loss: 114.36 | Test loss: 0.78
Epoch: 160 | Loss: 114.36 | Test loss: 0.78
Epoch: 170 | Loss: 114.36 | Test loss: 0.78
Epoch: 180 | Loss: 114.36 | Test loss: 0.78
Epoch: 190 | Loss: 114.36 | Test loss: 0.78
Epoch: 200 | Loss: 114.36 | Test loss: 0.78

OrderedDict([('linear_layer.weight', tensor([[0.8294]])), ('linear_layer.bias', tensor([-0.5927]))])

Answered by Klubenn

Klubenn · 2023-09-28T16:17:19Z

It seems to me you have a very high learning rate. Shouldn't it be 1e-2 instead of 1e2?
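For anyone wondering why the printed numbers look frozen rather than exploding, here is a minimal sketch of what seems to be happening. It assumes the same data recipe as the question and sets the starting parameters by hand to the values printed in the question's output (0.8294 and -0.5927); the `history` list is just an illustrative helper. With `nn.L1Loss`, the gradient depends only on the sign of each error, so with lr=1e2 one step should push the weight and bias far above the targets (to about 39.8 and 99.4), and the next step should push them back by the same amount. The log in the question only samples epoch 1 and every tenth epoch, so it would always catch the same half of that two-step cycle.

```python
import torch
from torch import nn, optim

torch.manual_seed(42)

# Same synthetic data recipe as in the question
X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
y = 0.7 * X + 0.3 + 0.035 * torch.randn_like(X)
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]

# Assumption: start from the initial parameters printed in the question's output
layer = nn.Linear(in_features=1, out_features=1)
with torch.no_grad():
    layer.weight.fill_(0.8294)
    layer.bias.fill_(-0.5927)

loss_fn = nn.L1Loss()
optimiser = optim.SGD(layer.parameters(), lr=1e2)  # the value from the question

# Record (step, loss before the update, weight after, bias after) for a few steps
history = []
for step in range(1, 5):
    loss = loss_fn(layer(X_train), y_train)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    history.append((step, loss.item(), layer.weight.item(), layer.bias.item()))
    print("Step {} | loss before step: {:8.2f} | weight: {:8.4f} | bias: {:9.4f}".format(*history[-1]))
```

If the parameters alternate between two states as described, that would also explain why the final `state_dict()` in the question is identical to the initial one after an even number of epochs.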

2 replies

shisirkha Dec 18, 2023

Nah, why would anyone use a learning rate that high? It should be 0.01 or 0.1, as preferred.

celdorwow Jan 16, 2024
Author

@Klubenn Of course it was a typo. I meant to write 1e-2, not 1e2. Thanks.

shisirkha · 2023-12-18T12:59:24Z

The learning rate should be 0.1 or 0.01.
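A rough way to check this suggestion is to rerun the same setup with a few learning rates and compare the first and last training losses. The sketch below is only an illustration: the `train` helper and the `results` dictionary are hypothetical names, not part of the original code, and the data recipe, loss function, and optimiser are copied from the question. The lr=1e2 run is included purely as the baseline from the question.

```python
import torch
from torch import nn, optim


def train(lr: float, epochs: int = 200) -> list:
    """Train a one-feature linear model with SGD and L1 loss.

    Returns the training loss recorded at every epoch (before each update).
    """
    torch.manual_seed(42)  # same seed for every run, so only lr differs
    X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
    y = 0.7 * X + 0.3 + 0.035 * torch.randn_like(X)
    split = int(0.8 * len(X))
    X_train, y_train = X[:split], y[:split]

    model = nn.Linear(in_features=1, out_features=1)
    loss_fn = nn.L1Loss()
    optimiser = optim.SGD(model.parameters(), lr=lr)

    losses = []
    for _ in range(epochs):
        model.train()
        loss = loss_fn(model(X_train), y_train)
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
        losses.append(loss.item())
    return losses


# Compare the two suggested values against the value from the question
results = {lr: train(lr) for lr in (1e-2, 1e-1, 1e2)}
for lr, losses in results.items():
    print("lr={:<6} first loss: {:.4f} | last loss: {:.4f}".format(lr, losses[0], losses[-1]))
```

With the two smaller rates the last loss should come out well below the first, while the lr=1e2 run should not improve. Which of 0.1 and 0.01 works better will depend on the number of epochs, so treat the comparison as a starting point rather than a rule.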

1 reply

celdorwow Jan 16, 2024
Author

Typo. The learning rate was supposed to be 1e-2. Ty
