Training >60 times slower after converting to lightning. Am I doing something wrong? #7743
-
Hi, your timing probably isn't correct. You need to synchronise the GPU before each timing point; otherwise the Python code reaches the timing statement while the GPU has only queued the operation and hasn't actually executed it yet. We benchmark the results ourselves in our CI and we don't observe such a slowdown. There are a few things you could improve (like the number of workers in your loader, or not re-instantiating the loss all the time), but they are only minor and shouldn't impact performance that drastically. Have a look at https://discuss.pytorch.org/t/how-to-measure-time-in-pytorch/26964/2 for how to time correctly :)
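For illustration, here is a minimal sketch of the synchronised-timing pattern, assuming a CUDA device; the `timed` helper and the use of `time.perf_counter` are just one way to do it, not code from the linked post:

```python
import time
import torch

def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed seconds), flushing queued GPU work around the measurement."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for previously queued kernels before starting the clock
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for the work launched by fn before stopping the clock
    return result, time.perf_counter() - start
```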
-
Lightning looks like it would be a good fit for my research, but upon converting a basic linear regression example from PyTorch to Lightning I'm seeing a dramatic reduction in performance.
I'm new to ML so I hope it's something that I'm doing wrong.
Here is the code example:
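As a stand-in for the full script, here is a minimal sketch of the kind of conversion I mean, wrapping a plain linear regression in a `LightningModule`; the class name, layer sizes and learning rate below are placeholders, not my actual code:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class LitLinearRegression(pl.LightningModule):
    """Placeholder single-feature linear regression trained with MSE."""

    def __init__(self, lr: float = 1e-2):
        super().__init__()
        self.model = nn.Linear(1, 1)
        self.loss_fn = nn.MSELoss()  # created once instead of on every step
        self.lr = lr

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        inputs, labels = batch
        return self.loss_fn(self(inputs), labels)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.lr)
```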
I thought that a lot of the slowdown could come from the DataLoader, and it does. Un-commenting the line:
# for batch_index, (inputs, labels) in enumerate(train_loader):
changes the time taken to 1.66, more than 30 times slower than the original.
I'm aware that I can add more workers, but that alone doesn't account for the performance gap from the original. Maybe there is something wrong with my original script, or with the way I've converted it to Lightning? Maybe the small dataset I'm working with is a factor?
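For reference, a minimal sketch of the worker change I mean, assuming a dataset named `train_dataset` and an arbitrary batch size:

```python
from torch.utils.data import DataLoader

# train_dataset is whatever Dataset the script already builds
train_loader = DataLoader(
    train_dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,            # load batches in separate worker processes
    persistent_workers=True,  # keep workers alive between epochs (requires num_workers > 0)
)
```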
Any guidance would be appreciated.