How does test_dataloader and test_data targets come out same first time #581

Grimshinigami · 2023-08-01T13:45:01Z


At timestamp 19:22:06

Targets for test_dataloader and test_data are different while plotting confusion matrix

I was working through the course's exercise notebook. When I started plotting the confusion matrix, I noticed that we use test_dataloader to get the prediction labels but test_data.targets as the truth labels in the confusion matrix function from torchmetrics. How can that work if test_dataloader holds the images in shuffled order?
My suspicion was correct: when I plotted the confusion matrix it came out all wrong. I then used test_data instead of test_dataloader to get pred_labels, and the confusion matrix came out fine.

Can anyone explain why this worked in the course video, and even in my own notebook, but not in the practice notebook? By default it shouldn't work, so I'd like someone to explain how that happened.

Answered by mrdbourke

mrdbourke (Maintainer) · 2023-08-04T00:17:45Z

Hi @Grimshinigami ,

Good question!

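This is one of the main reasons we don't shuffle the test_dataloader.

You can shuffle it if you like.

But as you've seen, this can cause confusion/errors when evaluating, because the predictions come back in the shuffled dataloader order while test_data.targets stays in the original dataset order. With shuffle=False, both follow the same order, so the predictions line up with test_data.targets.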

In short, best practice is usually:

Shuffle training data (to prevent the model learning order during training)
Don't shuffle testing data

Code example:

from torch.utils.data import DataLoader

# Setup the batch size hyperparameter
BATCH_SIZE = 32

# Turn datasets into iterables (batches)
train_dataloader = DataLoader(train_data, # dataset to turn into iterable
    batch_size=BATCH_SIZE, # how many samples per batch? 
    shuffle=True # shuffle data every epoch?
)

test_dataloader = DataLoader(test_data,
    batch_size=BATCH_SIZE,
    shuffle=False # keep the testing data in dataset order (no need to shuffle for evaluation)
)

# Let's check out what we've created
print(f"Dataloaders: {train_dataloader, test_dataloader}") 
print(f"Length of train dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}")
print(f"Length of test dataloader: {len(test_dataloader)} batches of {BATCH_SIZE}")
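
To see why the order matters, here is a minimal sketch that compares the labels a DataLoader yields against the dataset's own label order. The names toy_test_data and labels_in_loader_order are hypothetical stand-ins (not from the course notebook), and the data is random rather than the real test set:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(42)

# Hypothetical stand-in for test_data: 100 fake samples, labels from 10 classes
features = torch.randn(100, 4)
targets = torch.randint(0, 10, (100,))
toy_test_data = TensorDataset(features, targets)

def labels_in_loader_order(loader):
    """Concatenate the labels in the order the DataLoader yields them."""
    return torch.cat([y for _, y in loader])

unshuffled = DataLoader(toy_test_data, batch_size=32, shuffle=False)
shuffled = DataLoader(toy_test_data, batch_size=32, shuffle=True)

# With shuffle=False the loader walks the dataset from index 0 upwards
same_order = torch.equal(labels_in_loader_order(unshuffled), targets)

# With shuffle=True the same labels come back, but in a permuted order
shuffled_labels = labels_in_loader_order(shuffled)
shuffled_matches = torch.equal(shuffled_labels, targets)

print(f"shuffle=False matches dataset order: {same_order}")
print(f"shuffle=True matches dataset order: {shuffled_matches}")
```

Comparing predictions made from the shuffled loader against the dataset's targets would pair each prediction with the wrong truth label, which is consistent with the scrambled confusion matrix described in the question.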

Resource: https://www.learnpytorch.io/03_pytorch_computer_vision/#2-prepare-dataloader
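
If you do want to shuffle the test dataloader, one way to stay safe is to collect the truth labels from the same batches you predict on, instead of reading them from test_data.targets. The sketch below assumes a toy dataset and an untrained nn.Linear as stand-ins for the notebook's test_data and model (toy_test_data and toy_model are hypothetical names), and it builds the confusion matrix with plain torch operations rather than torchmetrics:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

NUM_CLASSES = 3

# Hypothetical stand-ins for the notebook's test_data and trained model
toy_targets = torch.randint(0, NUM_CLASSES, (90,))
toy_test_data = TensorDataset(torch.randn(90, 5), toy_targets)
toy_model = nn.Linear(5, NUM_CLASSES)

# Shuffled on purpose: the approach below does not depend on the order
loader = DataLoader(toy_test_data, batch_size=16, shuffle=True)

all_preds, all_truth = [], []
toy_model.eval()
with torch.inference_mode():
    for X, y in loader:
        all_preds.append(toy_model(X).argmax(dim=1))  # predicted class per sample
        all_truth.append(y)                           # truth labels from the SAME batch

preds = torch.cat(all_preds)
truth = torch.cat(all_truth)

# Confusion matrix: rows are true classes, columns are predicted classes
flat_index = truth * NUM_CLASSES + preds
conf_mat = torch.bincount(flat_index, minlength=NUM_CLASSES**2)
conf_mat = conf_mat.reshape(NUM_CLASSES, NUM_CLASSES)
print(conf_mat)
```

Because preds and truth are gathered in the same loop, they stay aligned whether or not the loader shuffles. The same two tensors could then be handed to whichever confusion matrix function the notebook uses.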

1 reply

Grimshinigami Aug 4, 2023
Author

Hello there, and thanks for the reply. I guess it slipped my mind: when I looked in the course notebook, the test_dataloader wasn't shuffled, but I had shuffled it in the exercise notebook. Thanks for resolving my query, and for this amazing course as well.
