How do test_dataloader and test_data targets come out the same the first time? #581
At timestamp 19:22:06: targets for test_dataloader and test_data are different while plotting the confusion matrix.

I was doing the exercise notebook of the course, and when I started plotting the confusion matrix I noticed that we use test_dataloader to get the prediction labels but test_data.targets as the truth labels in the confusion matrix function from torchmetrics. How can that work if test_dataloader has shuffled images inside it? It ran in the course video and even in my own notebook, but not in the practice notebook, and as far as I can tell it shouldn't work by default. Can anyone explain how that happened?
Replies: 1 comment 1 reply
Hi @Grimshinigami,

Good question!

This is one of the main reasons we don't shuffle the test_dataloader.

You can shuffle it if you like. But as you've seen, this can cause confusion/errors when evaluating, because different instances of the test data end up in different orders.

In short, best practice is usually to shuffle the training data and leave the test data unshuffled:
Code example:

```python
from torch.utils.data import DataLoader

# Setup the batch size hyperparameter
BATCH_SIZE = 32

# Turn datasets into iterables (batches)
train_dataloader = DataLoader(train_data, # dataset to turn into iterable
    batch_size=BATCH_SIZE, # how many samples per batch?
    shuffle=True # shuffle data every epoch?
)

test_dataloader = DataLoader(test_data,
    batch_size=BATCH_SIZE,
    shuffle=False # don't necessarily have to shuffle the testing data
)

# Let's check out what we've created
print(f"Dataloaders: {train_dataloader, test_dataloader}")
print(f"Length of train dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}")
print(f"Length of test dataloader: {len(test_dataloader)} batches of {BATCH_SIZE}")
```

Resource: https://www.learnpytorch.io/03_pytorch_computer_vision/#2-prepare-dataloader
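To make the ordering issue concrete, here is a minimal sketch on a toy dataset rather than the course's data. The names `toy_test_data`, `targets` and `collect` are made up for illustration: `targets` plays the role of `test_data.targets`, and reading the label back off the input stands in for a real model's predictions. It shows that predictions from an unshuffled dataloader line up with the dataset's own targets, while predictions from a shuffled one only line up with labels gathered in the same loop.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for test_data: 100 samples whose single feature equals
# the label, so a "perfect model" can read the label straight off the input.
targets = torch.arange(100) % 10          # plays the role of test_data.targets
features = targets.float().unsqueeze(1)   # shape: (100, 1)
toy_test_data = TensorDataset(features, targets)


def collect(dataloader):
    """Gather predictions and labels in the order the dataloader yields them."""
    preds, labels = [], []
    for X, y in dataloader:
        preds.append(X.squeeze(1).long())  # stand-in for model(X).argmax(dim=1)
        labels.append(y)
    return torch.cat(preds), torch.cat(labels)


ordered_loader = DataLoader(toy_test_data, batch_size=32, shuffle=False)
shuffled_loader = DataLoader(
    toy_test_data,
    batch_size=32,
    shuffle=True,
    generator=torch.Generator().manual_seed(0),  # fixed seed, only for repeatability
)

ordered_preds, ordered_labels = collect(ordered_loader)
shuffled_preds, shuffled_labels = collect(shuffled_loader)

# Unshuffled: predictions are in dataset order, so comparing to `targets` is valid
print(torch.equal(ordered_preds, targets))

# Shuffled: predictions are in a different order than the dataset's targets
print(torch.equal(shuffled_preds, targets))

# Labels collected in the same loop stay aligned with predictions either way
print(torch.equal(shuffled_preds, shuffled_labels))
```

If you do want to shuffle the test dataloader, collecting the labels inside the same loop as the predictions (as `collect` does here) is one way to keep the two aligned before passing them to a confusion matrix.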