trying to overfit on a single image, but train/val losses and metrics are not the same #12178
Unanswered
pini-kop
asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment 5 replies
-
Depends. Do you have any batch norm or dropout layers in your model? You can also debug this by checking the batch and the outputs of your model in each of those step hooks. If the outputs are different, then something must behave differently between train and eval mode.
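For instance (a minimal standalone sketch, not taken from the thread), BatchNorm and Dropout alone are enough to make identical weights produce different outputs for the same batch in train vs. eval mode:

```python
import torch
import torch.nn as nn

# Toy model containing the two usual suspects: BatchNorm and Dropout.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),   # batch statistics in train mode, running statistics in eval mode
    nn.Dropout(0.5),     # active in train mode, disabled in eval mode
)
x = torch.randn(1, 3, 32, 32)

model.train()
out_train = model(x)

model.eval()
out_eval = model(x)

# The outputs (and therefore any losses/metrics computed from them) differ,
# even though the weights are exactly the same.
print(torch.allclose(out_train, out_eval))  # typically False
```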
-
Hi, perhaps I'm missing something.
I have a segmentation task and I'm trying to overfit on a single image, but the train, validation, and test losses (computed on the same image) are not equal to one another, and neither are the metrics I'm calculating. The model is able to overfit; the numbers are just slightly different.
I built a dummy sampler that returns a constant index and passed it to the dataloader (a sketch of the sampler follows the snippet). I'm not applying any transforms (except albumentations.ToTensorV2):
train_sampler = DummySampler(main_indices=[5])
train_dl = DataLoader(train_ds, 1, sampler=train_sampler, drop_last=True, num_workers=os.cpu_count(), pin_memory=True)
The same dataloader is used in the Trainer for both training and validation:
trainer.fit(model, train_dl, train_dl)
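For reference, the sampler itself is roughly the following (a hypothetical sketch; the exact DummySampler implementation isn't shown here):

```python
from torch.utils.data import Sampler

class DummySampler(Sampler):
    """Always yield the same fixed indices, so every batch contains only the chosen sample(s)."""

    def __init__(self, main_indices):
        self.main_indices = main_indices

    def __iter__(self):
        return iter(self.main_indices)

    def __len__(self):
        return len(self.main_indices)
```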
The train_step, validation_step, and test_step are identical inside the model.
I also tried the overfit_batches flag but got the same thing: different numbers for train and validation.
Am I missing something? In this setup, shouldn't the losses (and metric scores) be equal?
Thanks