Version compatibility of pytorch-lightning #282

@kaelsunkiller

May I ask which version of pytorch-lightning (pl) you used to develop this codebase?

I tried the newest release, 2.0, but hit many errors from deprecated parameters and functions. So I have now downgraded to 1.5, with the compatible torch 1.8.0 and torchmetrics, but training still hangs at step 1770/1850 of epoch 0, which is very confusing.
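
For reference, a minimal sketch of how the installed versions of the three packages mentioned above can be printed from inside the environment. It uses only the standard library; the helper name `installed_versions` is made up for illustration, and the package list is an assumption about the distribution names used on PyPI.

```python
from importlib import metadata

# Distribution names as assumed to be published on PyPI.
PACKAGES = ("pytorch-lightning", "torch", "torchmetrics")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```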

I suspect it had entered the validation step, because pl printed the warning below:

/opt/conda/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py:56: UserWarning: Trying to infer the 'batch_size' from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use 'self.log(..., batch_size=batch_size)'.

The inferred batch size is 1, and this warning is new in pl 1.5. I don't know whether it causes any error in the computation.
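
The warning itself suggests passing `batch_size` explicitly to `self.log`. A minimal sketch of that, assuming each batch is a dict whose entries expose a `.shape`; the mixin, the helper `batch_size_of`, and the key `image0` are hypothetical names, not taken from this codebase. This only follows the warning's advice and may be unrelated to the hang.

```python
from typing import Any, Mapping


def batch_size_of(batch: Mapping[str, Any], key: str) -> int:
    """Read the batch size from the leading dimension of one batch entry."""
    return int(batch[key].shape[0])


class ExplicitBatchSizeLogging:
    """Mixin for a LightningModule; relies on the `self.log` it provides."""

    # Hypothetical key; use whichever tensor the batch actually holds.
    batch_key = "image0"

    def log_metric(self, name: str, value: Any, batch: Mapping[str, Any]) -> None:
        # Pass batch_size explicitly, as the warning recommends, so that
        # Lightning does not have to infer it from the batch contents.
        self.log(name, value, batch_size=batch_size_of(batch, self.batch_key))
```

In a module declared as, say, `class MyModule(ExplicitBatchSizeLogging, pl.LightningModule)`, `validation_step` could then call `self.log_metric("val_loss", loss, batch)`.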

Back to the hang: I waited more than 30 minutes, which is much longer than the ETA for training one epoch. It is still stuck, with no errors or warnings. I'm getting desperate...
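
Since the hang produces no output, one way to see where the main process is stuck is Python's standard `faulthandler` module. A minimal sketch, with hypothetical helper names and an arbitrary default timeout; it dumps the tracebacks of the threads in the calling process only, so it would not show what DataLoader worker processes are doing.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s: float = 600, file=sys.stderr) -> None:
    """Dump all thread tracebacks to `file` every `timeout_s` seconds."""
    # `file` must stay open until the watchdog is cancelled.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)


def disarm_hang_watchdog() -> None:
    """Cancel the pending traceback dump, e.g. once training has finished."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `arm_hang_watchdog()` just before `trainer.fit(...)` would print a traceback every ten minutes until `disarm_hang_watchdog()` is called.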

There are too many uncertainties with pl training for me to pin this down, so I have to ask which version works with this codebase. Thanks a lot!
