
Everything prints fine, but the loss doesn't decrease #20344

@2catycm

Description


Bug description

Even after setting the learning rate to 1, and even to 100, the loss does not change at all: it always stays at 4.60.
I tried to debug what happens, and everything seems to work: the loss is backpropagated successfully, the gradients of each parameter look fine, and the optimizer is indeed called.
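The plateau value itself tells us something. The classification head in the model summary below has 100 outputs. A model that assigns the same probability to all 100 classes has a cross-entropy of exactly ln(100) ≈ 4.605, which is the 4.60/4.61 shown in the progress bar. Here is a minimal check in plain PyTorch. It assumes 100 classes and a batch of 64 (the batch size set in the script) and uses constant logits as a stand-in for a "uniform" model.

```python
import math

import torch
from torch import nn

num_classes = 100  # output size of the classification head in the model summary
batch_size = 64    # batch size used in the reproduction script

# Constant logits give every class probability 1 / num_classes
uniform_logits = torch.zeros(batch_size, num_classes)
labels = torch.randint(0, num_classes, (batch_size,))

uniform_loss = nn.CrossEntropyLoss()(uniform_logits, labels).item()
print(f"ln({num_classes})                         = {math.log(num_classes):.4f}")  # 4.6052
print(f"cross-entropy of a uniform prediction = {uniform_loss:.4f}")  # 4.6052
```

The result does not depend on `label_smoothing`. When every log-probability is equal, smoothing only reweights identical terms.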

What version are you seeing the problem on?

v2.3

How to reproduce the bug

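import torch
from torch import nn
import lightning as L
from lightning.pytorch.utilities.types import OptimizerLRScheduler, STEP_OUTPUT
from typing_extensions import override  # or `typing.override` on Python 3.12+

# ClassificationTaskConfig, ClassificationDataModule and HuggingfaceModel are
# project-specific classes whose definitions are not included in this report.

class ClassificationTask(L.LightningModule):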
    def __init__(self, config: ClassificationTaskConfig)->None:
        super().__init__()
        self.save_hyperparameters(config.model_dump())
        L.seed_everything(config.experiment_index) # use index as the seed for reproducibility
        self.lit_data:ClassificationDataModule = config.dataset_config.get_lightning_data_module()
        config.cls_model_config.num_of_classes = self.lit_data.num_of_classes
        self.cls_model:HuggingfaceModel = config.cls_model_config.get_cls_model()
        self.lit_data.set_transform_from_hf_image_preprocessor(hf_image_preprocessor=self.cls_model.image_preprocessor)
        
        model_image_size:tuple[int, int] = (self.cls_model.image_preprocessor.size['height'], self.cls_model.image_preprocessor.size['width'])
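        # dummy input for the model summary; torch.zeros avoids the uninitialized memory returned by torch.Tensor(*sizes)
        self.example_input_array = torch.zeros(1, self.cls_model.backbone.config.num_channels, *model_image_size)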
        
        self.softmax = nn.Softmax(dim=1)    
        self.loss = nn.CrossEntropyLoss(label_smoothing=config.label_smoothing)
        
        self.automatic_optimization = False  # the problem also occurs with True; manual optimization is used here only to inspect each step
    
    def compute_model_logits(self, image_tensor:torch.Tensor)-> torch.Tensor:
        return self.cls_model(image_tensor)
    
    @override
    def forward(self, image_tensor:torch.Tensor, *args, **kwargs)-> torch.Tensor:
        return self.softmax(self.compute_model_logits(image_tensor))

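    def forward_loss(self, image_tensor: torch.Tensor, label_tensor:torch.Tensor)->torch.Tensor:
        probs = self(image_tensor)  # forward() already applies softmax
        # NOTE: nn.CrossEntropyLoss applies log_softmax internally and expects raw logits,
        # so passing `probs` here applies softmax a second time.
        # return F.nll_loss(logits, label_tensor)
        return self.loss(probs, label_tensor)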
    
    @override
    def training_step(self, batch, batch_idx=None, *args, **kwargs)-> STEP_OUTPUT:
        self.train()
        opt = self.optimizers()
        opt.zero_grad()
        
        loss = self.forward_loss(*batch)
        self.log("train_loss", loss, prog_bar=True)
        # self.manual_backward(loss)
        loss.backward()
        opt.step()
        return loss

    @override    
    def configure_optimizers(self) -> OptimizerLRScheduler:
        return torch.optim.AdamW(self.parameters(), lr=self.hparams.learning_rate)

# training script
from .core import ClassificationTask, ClassificationTaskConfig
config = ClassificationTaskConfig()
config.learning_rate = 3e-4  # loss stays flat at ~4.60
config.learning_rate = 1000  # debugging only: if the weights were really being updated, the loss should blow up to NaN
config.dataset_config.batch_size = 64
cls_task = ClassificationTask(config)

import lightning as L
from .utils import runs_path
from lightning.pytorch.callbacks.early_stopping import EarlyStopping
from lightning.pytorch.callbacks import ModelSummary, StochasticWeightAveraging, DeviceStatsMonitor
from lightning.pytorch.loggers import TensorBoardLogger, CSVLogger
trainer = L.Trainer(
    default_root_dir=runs_path,
    enable_checkpointing=True,
    enable_model_summary=True,
    num_sanity_val_steps=2,
    callbacks=[
        EarlyStopping(
            monitor="val_acc1",
            mode="max",
            check_finite=True,
            patience=5,
            check_on_train_epoch_end=False,  # check at the end of validation
            verbose=True,
        ),
        ModelSummary(max_depth=3),
        DeviceStatsMonitor(cpu_stats=True),
    ],
    logger=[TensorBoardLogger(save_dir=runs_path / "tensorboard"), CSVLogger(save_dir=runs_path)],
)
trainer.fit(cls_task, datamodule=cls_task.lit_data)
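One possible explanation, offered as a hypothesis and not a confirmed diagnosis, is that softmax is applied twice. `forward()` returns softmax probabilities, and `forward_loss()` passes them to `nn.CrossEntropyLoss`. That loss expects unnormalized logits and applies log-softmax itself.

Because its inputs are then confined to [0, 1], the per-sample loss with 100 classes can only lie between about ln(99 + e) - 1 ≈ 3.62 and ln(99 + e) ≈ 4.62, whatever the weights are. This would explain both the flat ~4.60 and the missing NaN at learning rate 1000.

The sketch below uses random tensors as a hypothetical stand-in for the ViT outputs. It is not the reporter's model.

```python
import math

import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
num_classes, batch_size = 100, 64
criterion = nn.CrossEntropyLoss()

# Hypothetical stand-in for the model output: random logits and random labels
logits = torch.randn(batch_size, num_classes, requires_grad=True)
labels = torch.randint(0, num_classes, (batch_size,))

# What forward_loss() does: softmax first, then CrossEntropyLoss (log_softmax again)
loss_on_probs = criterion(torch.softmax(logits, dim=1), labels)
(grad_via_probs,) = torch.autograd.grad(loss_on_probs, logits)

# What CrossEntropyLoss expects: the raw logits
loss_on_logits = criterion(logits, labels)
(grad_via_logits,) = torch.autograd.grad(loss_on_logits, logits)

# Best case under double softmax: perfectly confident, correct one-hot "probabilities"
perfect_probs = F.one_hot(labels, num_classes).float()
best_case = criterion(perfect_probs, labels).item()

print(f"loss on softmax output: {loss_on_probs.item():.4f} (ln(100) = {math.log(num_classes):.4f})")
print(f"loss on raw logits:     {loss_on_logits.item():.4f}")
print(f"lowest reachable loss with softmax output: {best_case:.4f}")  # 3.6222
print(f"grad norm via probs:  {grad_via_probs.norm().item():.2e}")
print(f"grad norm via logits: {grad_via_logits.norm().item():.2e}")
```

If this is the cause, the usual pattern is to compute the loss on `compute_model_logits(image_tensor)`. The softmax in `forward()` would then be kept only for producing probabilities at inference time.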

Error messages and logs

root
└── cls_model (HuggingfaceModel)
    ├── backbone (ViTModel)
    │   ├── embeddings (ViTEmbeddings) cls_token:[1, 1, 768] position_embeddings:[1, 197, 768]
    │   │   └── patch_embeddings (ViTPatchEmbeddings)
    │   │       └── projection (Conv2d) weight:[768, 3, 16, 16] bias:[768]
    │   ├── encoder (ViTEncoder)
    │   │   └── layer (ModuleList)
    │   │       └── 0-11(ViTLayer)
    │   │           ├── attention (ViTAttention)
    │   │           │   ├── attention (ViTSelfAttention)
    │   │           │   │   └── query,key,value(Linear) weight:[768, 768] bias:[768]
    │   │           │   └── output (ViTSelfOutput)
    │   │           │       └── dense (Linear) weight:[768, 768] bias:[768]
    │   │           ├── intermediate (ViTIntermediate)
    │   │           │   └── dense (Linear) weight:[3072, 768] bias:[3072]
    │   │           ├── output (ViTOutput)
    │   │           │   └── dense (Linear) weight:[768, 3072] bias:[768]
    │   │           └── layernorm_before,layernorm_after(LayerNorm) weight:[768] bias:[768]
    │   ├── layernorm (LayerNorm) weight:[768] bias:[768]
    │   └── pooler (ViTPooler)
    │       └── dense (Linear) weight:[768, 768] bias:[768]
    └── head (Linear) weight:[100, 768] bias:[100]
Files already downloaded and verified
Files already downloaded and verified
202

Sanity Checking: |          | 0/? [00:00<?, ?it/s]
Sanity Checking:   0%|          | 0/2 [00:00<?, ?it/s]
Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]
Sanity Checking DataLoader 0:  50%|█████     | 1/2 [00:00<00:00,  1.78it/s]
Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:00<00:00,  2.78it/s]
                                                                           

Training: |          | 0/? [00:00<?, ?it/s]
Training:   0%|          | 0/704 [00:00<?, ?it/s]
Epoch 0:   0%|          | 0/704 [00:00<?, ?it/s] 
Epoch 0:   0%|          | 1/704 [00:02<30:06,  0.39it/s]
Epoch 0:   0%|          | 1/704 [00:02<30:07,  0.39it/s, v_num=11, train_loss=4.610]
Epoch 0:   0%|          | 2/704 [00:03<17:54,  0.65it/s, v_num=11, train_loss=4.610]
Epoch 0:   0%|          | 2/704 [00:03<17:55,  0.65it/s, v_num=11, train_loss=4.610]
Epoch 0:   0%|          | 3/704 [00:03<13:59,  0.84it/s, v_num=11, train_loss=4.610]
Epoch 0:   0%|          | 3/704 [00:03<14:01,  0.83it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|          | 4/704 [00:03<11:26,  1.02it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|          | 4/704 [00:04<11:49,  0.99it/s, v_num=11, train_loss=4.610]
Epoch 0:   1%|          | 5/704 [00:04<09:31,  1.22it/s, v_num=11, train_loss=4.610]
Epoch 0:   1%|          | 5/704 [00:04<10:25,  1.12it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|          | 6/704 [00:04<08:46,  1.33it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|          | 6/704 [00:04<09:30,  1.22it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|          | 7/704 [00:04<08:11,  1.42it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|          | 7/704 [00:05<08:50,  1.31it/s, v_num=11, train_loss=4.610]
Epoch 0:   1%|          | 8/704 [00:05<07:52,  1.47it/s, v_num=11, train_loss=4.610]
Epoch 0:   1%|          | 8/704 [00:05<08:22,  1.39it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|▏         | 9/704 [00:05<07:35,  1.53it/s, v_num=11, train_loss=4.600]
Epoch 0:   1%|▏         | 9/704 [00:06<07:58,  1.45it/s, v_num=11, train_loss=4.610]
Epoch 0:   1%|▏         | 10/704 [00:06<07:18,  1.58it/s, v_num=11, train_loss=4.610]
Epoch 0:   1%|▏         | 10/704 [00:06<07:39,  1.51it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 11/704 [00:06<06:59,  1.65it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 11/704 [00:07<07:23,  1.56it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 12/704 [00:07<06:48,  1.70it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 12/704 [00:07<07:10,  1.61it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 13/704 [00:07<06:39,  1.73it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 13/704 [00:07<06:59,  1.65it/s, v_num=11, train_loss=4.600]
Epoch 0:   2%|▏         | 14/704 [00:07<06:30,  1.77it/s, v_num=11, train_loss=4.600]
Epoch 0:   2%|▏         | 14/704 [00:08<06:49,  1.68it/s, v_num=11, train_loss=4.600]
Epoch 0:   2%|▏         | 15/704 [00:08<06:23,  1.80it/s, v_num=11, train_loss=4.600]
Epoch 0:   2%|▏         | 15/704 [00:08<06:41,  1.72it/s, v_num=11, train_loss=4.600]
Epoch 0:   2%|▏         | 16/704 [00:08<06:16,  1.83it/s, v_num=11, train_loss=4.600]
Epoch 0:   2%|▏         | 16/704 [00:09<06:33,  1.75it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 17/704 [00:09<06:11,  1.85it/s, v_num=11, train_loss=4.610]
Epoch 0:   2%|▏         | 17/704 [00:09<06:27,  1.77it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 18/704 [00:09<06:06,  1.87it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 18/704 [00:10<06:21,  1.80it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 19/704 [00:10<06:02,  1.89it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 19/704 [00:10<06:15,  1.82it/s, v_num=11, train_loss=4.610]
Epoch 0:   3%|▎         | 20/704 [00:10<05:57,  1.91it/s, v_num=11, train_loss=4.610]
Epoch 0:   3%|▎         | 20/704 [00:10<06:10,  1.84it/s, v_num=11, train_loss=4.610]
Epoch 0:   3%|▎         | 21/704 [00:10<05:53,  1.93it/s, v_num=11, train_loss=4.610]
Epoch 0:   3%|▎         | 21/704 [00:11<06:06,  1.86it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 22/704 [00:11<05:50,  1.95it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 22/704 [00:11<06:02,  1.88it/s, v_num=11, train_loss=4.610]
Epoch 0:   3%|▎         | 23/704 [00:11<05:48,  1.95it/s, v_num=11, train_loss=4.610]
Epoch 0:   3%|▎         | 23/704 [00:12<05:58,  1.90it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 24/704 [00:12<05:44,  1.97it/s, v_num=11, train_loss=4.600]
Epoch 0:   3%|▎         | 24/704 [00:12<05:55,  1.91it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▎         | 25/704 [00:12<05:41,  1.99it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▎         | 25/704 [00:12<05:52,  1.93it/s, v_num=11, train_loss=4.610]
Epoch 0:   4%|▎         | 26/704 [00:13<05:39,  2.00it/s, v_num=11, train_loss=4.610]
Epoch 0:   4%|▎         | 26/704 [00:13<05:49,  1.94it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 27/704 [00:13<05:36,  2.01it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 27/704 [00:13<05:46,  1.95it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 28/704 [00:13<05:34,  2.02it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 28/704 [00:14<05:43,  1.97it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 29/704 [00:14<05:32,  2.03it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 29/704 [00:14<05:41,  1.98it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 30/704 [00:14<05:30,  2.04it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 30/704 [00:15<05:39,  1.99it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 31/704 [00:15<05:30,  2.03it/s, v_num=11, train_loss=4.600]
Epoch 0:   4%|▍         | 31/704 [00:15<05:36,  2.00it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▍         | 32/704 [00:15<05:27,  2.05it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▍         | 32/704 [00:15<05:34,  2.01it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▍         | 33/704 [00:15<05:24,  2.07it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▍         | 33/704 [00:16<05:32,  2.02it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▍         | 34/704 [00:16<05:23,  2.07it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▍         | 34/704 [00:16<05:30,  2.03it/s, v_num=11, train_loss=4.610]
Epoch 0:   5%|▍         | 35/704 [00:16<05:21,  2.08it/s, v_num=11, train_loss=4.610]
Epoch 0:   5%|▍         | 35/704 [00:17<05:29,  2.03it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▌         | 36/704 [00:17<05:20,  2.09it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▌         | 36/704 [00:17<05:27,  2.04it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▌         | 37/704 [00:17<05:18,  2.09it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▌         | 37/704 [00:18<05:25,  2.05it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▌         | 38/704 [00:18<05:17,  2.10it/s, v_num=11, train_loss=4.600]
Epoch 0:   5%|▌         | 38/704 [00:18<05:24,  2.06it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 39/704 [00:18<05:15,  2.11it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 39/704 [00:18<05:22,  2.06it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 40/704 [00:18<05:15,  2.11it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 40/704 [00:19<05:21,  2.07it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 41/704 [00:19<05:13,  2.12it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 41/704 [00:19<05:19,  2.07it/s, v_num=11, train_loss=4.610]
Epoch 0:   6%|▌         | 42/704 [00:19<05:12,  2.12it/s, v_num=11, train_loss=4.610]
Epoch 0:   6%|▌         | 42/704 [00:20<05:18,  2.08it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 43/704 [00:20<05:10,  2.13it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▌         | 43/704 [00:20<05:16,  2.09it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▋         | 44/704 [00:20<05:09,  2.13it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▋         | 44/704 [00:21<05:15,  2.09it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▋         | 45/704 [00:21<05:09,  2.13it/s, v_num=11, train_loss=4.600]
Epoch 0:   6%|▋         | 45/704 [00:21<05:14,  2.10it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 46/704 [00:21<05:07,  2.14it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 46/704 [00:21<05:13,  2.10it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 47/704 [00:21<05:06,  2.14it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 47/704 [00:22<05:12,  2.11it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 48/704 [00:22<05:05,  2.15it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 48/704 [00:22<05:10,  2.11it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 49/704 [00:22<05:05,  2.15it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 49/704 [00:23<05:09,  2.11it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 50/704 [00:23<05:04,  2.15it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 50/704 [00:23<05:08,  2.12it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 51/704 [00:23<05:03,  2.15it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 51/704 [00:24<05:08,  2.12it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 52/704 [00:24<05:02,  2.16it/s, v_num=11, train_loss=4.600]
Epoch 0:   7%|▋         | 52/704 [00:24<05:07,  2.12it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 53/704 [00:24<05:01,  2.16it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 53/704 [00:24<05:06,  2.13it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 54/704 [00:24<05:00,  2.16it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 54/704 [00:25<05:05,  2.13it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 55/704 [00:25<04:59,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 55/704 [00:25<05:04,  2.13it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 56/704 [00:25<04:58,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 56/704 [00:26<05:03,  2.14it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 57/704 [00:26<04:57,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 57/704 [00:26<05:02,  2.14it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 58/704 [00:26<04:57,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 58/704 [00:27<05:01,  2.14it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 59/704 [00:27<04:56,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:   8%|▊         | 59/704 [00:27<05:00,  2.15it/s, v_num=11, train_loss=4.610]
Epoch 0:   9%|▊         | 60/704 [00:27<04:55,  2.18it/s, v_num=11, train_loss=4.610]
Epoch 0:   9%|▊         | 60/704 [00:27<04:59,  2.15it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▊         | 61/704 [00:27<04:54,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▊         | 61/704 [00:28<04:58,  2.15it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 62/704 [00:28<04:53,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 62/704 [00:28<04:57,  2.16it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 63/704 [00:28<04:53,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 63/704 [00:29<04:56,  2.16it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 64/704 [00:29<04:52,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 64/704 [00:29<04:56,  2.16it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 65/704 [00:29<04:51,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 65/704 [00:30<04:55,  2.16it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 66/704 [00:30<04:50,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:   9%|▉         | 66/704 [00:30<04:54,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 67/704 [00:30<04:50,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 67/704 [00:30<04:53,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 68/704 [00:30<04:49,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 68/704 [00:31<04:52,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 69/704 [00:31<04:48,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 69/704 [00:31<04:52,  2.17it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 70/704 [00:31<04:47,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|▉         | 70/704 [00:32<04:51,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|█         | 71/704 [00:32<04:47,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|█         | 71/704 [00:32<04:50,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|█         | 72/704 [00:32<04:46,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|█         | 72/704 [00:33<04:49,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|█         | 73/704 [00:33<04:45,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  10%|█         | 73/704 [00:33<04:49,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 74/704 [00:33<04:44,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 74/704 [00:33<04:48,  2.18it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 75/704 [00:33<04:44,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 75/704 [00:34<04:47,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 76/704 [00:34<04:44,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 76/704 [00:34<04:46,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 77/704 [00:34<04:43,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 77/704 [00:35<04:46,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 78/704 [00:35<04:42,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 78/704 [00:35<04:45,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 79/704 [00:35<04:42,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█         | 79/704 [00:36<04:44,  2.19it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█▏        | 80/704 [00:36<04:41,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  11%|█▏        | 80/704 [00:36<04:44,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 81/704 [00:36<04:40,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 81/704 [00:36<04:43,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 82/704 [00:36<04:39,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 82/704 [00:37<04:42,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 83/704 [00:37<04:39,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 83/704 [00:37<04:42,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 84/704 [00:37<04:38,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 84/704 [00:38<04:41,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 85/704 [00:38<04:38,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 85/704 [00:38<04:40,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 86/704 [00:38<04:37,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 86/704 [00:39<04:40,  2.20it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 87/704 [00:39<04:36,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▏        | 87/704 [00:39<04:39,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▎        | 88/704 [00:39<04:36,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  12%|█▎        | 88/704 [00:39<04:39,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 89/704 [00:39<04:36,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 89/704 [00:40<04:38,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 90/704 [00:40<04:35,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 90/704 [00:40<04:37,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 91/704 [00:40<04:34,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 91/704 [00:41<04:37,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 92/704 [00:41<04:34,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 92/704 [00:41<04:36,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 93/704 [00:41<04:33,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 93/704 [00:42<04:35,  2.21it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 94/704 [00:42<04:32,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 94/704 [00:42<04:35,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 95/704 [00:42<04:32,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  13%|█▎        | 95/704 [00:42<04:34,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▎        | 96/704 [00:42<04:31,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▎        | 96/704 [00:43<04:34,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 97/704 [00:43<04:31,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 97/704 [00:43<04:33,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 98/704 [00:43<04:30,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 98/704 [00:44<04:32,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 99/704 [00:44<04:30,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 99/704 [00:44<04:32,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 100/704 [00:44<04:29,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 100/704 [00:45<04:32,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 101/704 [00:45<04:29,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 101/704 [00:45<04:31,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 102/704 [00:45<04:28,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  14%|█▍        | 102/704 [00:45<04:30,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▍        | 103/704 [00:45<04:28,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▍        | 103/704 [00:46<04:30,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▍        | 104/704 [00:46<04:27,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▍        | 104/704 [00:46<04:29,  2.22it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▍        | 105/704 [00:46<04:26,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▍        | 105/704 [00:47<04:29,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 106/704 [00:47<04:26,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 106/704 [00:47<04:28,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 107/704 [00:47<04:25,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 107/704 [00:48<04:28,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 108/704 [00:48<04:25,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 108/704 [00:48<04:27,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 109/704 [00:48<04:24,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  15%|█▌        | 109/704 [00:48<04:26,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 110/704 [00:48<04:24,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 110/704 [00:49<04:26,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 111/704 [00:49<04:23,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 111/704 [00:49<04:25,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 112/704 [00:49<04:23,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 112/704 [00:50<04:25,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 113/704 [00:50<04:22,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 113/704 [00:50<04:24,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 114/704 [00:50<04:22,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▌        | 114/704 [00:51<04:24,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▋        | 115/704 [00:51<04:21,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▋        | 115/704 [00:51<04:23,  2.23it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▋        | 116/704 [00:51<04:21,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  16%|█▋        | 116/704 [00:51<04:23,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 117/704 [00:51<04:20,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 117/704 [00:52<04:22,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 118/704 [00:52<04:20,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 118/704 [00:52<04:21,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 119/704 [00:52<04:19,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 119/704 [00:53<04:21,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 120/704 [00:53<04:19,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 120/704 [00:53<04:20,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 121/704 [00:53<04:18,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 121/704 [00:54<04:20,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 122/704 [00:54<04:18,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 122/704 [00:54<04:19,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 123/704 [00:54<04:17,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  17%|█▋        | 123/704 [00:54<04:19,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 124/704 [00:54<04:17,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 124/704 [00:55<04:18,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 125/704 [00:55<04:16,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 125/704 [00:55<04:18,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 126/704 [00:55<04:15,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 126/704 [00:56<04:17,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 127/704 [00:56<04:15,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 127/704 [00:56<04:17,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 128/704 [00:56<04:15,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 128/704 [00:57<04:16,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 129/704 [00:57<04:14,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 129/704 [00:57<04:16,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 130/704 [00:57<04:13,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  18%|█▊        | 130/704 [00:57<04:15,  2.24it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▊        | 131/704 [00:57<04:13,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▊        | 131/704 [00:58<04:15,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 132/704 [00:58<04:12,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 132/704 [00:58<04:14,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 133/704 [00:58<04:12,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 133/704 [00:59<04:14,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 134/704 [00:59<04:12,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 134/704 [00:59<04:13,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 135/704 [00:59<04:11,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 135/704 [01:00<04:13,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 136/704 [01:00<04:11,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 136/704 [01:00<04:12,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 137/704 [01:00<04:10,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  19%|█▉        | 137/704 [01:00<04:12,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|█▉        | 138/704 [01:01<04:10,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|█▉        | 138/704 [01:01<04:11,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|█▉        | 139/704 [01:01<04:09,  2.26it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|█▉        | 139/704 [01:01<04:11,  2.25it/s, v_num=11, train_loss=4.600]
Epoch 0:  20%|█▉        | 140/704 [01:01<04:09,  2.26it/s, v_num=11, train_loss=4.600]
...
Epoch 0:  26%|██▌       | 180/704 [01:19<03:50,  2.27it/s, v_num=11, train_loss=4.600]

Nothing crashes and the model summary looks correct, but the training loss does not change. It varies slightly from batch to batch, but that comes from the different samples, not from the model learning.
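The plateau value itself may be informative. For a classifier over C classes, a prediction that is uniform across classes gives a cross-entropy of exactly ln C, and ln 100 ≈ 4.605. That matches the stuck train_loss=4.600 if, as an assumption, the dataset has 100 classes; the report does not state the class count. The pure-Python sketch below checks that value, plus one hypothetical way to stay pinned near it. The `forward` shown earlier returns `self.softmax(...)`. If `forward_loss` (truncated above) fed those probabilities into `nn.CrossEntropyLoss`, which expects raw logits, every input would lie in [0, 1], and the lowest loss reachable would be ln(C - 1 + e) - 1 ≈ 3.62, even for a perfectly confident, correct prediction. This is only a hypothesis. The helper names are illustrative and are not Lightning or PyTorch APIs.

```python
import math


def cross_entropy_from_logits(logits, target):
    """Single-sample cross-entropy: -log(softmax(logits)[target])."""
    m = max(logits)  # shift by the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


NUM_CLASSES = 100  # assumption: the report never states the number of classes
TARGET = 3         # arbitrary correct class for the example

# Case 1: all logits equal -> uniform prediction -> loss is exactly ln(C).
uniform_loss = cross_entropy_from_logits([0.0] * NUM_CLASSES, TARGET)

# Case 2 (hypothetical bug): probabilities passed where logits are expected.
# Start from a very confident, correct prediction...
confident_logits = [-50.0] * NUM_CLASSES
confident_logits[TARGET] = 50.0
probs = softmax(confident_logits)  # approximately one-hot on TARGET
# ...then apply the loss to the probabilities, as a second softmax would.
double_softmax_loss = cross_entropy_from_logits(probs, TARGET)

print(f"ln(C)                         = {math.log(NUM_CLASSES):.3f}")
print(f"uniform prediction            = {uniform_loss:.3f}")
print(f"softmax fed to CE, best case  = {double_softmax_loss:.3f}")
```

This prints 4.605, 4.605 and 3.622. A loss stuck at ln C suggests that the logits reaching the loss do not depend on the input. One way to check this is to print them for two different batches.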

Environment

Current environment
- PyTorch Lightning version: 2.3.3
- PyTorch version: 2.3.1
- Python version: 3.10.14
- OS: Linux
- CUDA/cuDNN version: 12.4
- GPU model and configuration: 3090
- How Lightning was installed: pip
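The standard library alone can gather most of the fields above. The sketch below uses `importlib.metadata` and `platform` instead of `pkg_resources`, which is the module raising in the traceback further down. It assumes the distributions are named `lightning`, `pytorch-lightning` and `torch`, and it skips the CUDA/GPU details if torch cannot be imported.

```python
import platform
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(dist_names=("lightning", "pytorch-lightning", "torch")):
    """Collect roughly the fields requested in the issue template."""
    report = {
        "Python version": platform.python_version(),
        "OS": platform.system(),
    }
    for name in dist_names:
        report[f"{name} version"] = installed_version(name) or "not installed"
    try:
        import torch  # optional: only used for CUDA/GPU details

        report["CUDA version (torch build)"] = torch.version.cuda
        report["GPU"] = (
            torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none visible"
        )
    except ImportError:
        pass
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

`torch.version.cuda` reports the CUDA version torch was built against, which can differ from the driver's CUDA version shown by `nvidia-smi`.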

By the way, the environment-collection script does not work either. It fails with:

Traceback (most recent call last):
 
  File "/conda/envs/ai/lib/python3.10/site-packages/pkg_resources/_vendor/pyparsing.py", line 2711, in parseImpl
    raise ParseException(instring, loc, self.errmsg, self)
pkg_resources._vendor.pyparsing.ParseException: Expected W:(abcd...) (at char 0), (line:1, col:1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
    raise InvalidRequirement(
pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse error at "'-cipy==1'": Expected W:(abcd...)
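The requirement `'-cipy==1'` looks like a mangled distribution name rather than a real package. One common cause, assumed here and not confirmed by the report, is a leftover folder in site-packages whose name starts with `~` (for example `~cipy` next to `scipy`). pip gives packages temporary names like this while uninstalling or upgrading them, and an interrupted run can leave such folders behind. `pkg_resources` then normalizes the `~` to `-`. The sketch below only lists these entries, and both function names are hypothetical helpers.

```python
import site
import sys
from pathlib import Path


def site_package_dirs():
    """Candidate site-packages directories for the running interpreter."""
    dirs = set()
    if hasattr(site, "getsitepackages"):
        dirs.update(site.getsitepackages())
    if hasattr(site, "getusersitepackages"):
        dirs.add(site.getusersitepackages())
    dirs.update(p for p in sys.path if p.endswith("site-packages"))
    return [Path(d) for d in dirs if Path(d).is_dir()]


def find_suspicious_entries(directories, prefix="~"):
    """List entries whose names start with `prefix`, e.g. leftovers of interrupted pip runs."""
    hits = []
    for directory in directories:
        hits.extend(sorted(p for p in directory.iterdir() if p.name.startswith(prefix)))
    return hits


if __name__ == "__main__":
    for entry in find_suspicious_entries(site_package_dirs()):
        print(entry)
```

Before removing anything this lists, confirm that it really is an orphaned leftover. The matching real package (here presumably scipy) may need reinstalling afterwards.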

cc @Borda
