Closed
Labels: 3rd party, bug, optimization, ver: 2.3.x
Description
Bug description
Even after setting the learning rate to 1, and even to 100, the training loss does not change at all: it stays at about 4.60.
I stepped through the code in a debugger and everything appears to work: backward() runs on the loss, the gradients of every parameter look reasonable, and the optimizer step is actually called.
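As a reference point, the model summary below shows a 100-way head (`head (Linear) weight:[100, 768]`), and the cross-entropy of a uniform guess over K classes is ln K. A loss stuck at 4.60 therefore means the predictions are no better than chance. A small pure-Python check (the class count of 100 is taken from that summary; nothing else is assumed):

```python
import math

NUM_CLASSES = 100  # output size of the classification head in the model summary


def uniform_cross_entropy(num_classes: int) -> float:
    """Cross-entropy of a prediction that assigns 1/num_classes to every class."""
    return -math.log(1.0 / num_classes)


print(f"{uniform_cross_entropy(NUM_CLASSES):.3f}")  # prints 4.605
```

The value is the same for any `label_smoothing`, because a uniform prediction gives every class the same log-probability.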
What version are you seeing the problem on?
v2.3
How to reproduce the bug
```python
import lightning as L
import torch
from torch import nn
from typing_extensions import override
from lightning.pytorch.utilities.types import STEP_OUTPUT, OptimizerLRScheduler

# ClassificationTaskConfig, ClassificationDataModule and HuggingfaceModel are
# project-specific classes defined elsewhere in this project (not shown).


class ClassificationTask(L.LightningModule):
    def __init__(self, config: ClassificationTaskConfig) -> None:
        super().__init__()
        self.save_hyperparameters(config.model_dump())
        L.seed_everything(config.experiment_index)  # use the index as the seed for reproducibility
        self.lit_data: ClassificationDataModule = config.dataset_config.get_lightning_data_module()
        config.cls_model_config.num_of_classes = self.lit_data.num_of_classes
        self.cls_model: HuggingfaceModel = config.cls_model_config.get_cls_model()
        self.lit_data.set_transform_from_hf_image_preprocessor(hf_image_preprocessor=self.cls_model.image_preprocessor)
        model_image_size: tuple[int, int] = (
            self.cls_model.image_preprocessor.size["height"],
            self.cls_model.image_preprocessor.size["width"],
        )
        # torch.zeros rather than torch.Tensor(...), which returns uninitialized memory
        self.example_input_array = torch.zeros(1, self.cls_model.backbone.config.num_channels, *model_image_size)
        self.softmax = nn.Softmax(dim=1)
        self.loss = nn.CrossEntropyLoss(label_smoothing=config.label_smoothing)
        # The problem also occurs with automatic optimization (True);
        # manual optimization is used here only to debug the step.
        self.automatic_optimization = False

    def compute_model_logits(self, image_tensor: torch.Tensor) -> torch.Tensor:
        return self.cls_model(image_tensor)

    @override
    def forward(self, image_tensor: torch.Tensor, *args, **kwargs) -> torch.Tensor:
        return self.softmax(self.compute_model_logits(image_tensor))

    def forward_loss(self, image_tensor: torch.Tensor, label_tensor: torch.Tensor) -> torch.Tensor:
        probs = self(image_tensor)  # softmax probabilities, not raw logits
        # return F.nll_loss(logits, label_tensor)
        return self.loss(probs, label_tensor)

    @override
    def training_step(self, batch, batch_idx=None, *args, **kwargs) -> STEP_OUTPUT:
        self.train()  # redundant: Lightning already puts the module in train mode
        opt = self.optimizers()
        opt.zero_grad()
        loss = self.forward_loss(*batch)
        self.log("train_loss", loss, prog_bar=True)
        # self.manual_backward(loss)  # what the Lightning docs recommend for manual optimization
        loss.backward()
        opt.step()
        return loss

    @override
    def configure_optimizers(self) -> OptimizerLRScheduler:
        return torch.optim.AdamW(self.parameters(), lr=self.hparams.learning_rate)
```
```python
import lightning as L
from lightning.pytorch.callbacks import DeviceStatsMonitor, ModelSummary, StochasticWeightAveraging
from lightning.pytorch.callbacks.early_stopping import EarlyStopping
from lightning.pytorch.loggers import CSVLogger, TensorBoardLogger

from .core import ClassificationTask, ClassificationTaskConfig
from .utils import runs_path

config = ClassificationTaskConfig()
config.learning_rate = 3e-4  # doesn't work
config.learning_rate = 1000  # should produce NaN if it is optimizing at all (debugging)
config.dataset_config.batch_size = 64
cls_task = ClassificationTask(config)

trainer = L.Trainer(
    default_root_dir=runs_path,
    enable_checkpointing=True,
    enable_model_summary=True,
    num_sanity_val_steps=2,
    callbacks=[
        EarlyStopping(
            monitor="val_acc1",
            mode="max",
            check_finite=True,
            patience=5,
            check_on_train_epoch_end=False,  # check at the end of validation
            verbose=True,
        ),
        ModelSummary(max_depth=3),
        DeviceStatsMonitor(cpu_stats=True),
    ],
    logger=[TensorBoardLogger(save_dir=runs_path / "tensorboard"), CSVLogger(save_dir=runs_path)],
)
trainer.fit(cls_task, datamodule=cls_task.lit_data)
```
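One possible explanation, assuming `HuggingfaceModel` returns raw (unnormalized) logits: `forward_loss` feeds the output of `forward`, which has already gone through `nn.Softmax`, into `nn.CrossEntropyLoss`. According to the PyTorch documentation, `CrossEntropyLoss` expects unnormalized logits and applies log-softmax itself, so here it effectively takes a softmax of probabilities. Since every input then lies in [0, 1], the per-sample loss (ignoring label smoothing) can never leave roughly 3.62 to 4.62 for 100 classes, however the weights change, which may be why the logged value barely reacts to the learning rate. The pure-Python sketch below is an illustrative re-implementation, not PyTorch's code; it evaluates that loss at a uniform input and at the two extreme one-hot inputs:

```python
import math


def log_softmax(xs):
    """Numerically stable log-softmax of a list of floats."""
    m = max(xs)
    log_sum = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_sum for x in xs]


def cross_entropy_on(inputs, target):
    """Single-sample cross-entropy that, like nn.CrossEntropyLoss, treats `inputs` as logits."""
    return -log_softmax(inputs)[target]


NUM_CLASSES = 100
TARGET = 0


def one_hot(index):
    return [1.0 if j == index else 0.0 for j in range(NUM_CLASSES)]


# Softmax outputs passed in as if they were logits
cases = {
    "uniform": [1.0 / NUM_CLASSES] * NUM_CLASSES,  # untrained or indifferent model
    "confident_right": one_hot(TARGET),  # best possible prediction
    "confident_wrong": one_hot(TARGET + 1),  # worst possible prediction
}
for name, probs in cases.items():
    print(f"{name}: {cross_entropy_on(probs, TARGET):.3f}")
# uniform: 4.605
# confident_right: 3.622
# confident_wrong: 4.622

# Real logits have no such floor: a large margin drives the loss toward 0.
print(f"logits: {cross_entropy_on([20.0] + [0.0] * (NUM_CLASSES - 1), TARGET):.3f}")  # logits: 0.000
```

If this is the cause, the usual change would be to pass `self.compute_model_logits(image_tensor)` to `self.loss` in `forward_loss` and keep the `Softmax` for inference only.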
Error messages and logs
```text
root
└── cls_model (HuggingfaceModel)
    ├── backbone (ViTModel)
    │   ├── embeddings (ViTEmbeddings) cls_token:[1, 1, 768] position_embeddings:[1, 197, 768]
    │   │   └── patch_embeddings (ViTPatchEmbeddings)
    │   │       └── projection (Conv2d) weight:[768, 3, 16, 16] bias:[768]
    │   ├── encoder (ViTEncoder)
    │   │   └── layer (ModuleList)
    │   │       └── 0-11(ViTLayer)
    │   │           ├── attention (ViTAttention)
    │   │           │   ├── attention (ViTSelfAttention)
    │   │           │   │   └── query,key,value(Linear) weight:[768, 768] bias:[768]
    │   │           │   └── output (ViTSelfOutput)
    │   │           │       └── dense (Linear) weight:[768, 768] bias:[768]
    │   │           ├── intermediate (ViTIntermediate)
    │   │           │   └── dense (Linear) weight:[3072, 768] bias:[3072]
    │   │           ├── output (ViTOutput)
    │   │           │   └── dense (Linear) weight:[768, 3072] bias:[768]
    │   │           └── layernorm_before,layernorm_after(LayerNorm) weight:[768] bias:[768]
    │   ├── layernorm (LayerNorm) weight:[768] bias:[768]
    │   └── pooler (ViTPooler)
    │       └── dense (Linear) weight:[768, 768] bias:[768]
    └── head (Linear) weight:[100, 768] bias:[100]
Files already downloaded and verified
Files already downloaded and verified
202
Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:00<00:00, 2.78it/s]
Epoch 0: 0%| | 1/704 [00:02<30:07, 0.39it/s, v_num=11, train_loss=4.610]
Epoch 0: 0%| | 3/704 [00:03<14:01, 0.83it/s, v_num=11, train_loss=4.600]
Epoch 0: 1%| | 4/704 [00:04<11:49, 0.99it/s, v_num=11, train_loss=4.610]
...
Epoch 0: 8%|▊ | 59/704 [00:27<05:00, 2.15it/s, v_num=11, train_loss=4.610]
...
Epoch 0: 26%|██▌ | 180/704 [01:19<03:50, 2.27it/s, v_num=11, train_loss=4.600]
```
Nothing crashes and the model summary looks correct, but the training loss does not decrease: it only fluctuates between 4.60 and 4.61 from batch to batch, which reflects the different samples rather than any training of the model.
Environment
Current environment
#- PyTorch Lightning Version (e.g., 2.4.0): 2.3.3
#- PyTorch Version (e.g., 2.4): 2.3.1
#- Python version (e.g., 3.12): 3.10.14
#- OS (e.g., Linux): Linux
#- CUDA/cuDNN version: 12.4
#- GPU models and configuration: 3090
#- How you installed Lightning(`conda`, `pip`, source): pip
The collect_env script does not work either, by the way:
```text
Traceback (most recent call last):
  File "/conda/envs/ai/lib/python3.10/site-packages/pkg_resources/_vendor/pyparsing.py", line 2711, in parseImpl
    raise ParseException(instring, loc, self.errmsg, self)
pkg_resources._vendor.pyparsing.ParseException: Expected W:(abcd...) (at char 0), (line:1, col:1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
    raise InvalidRequirement(
pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse error at "'-cipy==1'": Expected W:(abcd...)
```
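The `'-cipy==1'` in that error looks like a mangled distribution name. One common cause of this kind of parse failure (pip reports similar ones as "Ignoring invalid distribution") is a leftover directory in site-packages whose name starts with `~`, such as one left behind by an interrupted uninstall of `scipy`. That cause is an assumption, not confirmed by this report. The helper below is a hypothetical sketch that only lists such entries so they can be inspected; it deletes nothing:

```python
import site
from pathlib import Path


def is_leftover(name: str) -> bool:
    """Entries renamed with a leading '~' are temporary leftovers, not real packages."""
    return name.startswith("~")


def find_leftovers(dirs):
    """Return '~'-prefixed entries found in the given directories (missing ones are skipped)."""
    found = []
    for d in dirs:
        path = Path(d)
        if path.is_dir():
            found.extend(sorted(p for p in path.iterdir() if is_leftover(p.name)))
    return found


if __name__ == "__main__":
    # site.getsitepackages() lists this interpreter's site-packages directories
    for entry in find_leftovers(site.getsitepackages()):
        print(entry)
```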
cc @Borda