--------------------------------------------------------------------------------------------------------
| epoch | train_loss | val_loss | train_acc | val_acc | ema_val_acc | total_time_seconds |
--------------------------------------------------------------------------------------------------------
Traceback (most recent call last):
File "main.py", line 621, in <module>
main()
File "main.py", line 540, in main
for epoch_step, (inputs, targets) in enumerate(get_batches(data, key='train', batchsize=batchsize)):
File "main.py", line 428, in get_batches
images = batch_crop(data_dict[key]['images'], 32) # TODO: hardcoded image size for now?
File "main.py", line 390, in batch_crop
cropped_batch = torch.masked_select(inputs, crop_mask_batch).view(inputs.shape[0], inputs.shape[1], crop_size, crop_size)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.58 GiB (GPU 0; 5.81 GiB total capacity; 835.23 MiB already allocated; 2.35 GiB free; 1.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
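For reference, the allocator setting suggested at the end of that message can be tried without any code changes. It only works around fragmentation and will not make a 4.58 GiB allocation fit, but it is cheap to test. A minimal sketch, assuming the variable is set before the first CUDA allocation (e.g. at the very top of main.py); the split size below is just an example value, not a recommendation:

```python
import os

# Must be set before torch initializes the CUDA caching allocator.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported after the variable is set so the allocator picks it up
```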
I have not looked at the code too closely, but it might be possible to shave off a few MB when preparing batches.
Thank you for this comment by the way.
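Not a definitive fix, but one way batch preparation could avoid the full-size boolean mask that batch_crop builds for masked_select: gather each image's random crop with advanced indexing, and do it in chunks so the temporary buffers never cover the whole dataset at once. The function name, chunk size, and exact indexing here are my own sketch, not code from main.py:

```python
import torch

def batch_crop_chunked(images: torch.Tensor, crop_size: int, chunk: int = 5000) -> torch.Tensor:
    """Hypothetical memory-leaner crop: per-image random offsets gathered with
    advanced indexing (no full-size boolean mask), processed in chunks so the
    temporaries are bounded by the chunk size rather than the dataset size."""
    n, c, h, w = images.shape
    out = torch.empty((n, c, crop_size, crop_size), dtype=images.dtype, device=images.device)
    ar = torch.arange(crop_size, device=images.device)
    for start in range(0, n, chunk):
        block = images[start:start + chunk]
        m = block.shape[0]
        top = torch.randint(0, h - crop_size + 1, (m,), device=images.device)
        left = torch.randint(0, w - crop_size + 1, (m,), device=images.device)
        rows = (top[:, None] + ar)[:, :, None]                      # (m, crop, 1)
        cols = (left[:, None] + ar)[:, None, :]                     # (m, 1, crop)
        batch = torch.arange(m, device=images.device)[:, None, None]  # (m, 1, 1)
        # Advanced indexing keeps the channel dim as a slice, so the gathered
        # result is (m, crop, crop, c) and gets permuted back to NCHW.
        out[start:start + chunk] = block[batch, :, rows, cols].permute(0, 3, 1, 2)
    return out
```

I have not measured whether this actually saves memory in practice; it is only meant to illustrate where the mask and gather buffers could be trimmed.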
Line 523 in 132829f:

    ## has a timing feature too, but there's no synchronizes so I suspect the times reported are much faster than they may be in actuality
I totally forgot to add torch.cuda.synchronize(), but it is finally fixed in https://github.com/99991/cifar10-fast-simple now. Fortunately, it did not make much of a difference: I now get 14.3 seconds with my code vs 15.7 seconds with your code. Perhaps there is something during batch preparation that makes a difference?
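For completeness, this is a minimal sketch of the synchronize-before-timing pattern being discussed, so queued GPU work is counted in the measured interval. The helper name is illustrative, not a function from either repo:

```python
import time
import torch

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds). Assumes a CUDA device;
    synchronizing on both sides keeps pending kernels inside the measurement."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    torch.cuda.synchronize()
    return result, time.perf_counter() - start
```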