CUDA out of memory using your default model. #31

@Anurag-Swarnim-Yadav

Description

I am trying to run your golden dataset with the default model and parameters, with a batch size of 1, using sequencer-train.sh. In the paper you mention using a K80 GPU, which has 24GB of memory. I am using a GeForce 2080 Ti, which has 12GB of memory, so I am using two of them and changed -world_size 2 and -gpu_ranks 0 1, but I am still getting CUDA out of memory. Could you please guide us on what the possible issue might be?

Traceback (most recent call last):
  File "/home/anuragswar.yadav/Anurag/chai/lib/OpenNMT-py/train.py", line 63, in run
    single_main(opt, device_id)
  File "/home/anuragswar.yadav/Anurag/chai/lib/OpenNMT-py/onmt/train_single.py", line 132, in main
    model = build_model(model_opt, opt, fields, checkpoint)
  File "/home/anuragswar.yadav/Anurag/chai/lib/OpenNMT-py/onmt/model_builder.py", line 301, in build_model
    model = build_base_model(model_opt, fields, use_gpu(opt), checkpoint)
  File "/home/anuragswar.yadav/Anurag/chai/lib/OpenNMT-py/onmt/model_builder.py", line 294, in build_base_model
    model.to(device)
  File "/home/anuragswar.yadav/anaconda3/envs/sequencer/lib/python3.6/site-packages/torch-1.6.0-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 607, in to
    return self._apply(convert)
  File "/home/anuragswar.yadav/anaconda3/envs/sequencer/lib/python3.6/site-packages/torch-1.6.0-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/home/anuragswar.yadav/anaconda3/envs/sequencer/lib/python3.6/site-packages/torch-1.6.0-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/home/anuragswar.yadav/anaconda3/envs/sequencer/lib/python3.6/site-packages/torch-1.6.0-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/anuragswar.yadav/anaconda3/envs/sequencer/lib/python3.6/site-packages/torch-1.6.0-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/home/anuragswar.yadav/anaconda3/envs/sequencer/lib/python3.6/site-packages/torch-1.6.0-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 605, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: out of memory
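
For reference, here is a minimal diagnostic sketch (not part of the original report) that prints, for each visible GPU, its name, total memory, and the memory currently allocated by this process, using standard torch.cuda calls. It can help confirm that both GeForce cards are actually visible to the training process. The file name check_gpus.py is only illustrative.

# check_gpus.py -- illustrative diagnostic sketch, assuming PyTorch is installed.
# Prints total and currently allocated memory for every GPU visible to this process.
import torch

if not torch.cuda.is_available():
    print("CUDA is not available to this process.")
else:
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        total_gb = props.total_memory / 1024 ** 3
        allocated_gb = torch.cuda.memory_allocated(i) / 1024 ** 3
        print(f"GPU {i}: {props.name}, "
              f"total {total_gb:.1f} GiB, "
              f"allocated by this process {allocated_gb:.1f} GiB")

Note that torch.cuda.memory_allocated only reports allocations made by the current process; memory held by other jobs on the same card would not show up here, but would still reduce what is available when model.to(device) is called.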
