Description
Hi, I have installed the code with Python 3.8, PyTorch 1.8.0, and CUDA 11, and the Debug Dataloader and Debug Model parts of debug_relationformer.ipynb run fine.
However, when I run train.py with "nohup python3 train.py --config configs/scene_2d.yaml --cuda_visible_device 0 1 2 --exp_name VGtest1 --nproc_per_node 3 --b 16 &> log/Muti.out &", I get the following error:
*** Config file
configs/scene_2d.yaml
Experiment Name : VGtest1
Batch size : 16
Running Distributed: True ; GPU: 0 ; RANK: 0
Number of parameters : 92944451
ERROR:ignite.engine.engine.RelationformerTrainer:Current run is terminating due to exception: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
ERROR:ignite.engine.engine.RelationformerTrainer:Current run is terminating due to exception: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
ERROR:ignite.engine.engine.RelationformerTrainer:Current run is terminating due to exception: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
ERROR:ignite.engine.engine.RelationformerTrainer:Engine run is terminating due to exception: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
ERROR:ignite.engine.engine.RelationformerTrainer:Engine run is terminating due to exception: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
ERROR:ignite.engine.engine.RelationformerTrainer:Engine run is terminating due to exception: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
Traceback (most recent call last):
File "train.py", line 292, in
parallel.run(main, args)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/ignite/distributed/launcher.py", line 275, in run
idist.spawn(self.backend, func, args=args, kwargs_dict=kwargs, **self._spawn_params)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/ignite/distributed/utils.py", line 323, in spawn
comp_model_cls.spawn(
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/ignite/distributed/comp_models/native.py", line 304, in spawn
start_processes(
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/ignite/distributed/comp_models/native.py", line 272, in _dist_worker_task_fn
fn(local_rank, *args, **kw_dict)
File "/home/ymf/dockerFile/relationformer/train.py", line 282, in main
trainer.run()
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/monai/engines/trainer.py", line 56, in run
super().run()
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/monai/engines/workflow.py", line 250, in run
super().run(data=self.data_loader, max_epochs=self.state.max_epochs)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/ignite/engine/engine.py", line 702, in run
return self._internal_run()
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/ignite/engine/engine.py", line 775, in _internal_run
self._handle_exception(e)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/ignite/engine/engine.py", line 469, in _handle_exception
raise e
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/ignite/engine/engine.py", line 745, in _internal_run
time_taken = self._run_once_on_dataset()
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/ignite/engine/engine.py", line 850, in _run_once_on_dataset
self._handle_exception(e)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/ignite/engine/engine.py", line 469, in _handle_exception
raise e
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/ignite/engine/engine.py", line 833, in _run_once_on_dataset
self.state.output = self._process_function(self, self.state.batch)
File "/home/ymf/dockerFile/relationformer/trainer.py", line 40, in _iteration
h, out = self.network(images)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 711, in forward
output = self.module(*inputs, **kwargs)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ymf/dockerFile/relationformer/models/relationformer_2D.py", line 108, in forward
features, pos = self.backbone(samples)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ymf/dockerFile/relationformer/models/deformable_detr_backbone.py", line 117, in forward
xs = self[0](tensor_list)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ymf/dockerFile/relationformer/models/deformable_detr_backbone.py", line 84, in forward
xs = self.body(tensor_list.tensors)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/torchvision/models/_utils.py", line 63, in forward
x = module(x)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/data/anaconda3/envs/ymf_rel38/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
Do you have any clue about this error and how to fix it? Thanks!
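In case it helps pin this down, here is a hypothetical minimal sketch (not the repo's code) that reproduces the same message, together with the .to(device) call I would expect to run before the model is wrapped in DistributedDataParallel; the variable names (model, x, local_rank) are just placeholders for illustration:

import torch
import torch.nn as nn

device = torch.device("cuda:0")

model = nn.Conv2d(3, 8, kernel_size=3)         # weights stay on the CPU
x = torch.randn(1, 3, 64, 64, device=device)   # input lives on the GPU

try:
    model(x)
except RuntimeError as e:
    # Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
    print(e)

# What I would expect to happen before wrapping in DDP
# (sketch only -- I am not sure where relationformer builds/moves the model):
model = model.to(device)                        # move the weights to the same GPU first
# model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
out = model(x)                                  # now both tensors are torch.cuda.FloatTensor
print(out.shape)

So my guess is that in the multi-GPU path the network is never moved to the local GPU before DDP wraps it, but I may be misreading the code.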