AssertionError: Shape mismatch: gt_mask torch.Size([2, 640, 480]), pred_mask torch.Size([3, 640, 480]) #88

@Zxl19990529


Hi, thanks for your work.
I am trying to reproduce the paper's results on the GCG task with the GranDf dataset, following the documentation.
However, I encountered the following error:

Epoch: [0][  1/500]     Loss 6.0447 (6.6629)    CeLoss 2.9219 (3.6617)  MaskBCELoss 2.6241 (2.5692)     MaskDICELoss 0.4987 (0.4320)    MaskLoss 3.1229 (3.0012)
Epoch: [0][  2/500]     Loss 4.8270 (5.0576)    CeLoss 2.5938 (3.2406)  MaskBCELoss 1.7405 (1.4278)     MaskDICELoss 0.4927 (0.3891)    MaskLoss 2.2332 (1.8170)
Epoch: [0][  3/500]     Loss 5.5492 (5.2785)    CeLoss 3.6094 (3.3023)  MaskBCELoss 1.6736 (1.5641)     MaskDICELoss 0.2661 (0.4121)    MaskLoss 1.9398 (1.9761)
Epoch: [0][  4/500]     Loss 6.2472 (5.2980)    CeLoss 3.7344 (3.2023)  MaskBCELoss 2.0935 (1.6982)     MaskDICELoss 0.4194 (0.3975)    MaskLoss 2.5129 (2.0957)
Epoch: [0][  5/500]     Loss 5.3745 (4.9203)    CeLoss 3.0312 (3.2773)  MaskBCELoss 1.8710 (1.2579)     MaskDICELoss 0.4723 (0.3851)    MaskLoss 2.3432 (1.6429)
Traceback (most recent call last):
  File "/mnt/nasv3_2/zhangxinliang/LMMs/groundingLMM/train.py", line 673, in <module>
    main(args)
  File "/mnt/nasv3_2/zhangxinliang/LMMs/groundingLMM/train.py", line 467, in main
    dataset_iters = train(
  File "/mnt/nasv3_2/zhangxinliang/LMMs/groundingLMM/train.py", line 557, in train
    output_dict = model(**data_batch)
  File "/home/zxl/anaconda3/envs/glamm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zxl/anaconda3/envs/glamm/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/zxl/anaconda3/envs/glamm/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1829, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/zxl/anaconda3/envs/glamm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/nasv3_2/zhangxinliang/LMMs/groundingLMM/model/GLaMM.py", line 131, in forward
    return super().forward(**kwargs) if "past_key_values" in kwargs else self.model_forward(**kwargs)
  File "/mnt/nasv3_2/zhangxinliang/LMMs/groundingLMM/model/GLaMM.py", line 167, in model_forward
    return self._calculate_losses(pred_masks, masks_list, output)
  File "/mnt/nasv3_2/zhangxinliang/LMMs/groundingLMM/model/GLaMM.py", line 248, in _calculate_losses
    loss_components = self._compute_loss_components(pred_masks, masks_list, output)
  File "/mnt/nasv3_2/zhangxinliang/LMMs/groundingLMM/model/GLaMM.py", line 267, in _compute_loss_components
    assert gt_mask.shape[0] == pred_mask.shape[
AssertionError: Shape mismatch: gt_mask torch.Size([2, 640, 480]), pred_mask torch.Size([3, 640, 480])

This error usually appears after training has run for a while; in the log above it occurs after step 5 of epoch 0.
My shell script is:

export CUDA_VISIBLE_DEVICES=4,5  # Adjust based on your GPU setup
# Environment variable settings (optional, based on your requirements)
export CUDA_LAUNCH_BLOCKING=1


# Setting a dynamic master port (optional)
export MASTER_PORT=$(shuf -i 2000-65000 -n 1)
# Path to the checkpoint and output directory (modify according to your setup)
export CKPT_PATH="./ramdisk/GLaMM-GranD-Pretrained"
export OUTPUT_DIR_PATH='output/myoffical_finetune_glamm_gcg'

deepspeed --master_port $MASTER_PORT train.py \
  --version $CKPT_PATH \
  --dataset_dir ./Dataset_LLM/ \
  --vision_pretrained ./checkpoints/sam_vit_h_4b8939.pth \
  --exp_name $OUTPUT_DIR_PATH \
  --lora_r 8 \
  --lr 3e-4 \
  --pretrained \
  --use_segm_data \
  --seg_dataset "RefCoco_GCG||PSG_GCG||Flickr_GCG||GranDf_GCG" \
  --segm_sample_rates "3,3,3,1" \
  --val_dataset "FlickrGCGVal|RefCocoGCGVal|PsgGCGVal" \
  --epochs 10 \
  --steps_per_epoch 500 \
  --mask_validation

I also found that this error does not occur when only GranDf_GCG is used. It does occur with the other three sub-datasets (RefCoco_GCG, PSG_GCG, Flickr_GCG).
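
To narrow down which samples trigger the assertion, a check like the one below could be run on `pred_masks` and `masks_list` just before the loss is computed. This is only a sketch under assumptions: the helper name `find_mask_count_mismatches` is hypothetical, and it assumes both arguments are per-sample sequences aligned by index, with dimension 0 of each entry being the number of masks (as the assertion message suggests). It is not part of the repository.

```python
from types import SimpleNamespace


def find_mask_count_mismatches(pred_masks, masks_list):
    """Return (index, gt_count, pred_count) for every sample whose counts differ.

    Hypothetical helper. Works on any objects exposing `.shape`
    (e.g. torch tensors); dimension 0 is assumed to be the number of masks.
    """
    mismatches = []
    for idx, (pred_mask, gt_mask) in enumerate(zip(pred_masks, masks_list)):
        gt_count, pred_count = gt_mask.shape[0], pred_mask.shape[0]
        if gt_count != pred_count:
            mismatches.append((idx, gt_count, pred_count))
    return mismatches


# Stand-in objects carrying the shapes from the traceback (no torch needed here).
gt = [SimpleNamespace(shape=(2, 640, 480)), SimpleNamespace(shape=(1, 480, 640))]
pred = [SimpleNamespace(shape=(3, 640, 480)), SimpleNamespace(shape=(1, 480, 640))]
print(find_mask_count_mismatches(pred, gt))  # [(0, 2, 3)]
```

Logging the returned indices together with the source dataset of each sample would show whether the mismatch is tied to specific samples in the three affected sub-datasets.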
