Hi, thanks for your work.
I am trying to reproduce the paper's results on the GCG task with the GranDf dataset, following the documentation.
However, I encountered the following error:
Epoch: [0][ 1/500] Loss 6.0447 (6.6629) CeLoss 2.9219 (3.6617) MaskBCELoss 2.6241 (2.5692) MaskDICELoss 0.4987 (0.4320) MaskLoss 3.1229 (3.0012)
Epoch: [0][ 2/500] Loss 4.8270 (5.0576) CeLoss 2.5938 (3.2406) MaskBCELoss 1.7405 (1.4278) MaskDICELoss 0.4927 (0.3891) MaskLoss 2.2332 (1.8170)
Epoch: [0][ 3/500] Loss 5.5492 (5.2785) CeLoss 3.6094 (3.3023) MaskBCELoss 1.6736 (1.5641) MaskDICELoss 0.2661 (0.4121) MaskLoss 1.9398 (1.9761)
Epoch: [0][ 4/500] Loss 6.2472 (5.2980) CeLoss 3.7344 (3.2023) MaskBCELoss 2.0935 (1.6982) MaskDICELoss 0.4194 (0.3975) MaskLoss 2.5129 (2.0957)
Epoch: [0][ 5/500] Loss 5.3745 (4.9203) CeLoss 3.0312 (3.2773) MaskBCELoss 1.8710 (1.2579) MaskDICELoss 0.4723 (0.3851) MaskLoss 2.3432 (1.6429)
Traceback (most recent call last):
File "/mnt/nasv3_2/zhangxinliang/LMMs/groundingLMM/train.py", line 673, in <module>
main(args)
File "/mnt/nasv3_2/zhangxinliang/LMMs/groundingLMM/train.py", line 467, in main
dataset_iters = train(
File "/mnt/nasv3_2/zhangxinliang/LMMs/groundingLMM/train.py", line 557, in train
output_dict = model(**data_batch)
File "/home/zxl/anaconda3/envs/glamm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zxl/anaconda3/envs/glamm/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/zxl/anaconda3/envs/glamm/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1829, in forward
loss = self.module(*inputs, **kwargs)
File "/home/zxl/anaconda3/envs/glamm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/nasv3_2/zhangxinliang/LMMs/groundingLMM/model/GLaMM.py", line 131, in forward
return super().forward(**kwargs) if "past_key_values" in kwargs else self.model_forward(**kwargs)
File "/mnt/nasv3_2/zhangxinliang/LMMs/groundingLMM/model/GLaMM.py", line 167, in model_forward
return self._calculate_losses(pred_masks, masks_list, output)
File "/mnt/nasv3_2/zhangxinliang/LMMs/groundingLMM/model/GLaMM.py", line 248, in _calculate_losses
loss_components = self._compute_loss_components(pred_masks, masks_list, output)
File "/mnt/nasv3_2/zhangxinliang/LMMs/groundingLMM/model/GLaMM.py", line 267, in _compute_loss_components
assert gt_mask.shape[0] == pred_mask.shape[
AssertionError: Shape mismatch: gt_mask torch.Size([2, 640, 480]), pred_mask torch.Size([3, 640, 480])
This error usually appears only after training has run for several iterations.
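For what it's worth, here is a minimal sketch of the kind of check I could add right before the failing assert to see which batch item diverges (the pred_masks / masks_list names follow the traceback; everything else is my assumption about their layout):

# Hedged diagnostic sketch: assumes pred_masks and masks_list are parallel
# lists with one (num_masks, H, W) tensor per image, as the assert implies.
def report_mask_count_mismatch(pred_masks, masks_list):
    for i, (pred_mask, gt_mask) in enumerate(zip(pred_masks, masks_list)):
        if pred_mask.shape[0] != gt_mask.shape[0]:
            print(f"batch item {i}: pred has {pred_mask.shape[0]} masks, "
                  f"gt has {gt_mask.shape[0]} masks")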
My shell script is:
export CUDA_VISIBLE_DEVICES=4,5 # Adjust based on your GPU setup
# Environment variable settings (optional, based on your requirements)
export CUDA_LAUNCH_BLOCKING=1
# Setting a dynamic master port (optional)
export MASTER_PORT=$(shuf -i 2000-65000 -n 1)
# Path to the checkpoint and output directory (modify according to your setup)
export CKPT_PATH="./ramdisk/GLaMM-GranD-Pretrained"
export OUTPUT_DIR_PATH='output/myoffical_finetune_glamm_gcg'
deepspeed --master_port $MASTER_PORT train.py \
--version $CKPT_PATH \
--dataset_dir ./Dataset_LLM/ \
--vision_pretrained ./checkpoints/sam_vit_h_4b8939.pth \
--exp_name $OUTPUT_DIR_PATH \
--lora_r 8 \
--lr 3e-4 \
--pretrained \
--use_segm_data \
--seg_dataset "RefCoco_GCG||PSG_GCG||Flickr_GCG||GranDf_GCG" \
--segm_sample_rates "3,3,3,1" \
--val_dataset "FlickrGCGVal|RefCocoGCGVal|PsgGCGVal" \
--epochs 10 \
--steps_per_epoch 500 \
    --mask_validation
I also found that this error does not happen if only GranDf_GCG is used, but it does happen with the other three sub-datasets.
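If it helps narrow things down, a hypothetical offline check along these lines (the names and data layout are my assumptions, not the repo's actual API) could verify that each GCG sample carries one segmentation token per ground-truth mask, since a per-sample mismatch there would match the count drift in the assert:

# Hypothetical sketch: assumes each sample provides its answer text and a
# ground-truth mask tensor of shape (num_masks, H, W), and that the model
# predicts one mask per "[SEG]" token in the answer (LISA/GLaMM-style).
def find_mismatched_samples(samples, seg_token="[SEG]"):
    bad = []
    for idx, (answer_text, gt_masks) in enumerate(samples):
        if answer_text.count(seg_token) != gt_masks.shape[0]:
            bad.append(idx)
    return bad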