[Bug]: Exported end2end ONNX model produces poor COCO validation results #146

@Fredrik00

Description

Before Reporting

  • I have pulled the latest code of the main branch and run it again, and the bug still exists.

  • I have read the README carefully and no error occurred during the installation process. (Otherwise, we recommend asking a question using the Question template.)

Search before reporting

  • I have searched the DAMO-YOLO issues and found no similar bugs.

OS

Ubuntu 24.04

Device

RTX 4090

CUDA version

12.5

TensorRT version

No response

Python version

3.10

PyTorch version

1.13.1

torchvision version

0.14.1

Describe the bug

I have been attempting to train and export a tinynasL25_S model on the COCO dataset, but I am getting very poor results from the exported model. I exported the model end2end, which, if I am interpreting the code correctly, should cap the output at 100 detections after NMS, yet in many cases I still get 1000+ detections. Detections do, however, appear to be filtered by a minimum confidence score of 0.05.
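
For reference, this is the post-NMS filtering behaviour I expected the end2end graph to bake in. This is a minimal NumPy sketch with array names of my own choosing, not the repo's actual export code:

```python
import numpy as np

def filter_detections(boxes, scores, score_thr=0.05, max_dets=100):
    """Keep detections scoring at least score_thr, then the top max_dets by score."""
    keep = scores >= score_thr
    boxes, scores = boxes[keep], scores[keep]
    order = np.argsort(-scores)[:max_dets]  # highest scores first, capped at max_dets
    return boxes[order], scores[order]
```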

After 60 epochs of training on the COCO dataset I get an evaluation score of:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.365
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.514
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.394
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.208
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.403
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.488
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.320
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.549
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.613
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.424
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.673
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.771

I export this model using:
python tools/converter.py -f configs/damoyolo_tinynasL25_S.py -c workdirs/damoyolo_tinynasL25_S/epoch_60_ckpt.pth --batch_size 1 --img_size 640 --end2end --ort
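
To sanity-check the exported graph itself, I inspect its inputs and outputs with onnxruntime. A quick sketch (the .onnx path is from my run, and the output names/shapes are whatever the converter produced, so I print them rather than assume them):

```python
import numpy as np
import onnxruntime as ort

# Path is where my converter run wrote the file; adjust to your own output.
sess = ort.InferenceSession("deploy/damoyolo_tinynasL25_S.onnx",
                            providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)                   # expect something like [1, 3, 640, 640]

dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)
outs = sess.run(None, {inp.name: dummy})
for meta, out in zip(sess.get_outputs(), outs):
    print(meta.name, out.shape)              # does any output dim still exceed 100?
```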

But when evaluating the exported model on COCO I get the following results:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.008
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.017
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.003
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.047

I have debugged the pre-processing and post-processing steps I added for running the ONNX model, and they look consistent with the demo script. I use the same COCO validation scripts for YOLOX and RT-DETR models with no issues there, so it looks to me as though something is wrong with the export script.
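
For completeness, this is the letterbox-style preprocessing I feed the ONNX model, which I believe matches tools/demo.py. The pad value and the absence of mean/std normalisation are assumptions on my part, copied from my own harness:

```python
import cv2
import numpy as np

def preprocess(img_bgr, size=640):
    """Letterbox-resize to size x size and convert to NCHW float32."""
    h, w = img_bgr.shape[:2]
    r = min(size / h, size / w)                              # scale, aspect ratio preserved
    resized = cv2.resize(img_bgr, (int(w * r), int(h * r)))
    padded = np.full((size, size, 3), 114, dtype=np.uint8)   # pad value 114: my assumption
    padded[:resized.shape[0], :resized.shape[1]] = resized
    x = padded.transpose(2, 0, 1)[None].astype(np.float32)   # 1 x 3 x 640 x 640
    return x, r                                              # keep r to map boxes back
```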

To Reproduce

  1. Train model using configs/damoyolo_tinynasL25_S.py
  2. Export model to ONNX using: python tools/converter.py -f configs/damoyolo_tinynasL25_S.py -c workdirs/damoyolo_tinynasL25_S/epoch_60_ckpt.pth --batch_size 1 --img_size 640 --end2end --ort
  3. Evaluate the ONNX model on the COCO validation set (a sketch of this evaluation step follows below)
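
For step 3, I score the exported model's detections with the standard pycocotools flow. The paths and the predictions JSON here are from my setup:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("datasets/coco/annotations/instances_val2017.json")  # my local path
# onnx_predictions.json: [{"image_id", "category_id", "bbox", "score"}, ...]
coco_dt = coco_gt.loadRes("onnx_predictions.json")
ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()  # prints an AP/AR table like the ones above
```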

Hyper-parameters/Configs

No response

Logs

No response

Screenshots

No response

Additional

No response
