Add DEIMV2 Object Detection Model #5033
base: develop
Conversation
Model manifests are to be updated after a decision regarding which DETR models we want to expose.
Pull request overview
This PR adds the DEIMv2 object detection model to the OTX training extensions platform. DEIMv2 is an improved detection transformer that combines a DINOv3 backbone with Spatial Token Attention (STA) and Fine-grained Distribution Refinement (FDR) for enhanced object detection performance.
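At a high level, the pieces named above chain together: the DINOv3/STA backbone produces multi-scale features, the hybrid encoder fuses them, and the FDR-equipped decoder predicts boxes. A minimal sketch of that composition follows; the class names and call signatures are assumptions for illustration, not the actual OTX API.

```python
import torch
from torch import nn


class DEIMv2Sketch(nn.Module):
    """Illustrative composition only; the real classes live under
    otx.backend.native.models.detection (names here are assumed)."""

    def __init__(self, backbone: nn.Module, encoder: nn.Module, decoder: nn.Module) -> None:
        super().__init__()
        self.backbone = backbone  # DINOv3/ViT + STA -> multi-scale feature maps
        self.encoder = encoder    # hybrid FPN/PAN fusion over those features
        self.decoder = decoder    # transformer decoder with FDR box refinement

    def forward(self, images: torch.Tensor) -> dict[str, torch.Tensor]:
        feats = self.backbone(images)  # feature maps at several strides
        fused = self.encoder(feats)    # fused multi-scale features
        return self.decoder(fused)     # class logits and refined boxes
```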
Key Changes:
- Added DEIMv2 model architecture with DINOv3/ViT-Tiny backbone and STA module
- Implemented transformer decoder with FDR for bounding box regression
- Added comprehensive unit tests and performance benchmarks
- Introduced data augmentation scheduling and multi-scale training support (see the scheduling sketch after this list)
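On the augmentation-scheduling point, DEIM-style training typically runs strong augmentations for most of the run and drops to a minimal pipeline for the final epochs. Below is a minimal sketch of that idea, assuming torchvision v2 transforms and a made-up epoch threshold; the actual schedule lives in the deimv2_*.yaml recipes.

```python
# Hypothetical sketch of epoch-based augmentation scheduling.
from torchvision.transforms import v2

strong_pipeline = v2.Compose([
    v2.RandomPhotometricDistort(p=0.5),
    v2.RandomHorizontalFlip(p=0.5),
    v2.Resize((640, 640)),
])

# Minimal "no_aug" tail: resizing plus the benign horizontal flip
# discussed in the review thread below.
no_aug_pipeline = v2.Compose([
    v2.RandomHorizontalFlip(p=0.5),
    v2.Resize((640, 640)),
])


def pipeline_for_epoch(epoch: int, total_epochs: int, no_aug_tail: int = 10) -> v2.Compose:
    """Use the minimal pipeline for the last `no_aug_tail` epochs."""
    if epoch >= total_epochs - no_aug_tail:
        return no_aug_pipeline
    return strong_pipeline
```

Multi-scale training would additionally sample the Resize target from a set of scales each iteration rather than fixing it at 640.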
Reviewed changes
Copilot reviewed 43 out of 43 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| library/src/otx/backend/native/models/detection/deimv2.py | DEIMv2 model class with factory pattern for model variants (x/l/m/s); see the sketch after this table |
| library/src/otx/backend/native/models/detection/backbones/dinov3sta.py | DINOv3 backbone with Spatial Token Attention for multi-scale features |
| library/src/otx/backend/native/models/detection/heads/deim_decoder.py | DEIM transformer decoder with FDR mechanism |
| library/src/otx/backend/native/models/detection/necks/dfine_hybrid_encoder.py | Hybrid encoder with FPN/PAN for feature fusion |
| library/src/otx/recipe/detection/deimv2_*.yaml | Training recipes for all DEIMv2 variants |
| library/tests/unit/backend/native/models/detection/test_deimv2.py | Comprehensive unit tests for DEIMv2 model |
| library/tests/perf_v2/tasks/detection.py | Performance test configuration for DEIMv2 variants |
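The deimv2.py entry above mentions a factory pattern over the x/l/m/s variants. A hedged sketch of what such a factory typically looks like; the hyper-parameter values and names below are invented for illustration, since the real ones come from the model class and recipes.

```python
# Hypothetical variant factory; values are placeholders, not OTX's.
from dataclasses import dataclass


@dataclass
class DEIMv2Config:
    embed_dim: int
    num_decoder_layers: int
    num_queries: int


_VARIANTS = {
    "s": DEIMv2Config(embed_dim=192, num_decoder_layers=3, num_queries=300),
    "m": DEIMv2Config(embed_dim=256, num_decoder_layers=4, num_queries=300),
    "l": DEIMv2Config(embed_dim=256, num_decoder_layers=6, num_queries=300),
    "x": DEIMv2Config(embed_dim=384, num_decoder_layers=6, num_queries=300),
}


def build_deimv2(variant: str) -> DEIMv2Config:
    """Look up a variant config, failing loudly on unknown names."""
    try:
        return _VARIANTS[variant]
    except KeyError as err:
        raise ValueError(f"Unknown DEIMv2 variant: {variant!r}") from err
```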
Comments suppressed due to low confidence (1)
library/src/otx/backend/native/models/detection/backbones/dinov3sta.py
- Duplicate code: lines 533-536 and 537-539 both check `if self.eval_spatial_size` and generate anchors. The second block overwrites the registered buffers from the first block. Remove the first block (lines 533-536), as it is redundant.
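The suggested fix reduces to a single guarded block that generates and registers the eval-time anchors once. A minimal sketch of that shape, with assumed method and buffer names (`_generate_anchors`, `anchors`, `valid_mask`) that may differ from the actual code in dinov3sta.py:

```python
# Sketch of the deduplicated guard (names assumed, not copied from the PR).
def _register_eval_anchors(self) -> None:
    # Anchors can be precomputed only when the eval resolution is fixed;
    # otherwise they must be generated per batch at the actual input size.
    if self.eval_spatial_size:
        anchors, valid_mask = self._generate_anchors(self.eval_spatial_size)
        self.register_buffer("anchors", anchors)
        self.register_buffer("valid_mask", valid_mask)
```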
library/docs/source/guide/explanation/algorithms/object_detection/object_detection.rst (outdated; comment resolved)
        init_args:
          scale: [640, 640]
          keep_ratio: false
      - class_path: otx.data.transform_libs.torchvision.RandomFlip
The policy is called no_aug, which I assume means "no augmentation". However, it uses RandomFlip, which is an augmentation. Is this intended?
RandomFlip is a very basic augmentation that gently enlarges the training distribution. It is common to always include it by default; however, there is no experimental proof of this here, it is just common practice.
Summary
resolves #5015
How to test
otx train --config src/otx/recipe/detection/deimv2_l.yaml --data_root tests/assets/car_tree_bug

Checklist