Conversation

@kprokofi
Contributor

@kprokofi kprokofi commented Nov 25, 2025

Summary

resolves #5015

  • Add DINOv3 and ViT-Tiny as backbones for detection, primarily for the DEIMv2 model
  • Add the DEIMv2 model (OTXModel, Encoder, Decoder), e2e training, and export
  • Experiment with pre-processing, Copy-blend, EMA, learning rate and its schedule, and model weights
  • Add unit tests and perf tests
  • Provide final benchmark numbers (vs. other DETR variants)

How to test

otx train --config src/otx/recipe/detection/deimv2_l.yaml --data_root tests/assets/car_tree_bug

Checklist

  • The PR title and description are clear and descriptive
  • I have manually tested the changes
  • All changes are covered by automated tests
  • All related issues are linked to this PR (if applicable)
  • Documentation has been updated (if applicable)

@kprokofi kprokofi added this to the Geti Tune MVP milestone Nov 25, 2025
@kprokofi kprokofi added the ALGO Any changes in OTX Algo Tasks implementation label Nov 25, 2025
@github-actions github-actions bot added the TEST Any changes in tests label Nov 25, 2025
@leoll2 leoll2 removed this from the Geti Tune MVP milestone Nov 27, 2025
@github-actions github-actions bot added the DOC Improvements or additions to documentation label Dec 2, 2025
@kprokofi
Contributor Author

kprokofi commented Dec 2, 2025

Model manifests will be updated after a decision is made about which DETR models we want to expose.

@kprokofi kprokofi marked this pull request as ready for review December 2, 2025 22:39
@kprokofi kprokofi requested a review from a team as a code owner December 2, 2025 22:39
Copilot AI review requested due to automatic review settings December 2, 2025 22:39
@kprokofi kprokofi changed the title from "[WIP] Add DEIMV2 Object Detection Model" to "Add DEIMV2 Object Detection Model" Dec 2, 2025
Contributor

Copilot AI left a comment


Pull request overview

This PR adds the DEIMv2 object detection model to the OTX training extensions platform. DEIMv2 is an improved detection transformer that combines a DINOv3 backbone with spatial token attention (STA) and fine-grained distribution refinement (FDR) for enhanced object detection performance.

Key Changes:

  • Added DEIMv2 model architecture with DINOv3/ViT-Tiny backbone and STA module
  • Implemented transformer decoder with FDR for bounding box regression
  • Added comprehensive unit tests and performance benchmarks
  • Introduced data augmentation scheduling and multi-scale training support
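The component wiring described above (backbone producing multi-scale features, a hybrid encoder fusing them, and a decoder predicting boxes) can be sketched as follows. This is an illustrative-only sketch: the class name and the callables below are hypothetical stand-ins, not the actual OTX API.

```python
# Hypothetical sketch of the DEIMv2 pipeline: backbone -> encoder -> decoder.
# Names are illustrative only; they do not mirror the real OTX classes.
class DeimV2:
    def __init__(self, backbone, encoder, decoder):
        self.backbone = backbone  # DINOv3/ViT-Tiny + STA multi-scale features
        self.encoder = encoder    # hybrid encoder (FPN/PAN feature fusion)
        self.decoder = decoder    # transformer decoder with FDR box refinement

    def forward(self, images):
        feats = self.backbone(images)
        fused = self.encoder(feats)
        return self.decoder(fused)  # class logits + refined boxes


# Stub callables standing in for the real modules, to show the data flow.
model = DeimV2(
    backbone=lambda x: x + ["feats"],
    encoder=lambda f: f + ["fused"],
    decoder=lambda f: f + ["preds"],
)
print(model.forward([]))  # ['feats', 'fused', 'preds']
```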

Reviewed changes

Copilot reviewed 43 out of 43 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
library/src/otx/backend/native/models/detection/deimv2.py DEIMv2 model class with factory pattern for model variants (x/l/m/s)
library/src/otx/backend/native/models/detection/backbones/dinov3sta.py DINOv3 backbone with Spatial Token Attention for multi-scale features
library/src/otx/backend/native/models/detection/heads/deim_decoder.py DEIM transformer decoder with FDR mechanism
library/src/otx/backend/native/models/detection/necks/dfine_hybrid_encoder.py Hybrid encoder with FPN/PAN for feature fusion
library/src/otx/recipe/detection/deimv2_*.yaml Training recipes for all DEIMv2 variants
library/tests/unit/backend/native/models/detection/test_deimv2.py Comprehensive unit tests for DEIMv2 model
library/tests/perf_v2/tasks/detection.py Performance test configuration for DEIMv2 variants
Comments suppressed due to low confidence (1)

library/src/otx/backend/native/models/detection/backbones/dinov3sta.py:1

  • Duplicate code: lines 533-536 and 537-539 both check if self.eval_spatial_size and generate anchors. The second block overwrites the registered buffers from the first block. Remove the first block (lines 533-536) as it's redundant.
# Copyright (C) 2025 Intel Corporation
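The duplication Copilot flagged follows a common pattern: two identical guarded blocks, where the second regenerates the anchors and overwrites the buffers registered by the first. A minimal sketch of the pattern and its fix, with hypothetical names mirroring the review comment (the real code uses `register_buffer`; plain attributes are used here for illustration):

```python
# Track how many times anchors are generated, to demonstrate the fix.
calls = []


def generate_anchors(spatial_size):
    # Hypothetical stand-in for the real anchor-generation routine.
    calls.append(spatial_size)
    return [(y, x) for y in range(spatial_size[0]) for x in range(spatial_size[1])]


class Backbone:
    def __init__(self, eval_spatial_size=None):
        self.eval_spatial_size = eval_spatial_size
        # Buggy pattern flagged in the review (two identical guarded blocks;
        # the second overwrites the buffers set by the first):
        #   if self.eval_spatial_size:
        #       self.anchors = generate_anchors(self.eval_spatial_size)
        #   if self.eval_spatial_size:
        #       self.anchors = generate_anchors(self.eval_spatial_size)
        # Fixed: generate the anchors exactly once.
        if self.eval_spatial_size:
            self.anchors = generate_anchors(self.eval_spatial_size)


bb = Backbone(eval_spatial_size=(2, 2))
print(len(bb.anchors), len(calls))  # 4 1
```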


      init_args:
        scale: [640, 640]
        keep_ratio: false
  - class_path: otx.data.transform_libs.torchvision.RandomFlip
Contributor


The policy is called no_aug, which means "no augmentation" if my assumption is correct. However, it uses RandomFlip which is an augmentation. Is this intended?

Contributor Author


RandomFlip is a very basic augmentation that gently enlarges the training distribution. It is common to always include it by default. However, there is no experimental proof behind this choice here; it is simply standard practice.
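For context on what RandomFlip does to the labels (not just the image): a horizontal flip must also mirror each bounding box across the image width. A small illustrative sketch, assuming xyxy box format (the function name here is hypothetical, not the OTX transform's API):

```python
# Mirror an (x1, y1, x2, y2) box horizontally across an image of width img_w.
# This is the label-side counterpart of flipping the image itself.
def hflip_box(box, img_w):
    x1, y1, x2, y2 = box
    # The old right edge becomes the new left edge, and vice versa.
    return (img_w - x2, y1, img_w - x1, y2)


print(hflip_box((10, 20, 50, 80), img_w=640))  # (590, 20, 630, 80)
```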



Development

Successfully merging this pull request may close these issues.

Add DEIMv2 Object Detection Model

4 participants