
Add DINOv3 Backbone #389

Open
YoussefAboelwafa wants to merge 4 commits into IDEA-Research:main from YoussefAboelwafa:main

Conversation

@YoussefAboelwafa

Overview

This PR adds support for using DINOv3 (self-supervised vision transformers) as backbones in the detrex framework, specifically integrated with the DETA object detection model.

DINOv3 represents a new generation of self-supervised vision models that achieve state-of-the-art performance across various vision tasks. By integrating DINOv3 backbones, users can leverage powerful pretrained representations for object detection.

Motivation

  1. State-of-the-art Pretraining: DINOv3 models are pretrained on a massive dataset (LVD-1689M) using self-supervised learning, providing robust visual representations
  2. Model Variety: Supports multiple architectures (ViT-S/B/L/H and ConvNeXt variants) allowing flexibility in model capacity vs. efficiency tradeoffs
  3. Community Interest: Growing adoption of DINOv3 in computer vision research warrants native support in detrex
  4. Consistent API: Follows detrex's existing backbone patterns (similar to EVA integration)

Changes Made

New Files

  1. detrex/modeling/backbone/dinov3_backbone.py

    • Main backbone wrapper implementing DINOv3Backbone and DINOv3SimpleFeaturePyramid
    • Supports 7 ViT variants and 4 ConvNeXt variants
    • Provides multi-scale feature extraction from intermediate layers
    • Handles checkpoint loading and weight freezing
    • Full documentation with usage examples
  2. projects/deta/configs/models/deta_dinov3.py

    • Model configuration for DETA with DINOv3 backbone
    • Defines default architecture using ViT-Base
    • Configures ChannelMapper neck for feature pyramid generation
    • Sets up DeformableDETR transformer and DETA criterion
  3. projects/deta/configs/deta_dinov3_vitb16.py

    • Complete training configuration for DETA + DINOv3-ViT-Base
    • Includes extensive comments and documentation
    • Provides examples for switching between different DINOv3 variants
    • Configures optimizer, dataloader, and training hyperparameters
  4. projects/deta/configs/README_DINOV3.md

    • Comprehensive documentation for using DINOv3 with DETA
    • Setup instructions and checkpoint download links
    • Configuration examples for different model variants
    • Training tips and best practices
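Based on the file descriptions above, the model config presumably wires the new backbone into DETA via detrex's LazyCall pattern. The sketch below is hypothetical: `DINOv3Backbone` is named in this PR, but every constructor argument shown here is an assumption about the API, not the merged code.

```python
# Hypothetical detrex config fragment (argument names are assumptions):
from detectron2.config import LazyCall as L
from detrex.modeling.backbone import DINOv3Backbone

backbone = L(DINOv3Backbone)(
    model_name="dinov3_vitb16",                        # assumed variant identifier
    pretrained_weights="/path/to/dinov3_vitb16.pth",   # official DINOv3 checkpoint
    out_indices=(3, 5, 7, 11),                         # intermediate layers tapped for multi-scale features
    freeze=True,                                       # frozen-backbone transfer learning
)
```

A config like this would then feed the backbone's outputs into the ChannelMapper neck described in `deta_dinov3.py`.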

Modified Files

  1. detrex/modeling/backbone/__init__.py
    • Added imports for DINOv3Backbone and DINOv3SimpleFeaturePyramid
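With that export in place, the new classes would be importable from the package root; a sketch of the intended usage (class names taken from the PR description):

```python
# After this PR, the new backbones are exposed alongside the existing ones:
from detrex.modeling.backbone import DINOv3Backbone, DINOv3SimpleFeaturePyramid
```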

Features

Supported Models

Vision Transformers (ViT):

  • ViT-Small/16 (384 dims, 12 layers)
  • ViT-Base/16 (768 dims, 12 layers)
  • ViT-Large/16 (1024 dims, 24 layers)
  • ViT-Huge+/16 (1280 dims, 32 layers)

ConvNeXt:

  • ConvNeXt-Tiny
  • ConvNeXt-Small
  • ConvNeXt-Base
  • ConvNeXt-Large
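A variant registry is one natural way to encode the ViT figures listed above. The key strings here are illustrative assumptions, not necessarily the PR's actual identifiers; the dimensions and depths come from the list itself.

```python
# Hypothetical registry of DINOv3 ViT variants (keys are assumed names;
# embed_dim/depth values are taken from the variant list above).
DINOV3_VIT_VARIANTS = {
    "dinov3_vits16": {"embed_dim": 384, "depth": 12},       # ViT-Small/16
    "dinov3_vitb16": {"embed_dim": 768, "depth": 12},       # ViT-Base/16
    "dinov3_vitl16": {"embed_dim": 1024, "depth": 24},      # ViT-Large/16
    "dinov3_vith16plus": {"embed_dim": 1280, "depth": 32},  # ViT-Huge+/16
}

def variant_spec(name):
    """Look up the embedding dimension and depth for a named ViT variant."""
    return DINOV3_VIT_VARIANTS[name]
```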

Key Capabilities

  1. Flexible Feature Extraction: Extract features from any intermediate layer
  2. Frozen or Fine-tuned Modes: Support for transfer learning with frozen backbones or end-to-end fine-tuning
  3. Multi-scale Features: Generate hierarchical features similar to traditional CNN backbones
  4. Checkpoint Loading: Seamless loading of official DINOv3 pretrained weights
  5. Memory Efficient: Optional gradient checkpointing for large models
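Since plain ViTs emit a single-stride feature map, the multi-scale capability is typically realized by resampling one backbone level at several scale factors, as in ViTDet's SimpleFeaturePyramid (which detrex's EVA integration also follows). The helper and scale factors below are an illustrative assumption about how `DINOv3SimpleFeaturePyramid` likely maps patch stride to pyramid strides, not the PR's code.

```python
def pyramid_strides(patch_stride, scale_factors):
    """Output strides of a simple feature pyramid built from one ViT level.

    Each scale factor upsamples (>1) or downsamples (<1) the patch-grid
    features, so the effective stride is patch_stride / scale_factor.
    """
    return [int(patch_stride / s) for s in scale_factors]

# For a ViT/16 backbone with ViTDet-style scale factors:
# pyramid_strides(16, [4.0, 2.0, 1.0, 0.5]) -> strides [4, 8, 16, 32]
```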

Usage Example

```shell
# Basic usage with default ViT-Base
python projects/deta/train_net.py \
    --config-file projects/deta/configs/deta_dinov3_vitb16.py \
    --num-gpus 4
```

Backward Compatibility

  • No breaking changes to existing code
  • Purely additive changes
  • Existing configs and models unaffected
