Zhichao Sun, Huazhang Hu, Yidong Ma, Gang Liu, Yibo Chen, Xu Tang, Yao Hu, Yongchao Xu
We propose CQ-DINO, a category query-based framework for vast vocabulary object detection.

The recommended configuration is 8 A100 GPUs with CUDA 12.1; other configurations supported by MMDetection should also work.
Please follow the official MMDetection guide for installation and setup:
```shell
conda create --name openmmlab python=3.10.6 -y
conda activate openmmlab
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install -U openmim
mim install mmengine
mim install "mmcv==2.2.0"
git clone [email protected]:RedAIGC/CQ-DINO.git
cd CQ-DINO
pip install -v -e .
```
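To verify the environment, a quick import check can help (a minimal sketch; the versions simply echo the pins above and may differ in your setup):

```python
# Minimal sanity check that the pinned dependencies import correctly.
# Versions follow the install commands above and may differ locally.
import torch
import mmcv
import mmengine
import mmdet

print('torch:', torch.__version__, '| CUDA available:', torch.cuda.is_available())
print('mmcv:', mmcv.__version__)
print('mmengine:', mmengine.__version__)
print('mmdet:', mmdet.__version__)
```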
Dataset preparation is the same as for MM Grounding DINO in MMDetection; please refer to its documentation for details.
Please download and prepare the V3Det dataset from the V3Det Homepage and the V3Det GitHub repository. After downloading and unzipping, place the dataset (or create a symbolic link to it) in the data/v3det directory, with the following structure:
```text
CQ-DINO
├── configs
├── data
│   ├── v3det
│   │   ├── annotations
│   │   │   ├── v3det_2023_v1_train.json
│   │   ├── images
│   │   │   ├── a00000066
│   │   │   │   ├── xxx.jpg
│   │   │   ├── ...
```
Then use coco2odvg.py to convert it into the ODVG format required for training:
```shell
python tools/dataset_converters/coco2odvg.py data/v3det/annotations/v3det_2023_v1_train.json -d v3det
```
After the program has run, two new files, v3det_2023_v1_train_od.json and v3det_2023_v1_label_map.json, will be created in the data/v3det/annotations directory, with the complete structure as follows:
```text
CQ-DINO
├── configs
├── data
│   ├── v3det
│   │   ├── annotations
│   │   │   ├── v3det_2023_v1_train.json
│   │   │   ├── v3det_2023_v1_train_od.json
│   │   │   ├── v3det_2023_v1_label_map.json
│   │   ├── images
│   │   │   ├── a00000066
│   │   │   │   ├── xxx.jpg
│   │   │   ├── ...
```
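If you want to sanity-check the conversion, the sketch below inspects the two generated files. It assumes the label map is a flat JSON dict and the *_od.json file is in JSON Lines form (one image record per line), as in MM Grounding DINO's ODVG format; adjust if the output differs.

```python
import json

# Label map: assumed to be a flat {"id": "category name"} JSON dict.
with open('data/v3det/annotations/v3det_2023_v1_label_map.json') as f:
    label_map = json.load(f)
print('number of categories:', len(label_map))

# ODVG annotations: assumed JSON Lines, one image record per line.
with open('data/v3det/annotations/v3det_2023_v1_train_od.json') as f:
    record = json.loads(f.readline())
print('keys of the first record:', list(record.keys()))
```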
Please download COCO 2017 from the official COCO website or from OpenDataLab. After downloading and unzipping, place the dataset (or create a symbolic link to it) in the data/coco directory. The directory structure is as follows:
```text
CQ-DINO
├── configs
├── data
│   ├── coco
│   │   ├── annotations
│   │   │   ├── instances_train2017.json
│   │   │   ├── instances_val2017.json
│   │   ├── train2017
│   │   │   ├── xxx.jpg
│   │   │   ├── ...
│   │   ├── val2017
│   │   │   ├── xxxx.jpg
│   │   │   ├── ...
```
Then use coco2odvg.py to convert it into the ODVG format required for training:
```shell
python tools/dataset_converters/coco2odvg.py data/coco/annotations/instances_train2017.json -d coco
```
This will generate two new files, instances_train2017_od.json and coco2017_label_map.json, in the data/coco/annotations directory. The complete dataset structure is as follows:
```text
CQ-DINO
├── configs
├── data
│   ├── coco
│   │   ├── annotations
│   │   │   ├── instances_train2017.json
│   │   │   ├── instances_train2017_od.json
│   │   │   ├── coco2017_label_map.json
│   │   │   ├── instances_val2017.json
│   │   ├── train2017
│   │   │   ├── xxx.jpg
│   │   │   ├── ...
│   │   ├── val2017
│   │   │   ├── xxxx.jpg
│   │   │   ├── ...
```
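As a quick check (a hedged sketch; the label map is assumed to be a flat JSON dict, matching the converter output above), the converted COCO label map should contain the 80 standard categories:

```python
import json

# Assumes coco2017_label_map.json is a flat {"id": "category name"} dict.
with open('data/coco/annotations/coco2017_label_map.json') as f:
    label_map = json.load(f)
assert len(label_map) == 80, f'expected 80 COCO categories, got {len(label_map)}'
print('COCO label map OK:', len(label_map), 'categories')
```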
- Download the first-stage parameters from Google Drive and place them in the stage1 directory.
- Download the category embeddings from Google Drive.
The complete structure is as follows:
```text
CQ-DINO
├── configs
├── stage1
│   ├── cqdino_swinb1k_v3det_stage1.pth
│   ├── cqdino_swinb22k_v3det_stage1.pth
│   ├── cqdino_swinl_coco_stage1.pth
│   ├── cqdino_swinl_v3det_stage1.pth
├── v3det_clip_embeddings.pth
├── coco_clip_embeddings.pth
```
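To confirm the downloaded embeddings load correctly, a minimal sketch follows (the exact keys and tensor layout inside the .pth files are assumptions; inspect what you actually get):

```python
import torch

# Load on CPU; the file may hold a raw tensor or a dict of tensors.
embeddings = torch.load('v3det_clip_embeddings.pth', map_location='cpu')
if isinstance(embeddings, dict):
    for key, value in embeddings.items():
        print(key, getattr(value, 'shape', type(value)))
else:
    # Likely layout: (num_categories, embed_dim), but verify for your file.
    print('shape:', embeddings.shape)
```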
- Download the BERT base model from Hugging Face and replace the `lang_model_name` in the config files. For example:
```python
_base_ = [
    '../_base_/datasets/coco_detection.py',
    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
]
q2l_config_name = 'q2l_config.json'
lang_model_name = '/home/huggingface_hub/models--google-bert--bert-base-uncased'
```
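One way to fetch the model locally is via huggingface_hub (a sketch; any download method works, and the destination path is up to you), then set `lang_model_name` to the returned directory:

```python
from huggingface_hub import snapshot_download

# Downloads google-bert/bert-base-uncased into the local HF cache and
# returns the snapshot directory; point lang_model_name at this path.
local_dir = snapshot_download(repo_id='google-bert/bert-base-uncased')
print('set lang_model_name to:', local_dir)
```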
Train with:
```shell
# single gpu
python tools/train.py configs/cqdino/cqdino_tree_swinb22k_v3det.py
# multi gpu
bash tools/dist_train.sh configs/cqdino/cqdino_tree_swinb22k_v3det.py NUM_GPUs
```
To evaluate, download the checkpoint from Google Drive.
Test with:
```shell
# single gpu
python tools/test.py configs/cqdino/cqdino_tree_swinb22k_v3det.py cqdino_swinb22k_v3det.pth
# multi gpu
bash tools/dist_test.sh configs/cqdino/cqdino_tree_swinb22k_v3det.py cqdino_swinb22k_v3det.pth NUM_GPUs
```
If you find CQ-DINO useful, please cite:
```bibtex
@article{sun2025cq,
  title={{CQ-DINO}: Mitigating Gradient Dilution via Category Queries for Vast Vocabulary Object Detection},
  author={Sun, Zhichao and Hu, Huazhang and Ma, Yidong and Liu, Gang and Chen, Nemo and Tang, Xu and Xu, Yongchao},
  journal={arXiv preprint arXiv:2503.18430},
  year={2025}
}
```