Commit 0c3e593

Add support for dinov2 with registers (#1110)
1 parent 9056f76 commit 0c3e593

2 files changed, +23 -0 lines changed

docs/snippets/6_supported-models.snippet

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@
 1. **Depth Pro** (from Apple) released with the paper [Depth Pro: Sharp Monocular Metric Depth in Less Than a Second](https://arxiv.org/abs/2410.02073) by Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, Vladlen Koltun.
 1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
 1. **[DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2)** (from Meta AI) released with the paper [DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193) by Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski.
+1. **[DINOv2 with Registers](https://huggingface.co/docs/transformers/model_doc/dinov2_with_registers)** (from Meta AI) released with the paper [DINOv2 with Registers](https://arxiv.org/abs/2309.16588) by Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski.
 1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT.
 1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
 1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.

src/models.js

Lines changed: 22 additions & 0 deletions
@@ -5389,6 +5389,26 @@ export class Dinov2ForImageClassification extends Dinov2PreTrainedModel {
 }
 //////////////////////////////////////////////////
 
+//////////////////////////////////////////////////
+export class Dinov2WithRegistersPreTrainedModel extends PreTrainedModel { }
+
+/**
+ * The bare Dinov2WithRegisters Model transformer outputting raw hidden-states without any specific head on top.
+ */
+export class Dinov2WithRegistersModel extends Dinov2WithRegistersPreTrainedModel { }
+
+/**
+ * Dinov2WithRegisters Model transformer with an image classification head on top (a linear layer on top of the final hidden state of the [CLS] token) e.g. for ImageNet.
+ */
+export class Dinov2WithRegistersForImageClassification extends Dinov2WithRegistersPreTrainedModel {
+    /**
+     * @param {any} model_inputs
+     */
+    async _call(model_inputs) {
+        return new SequenceClassifierOutput(await super._call(model_inputs));
+    }
+}
+//////////////////////////////////////////////////
 
 //////////////////////////////////////////////////
 export class YolosPreTrainedModel extends PreTrainedModel { }
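
In this hunk, the image-classification class overrides `_call` only to wrap the parent's raw output in `SequenceClassifierOutput`. The following is a minimal sketch of that wrap-the-parent-output pattern. It uses stand-in classes (`FakePreTrainedModel`, `FakeSequenceClassifierOutput`, `FakeForImageClassification`, and the helper `demoWrappedCall` are hypothetical names, not part of the library) so that it runs without any dependency; the real base class does considerably more.

```javascript
// Stand-in for the library's output wrapper (hypothetical, simplified).
class FakeSequenceClassifierOutput {
    constructor({ logits }) {
        this.logits = logits;
    }
}

// Stand-in for the base model: `_call` resolves to a plain object of raw outputs.
class FakePreTrainedModel {
    async _call(model_inputs) {
        // Pretend the "model" produces one logit per input value.
        return { logits: model_inputs.pixel_values.map((v) => v * 2) };
    }
}

// Same shape as the class in the diff: delegate to the parent, then wrap the result.
class FakeForImageClassification extends FakePreTrainedModel {
    async _call(model_inputs) {
        return new FakeSequenceClassifierOutput(await super._call(model_inputs));
    }
}

// Hypothetical helper that runs the wrapped call once.
async function demoWrappedCall() {
    const model = new FakeForImageClassification();
    return model._call({ pixel_values: [1, 2, 3] });
}
```

Callers that await `demoWrappedCall()` receive a typed output object instead of a bare dictionary, which seems to be the purpose of the override in the diff.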
@@ -7018,6 +7038,7 @@ const MODEL_MAPPING_NAMES_ENCODER_ONLY = new Map([
     ['convnext', ['ConvNextModel', ConvNextModel]],
     ['convnextv2', ['ConvNextV2Model', ConvNextV2Model]],
     ['dinov2', ['Dinov2Model', Dinov2Model]],
+    ['dinov2_with_registers', ['Dinov2WithRegistersModel', Dinov2WithRegistersModel]],
     ['resnet', ['ResNetModel', ResNetModel]],
     ['swin', ['SwinModel', SwinModel]],
     ['swin2sr', ['Swin2SRModel', Swin2SRModel]],
@@ -7263,6 +7284,7 @@ const MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES = new Map([
     ['convnext', ['ConvNextForImageClassification', ConvNextForImageClassification]],
     ['convnextv2', ['ConvNextV2ForImageClassification', ConvNextV2ForImageClassification]],
     ['dinov2', ['Dinov2ForImageClassification', Dinov2ForImageClassification]],
+    ['dinov2_with_registers', ['Dinov2WithRegistersForImageClassification', Dinov2WithRegistersForImageClassification]],
     ['resnet', ['ResNetForImageClassification', ResNetForImageClassification]],
     ['swin', ['SwinForImageClassification', SwinForImageClassification]],
     ['segformer', ['SegformerForImageClassification', SegformerForImageClassification]],
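
Both mapping changes register the new `dinov2_with_registers` model type against a `[class name, class]` pair. The following is a minimal sketch of how a map with that layout can resolve a `model_type` string to a class. The helper `resolveModelClass`, the map name `SKETCH_MODEL_MAPPING`, and the empty stand-in classes are hypothetical and only illustrate the lookup; they are not the library's actual loading code.

```javascript
// Empty stand-ins for the real model classes (hypothetical, for illustration only).
class SketchDinov2Model { }
class SketchDinov2WithRegistersModel { }

// Same [name, class] tuple layout as the mapping entries in the diff.
const SKETCH_MODEL_MAPPING = new Map([
    ['dinov2', ['Dinov2Model', SketchDinov2Model]],
    ['dinov2_with_registers', ['Dinov2WithRegistersModel', SketchDinov2WithRegistersModel]],
]);

// Hypothetical helper: look up the class registered for a model type.
function resolveModelClass(modelType) {
    const entry = SKETCH_MODEL_MAPPING.get(modelType);
    if (!entry) {
        throw new Error(`Unsupported model type: ${modelType}`);
    }
    const [className, cls] = entry;
    return { className, cls };
}

const resolved = resolveModelClass('dinov2_with_registers');
console.log(resolved.className); // prints "Dinov2WithRegistersModel"
```

In this sketch, a model type with no entry in the map throws, which is why the commit adds an entry to each mapping alongside the new classes.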
