Commit f5bc758

Add support for Table Transformer model (#477)

1 parent: 44b62ed

File tree

4 files changed: +38 −0 lines


README.md

Lines changed: 1 addition & 0 deletions

@@ -332,6 +332,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[Swin2SR](https://huggingface.co/docs/transformers/model_doc/swin2sr)** (from University of Würzburg) released with the paper [Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration](https://arxiv.org/abs/2209.11345) by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte.
 1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
 1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
+1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (from Microsoft Research) released with the paper [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) by Brandon Smock, Rohith Pesala, Robin Abraham.
 1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
 1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
 1. **[ViTMatte](https://huggingface.co/docs/transformers/model_doc/vitmatte)** (from HUST-VL) released with the paper [ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers](https://arxiv.org/abs/2305.15272) by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang.

docs/snippets/6_supported-models.snippet

Lines changed: 1 addition & 0 deletions

@@ -67,6 +67,7 @@
 1. **[Swin2SR](https://huggingface.co/docs/transformers/model_doc/swin2sr)** (from University of Würzburg) released with the paper [Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration](https://arxiv.org/abs/2209.11345) by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte.
 1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
 1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
+1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (from Microsoft Research) released with the paper [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) by Brandon Smock, Rohith Pesala, Robin Abraham.
 1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
 1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
 1. **[ViTMatte](https://huggingface.co/docs/transformers/model_doc/vitmatte)** (from HUST-VL) released with the paper [ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers](https://arxiv.org/abs/2305.15272) by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang.

scripts/supported_models.py

Lines changed: 10 additions & 0 deletions

@@ -810,6 +810,16 @@
             'hkunlp/instructor-large',
         ],
     },
+    'table-transformer': {
+        # Object detection
+        'object-detection': [
+            'microsoft/table-transformer-detection',
+            'microsoft/table-transformer-structure-recognition',
+            'microsoft/table-transformer-structure-recognition-v1.1-all',
+            'microsoft/table-transformer-structure-recognition-v1.1-fin',
+            'microsoft/table-transformer-structure-recognition-v1.1-pub',
+        ],
+    },
     'trocr': { # NOTE: also a `vision-encoder-decoder`
         # Text-to-image
         'text-to-image': [
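The registry entry above is plain Python data mapping a model type to its task and checkpoint IDs. To illustrate how such a registry is consumed, here is a minimal JavaScript sketch; the `checkpointsForTask` helper is hypothetical (it is not part of this repository), and only the `table-transformer` entry from the diff is mirrored:

```javascript
// Mirror of the `table-transformer` entry added to scripts/supported_models.py,
// expressed as a plain JavaScript object.
const SUPPORTED_MODELS = {
    'table-transformer': {
        'object-detection': [
            'microsoft/table-transformer-detection',
            'microsoft/table-transformer-structure-recognition',
            'microsoft/table-transformer-structure-recognition-v1.1-all',
            'microsoft/table-transformer-structure-recognition-v1.1-fin',
            'microsoft/table-transformer-structure-recognition-v1.1-pub',
        ],
    },
};

// Hypothetical helper: collect every checkpoint registered for a given task,
// across all model types in the registry.
function checkpointsForTask(models, task) {
    const out = [];
    for (const tasks of Object.values(models)) {
        for (const id of tasks[task] ?? []) out.push(id);
    }
    return out;
}

console.log(checkpointsForTask(SUPPORTED_MODELS, 'object-detection').length); // 5
```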

src/models.js

Lines changed: 26 additions & 0 deletions

@@ -3678,6 +3678,30 @@ export class DetrSegmentationOutput extends ModelOutput {
 }
 //////////////////////////////////////////////////

+//////////////////////////////////////////////////
+export class TableTransformerPreTrainedModel extends PreTrainedModel { }
+
+/**
+ * The bare Table Transformer Model (consisting of a backbone and encoder-decoder Transformer)
+ * outputting raw hidden-states without any specific head on top.
+ */
+export class TableTransformerModel extends TableTransformerPreTrainedModel { }
+
+/**
+ * Table Transformer Model (consisting of a backbone and encoder-decoder Transformer)
+ * with object detection heads on top, for tasks such as COCO detection.
+ */
+export class TableTransformerForObjectDetection extends TableTransformerPreTrainedModel {
+    /**
+     * @param {any} model_inputs
+     */
+    async _call(model_inputs) {
+        return new TableTransformerObjectDetectionOutput(await super._call(model_inputs));
+    }
+}
+export class TableTransformerObjectDetectionOutput extends DetrObjectDetectionOutput { }
+//////////////////////////////////////////////////
+

 //////////////////////////////////////////////////
 export class DeiTPreTrainedModel extends PreTrainedModel { }

@@ -4767,6 +4791,7 @@ const MODEL_MAPPING_NAMES_ENCODER_ONLY = new Map([
     ['audio-spectrogram-transformer', ['ASTModel', ASTModel]],

     ['detr', ['DetrModel', DetrModel]],
+    ['table-transformer', ['TableTransformerModel', TableTransformerModel]],
     ['vit', ['ViTModel', ViTModel]],
     ['mobilevit', ['MobileViTModel', MobileViTModel]],
     ['owlvit', ['OwlViTModel', OwlViTModel]],

@@ -4953,6 +4978,7 @@ const MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES = new Map([

 const MODEL_FOR_OBJECT_DETECTION_MAPPING_NAMES = new Map([
     ['detr', ['DetrForObjectDetection', DetrForObjectDetection]],
+    ['table-transformer', ['TableTransformerForObjectDetection', TableTransformerForObjectDetection]],
     ['yolos', ['YolosForObjectDetection', YolosForObjectDetection]],
 ]);

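These `Map` entries are how transformers.js dispatches a config's `model_type` string to a model class. The following self-contained sketch shows that lookup with stand-in empty classes instead of the real library imports; the `modelClassFor` helper is hypothetical, added only to make the dispatch explicit:

```javascript
// Stand-in classes (the real ones live in src/models.js).
class DetrForObjectDetection { }
class TableTransformerForObjectDetection { }
class YolosForObjectDetection { }

// Same shape as the mapping in the diff: model_type -> [class name, class].
const MODEL_FOR_OBJECT_DETECTION_MAPPING_NAMES = new Map([
    ['detr', ['DetrForObjectDetection', DetrForObjectDetection]],
    ['table-transformer', ['TableTransformerForObjectDetection', TableTransformerForObjectDetection]],
    ['yolos', ['YolosForObjectDetection', YolosForObjectDetection]],
]);

// Hypothetical helper: resolve a config's `model_type` to its model class.
function modelClassFor(modelType) {
    const entry = MODEL_FOR_OBJECT_DETECTION_MAPPING_NAMES.get(modelType);
    if (!entry) throw new Error(`Unsupported model type: ${modelType}`);
    return entry[1]; // the class itself; entry[0] is its name
}

console.log(modelClassFor('table-transformer').name); // "TableTransformerForObjectDetection"
```

In the library itself, the Auto classes consult these mappings when loading a checkpoint, which is why adding the two entries above is enough for `table-transformer` checkpoints to resolve to the new classes.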