
Models

Models in this category


  • AutoML-Image-Classification

    Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased productivity...

  • AutoML-Image-Instance-Segmentation

    Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased productivity...

  • AutoML-Image-Object-Detection

    Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased productivity...

  • AutoML-Named-Entity-Recognition

    Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased productivity...

  • AutoML-Text-Classification

    Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased productivity...

  • bytetrack_yolox_x_crowdhuman_mot17-private-half

    bytetrack_yolox_x_crowdhuman_mot17-private-half model is from OpenMMLab's MMTracking library. Its reported results are documented in the library's metafile (https://github.com/open-mmlab/mmtracking/blob/master/configs/mot/bytetrack/metafile.yml#L24)...

  • compvis-stable-diffusion-v1-4

    CompVis/stable-diffusion-v1-4 is a latent text-to-image diffusion model known for generating highly realistic images from textual input. This model incorporates a fixed, pretrained text encoder (CLIP ViT-L/14) as suggested in the Imagen paper...

  • deformable_detr_twostage_refine_r50_16x2_50e_coco

    deformable_detr_twostage_refine_r50_16x2_50e_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787cd5c2fc6165a6...)

  • distilbert-base-cased

    The DistilBERT model is a smaller, faster Transformer-based language model distilled from BERT, with 40% fewer parameters and 60% faster run time while retaining 95% of BERT's performance on the GLUE language understanding benchmark. This English language question answering model has ...

  • distilbert-base-uncased

    The DistilBERT base model (uncased) is a distilled version of the BERT base model that is smaller and faster than BERT. It was introduced in the paper "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter", and the code for creating the model is available in the Hugging Face Transformers repository. The model is uncased, so it doesn't differentiate between lower and upper case...

  • distilbert-base-uncased-distilled-squad

    The DistilBERT model is a distilled version of the BERT language model with 40% fewer parameters and 60% faster run time, while retaining 95% of BERT's performance. It is trained for question answering and has an F1 score of 87.1 on SQuAD V1.1. The model is licensed under the Apache 2.0 license and is devel...
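
    A minimal sketch of extractive question answering with this checkpoint through the Hugging Face `pipeline` API, assuming the model is pulled from the Hub under the id `distilbert-base-uncased-distilled-squad`; the question and context strings are illustrative:

    ```python
    from transformers import pipeline

    # Extractive QA: the model selects an answer span from the supplied context.
    qa = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")
    result = qa(
        question="How much smaller is DistilBERT than BERT?",
        context="DistilBERT has 40% fewer parameters than BERT and runs 60% faster, "
                "while retaining 95% of BERT's performance on GLUE.",
    )
    print(result["answer"], result["score"])  # extracted span plus confidence
    ```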

  • distilbert-base-uncased-finetuned-sst-2-english

    This is a fine-tuned version of DistilBERT-base-uncased, trained on SST-2, which reached 91.3% accuracy on the dev set. Developed by Hugging Face, it's mainly intended to be used for topic classification and can be fine-tuned on downstream tasks, but it's important to keep in mind that it has ce...

  • distilroberta-base

    DistilRoBERTa base is a distilled version of the RoBERTa-base model. With 6 layers, a hidden dimension of 768, 12 attention heads, and 82M parameters, it is faster than RoBERTa-base. The model is primarily intended for fine-tuning on whole sentence-based tasks such as sequence classification, token classification,...

  • facebook-bart-large-cnn

    The BART model is a transformer encoder-decoder model trained on English language data, and fine-tuned on CNN Daily Mail. It is used for text summarization and has been trained to reconstruct text that has been corrupted using an arbitrary noising function. The model is effective for text generat...
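
    A minimal summarization sketch via the Hugging Face `pipeline` API, assuming the Hub id `facebook/bart-large-cnn`; the article text and length limits are illustrative:

    ```python
    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    article = (
        "The tower is 324 metres tall, about the same height as an 81-storey building. "
        "It was the first structure to reach a height of 300 metres, and it is the "
        "second tallest free-standing structure in France after the Millau Viaduct."
    )
    summary = summarizer(article, max_length=60, min_length=15, do_sample=False)
    print(summary[0]["summary_text"])
    ```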

  • facebook-deit-base-patch16-224

    This model is a more efficiently trained Vision Transformer (ViT): a transformer encoder model that is pre-trained and fine-tuned on a large collection of images in a supervised fashion. It is presented with images as sequences of fixed-size patches, which are linearly embedded...

  • facebook-sam-vit-base

    The Segment Anything Model (SAM) is an innovative image segmentation tool capable of creating high-quality object masks from simple input prompts. Trained on a massive dataset comprising 11 million images and 1.1 billion masks, SAM demonstrates strong zero-shot capabilities, effectively adapting ...

  • facebook-sam-vit-huge

    The Segment Anything Model (SAM) is an innovative image segmentation tool capable of creating high-quality object masks from simple input prompts. Trained on a massive dataset comprising 11 million images and 1.1 billion masks, SAM demonstrates strong zero-shot capabilities, effectively adapting ...

  • facebook-sam-vit-large

    The Segment Anything Model (SAM) is an innovative image segmentation tool capable of creating high-quality object masks from simple input prompts. Trained on a massive dataset comprising 11 million images and 1.1 billion masks, SAM demonstrates strong zero-shot capabilities, effectively adapting ...
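
    All three SAM variants above (base, large, huge) expose the same interface. Below is a minimal sketch of point-prompted mask generation with the `transformers` SAM classes, assuming the Hub id `facebook/sam-vit-base`; the image path and prompt coordinates are placeholders:

    ```python
    import torch
    from PIL import Image
    from transformers import SamModel, SamProcessor

    model = SamModel.from_pretrained("facebook/sam-vit-base")
    processor = SamProcessor.from_pretrained("facebook/sam-vit-base")

    image = Image.open("example.jpg").convert("RGB")  # placeholder local image
    input_points = [[[450, 600]]]  # one (x, y) point prompt on the object of interest

    inputs = processor(image, input_points=input_points, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Resize the predicted low-resolution masks back to the original image size.
    masks = processor.image_processor.post_process_masks(
        outputs.pred_masks.cpu(),
        inputs["original_sizes"].cpu(),
        inputs["reshaped_input_sizes"].cpu(),
    )
    print(masks[0].shape)  # (num_prompts, num_masks_per_prompt, H, W)
    ```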

  • finiteautomata-bertweet-base-sentiment-analysis

    The pysentimiento library is an open-source tool for non-commercial use and scientific research purposes, used for sentiment analysis and social NLP tasks. The model was trained on about 40k tweets from the SemEval 2017 corpus using BERTweet, a RoBERTa model trained on English tweets, and processes...

  • google-vit-base-patch16-224

    The Vision Transformer (ViT) is a BERT-like transformer encoder model pretrained in a supervised fashion on a large collection of images, namely ImageNet-21k. The model was then fine-tuned on ImageNet, a dataset comprising 1 million images and 1,000 classes, at a resolution of 224x224. ...
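
    A minimal image-classification sketch with this checkpoint through the `pipeline` API, assuming the Hub id `google/vit-base-patch16-224`; the image path is a placeholder:

    ```python
    from transformers import pipeline

    classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
    predictions = classifier("example.jpg")  # placeholder local image path
    print(predictions[0])  # top ImageNet prediction: {'label': ..., 'score': ...}
    ```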

  • Jean-Baptiste-camembert-ner

    Summary: camembert-ner is a NER model fine-tuned from camemBERT on the Wikiner-fr dataset and was validated on email/chat data. It shows better performance on entities that do not start with an uppercase letter. The model recognizes four entity classes (MISC, PER, ORG, LOC) plus the O tag. The model can be loaded using Hugging Face's transformers pipeline, as sketched below.
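
    A minimal sketch of loading the model through the Hugging Face `pipeline` API, assuming the Hub id `Jean-Baptiste/camembert-ner`; the French input sentence is illustrative (the model is trained on French text):

    ```python
    from transformers import pipeline

    # aggregation_strategy="simple" groups sub-word tokens into whole entities.
    ner = pipeline("ner", model="Jean-Baptiste/camembert-ner", aggregation_strategy="simple")
    print(ner("Apple est créée le 1er avril 1976 dans le garage de Steve Jobs"))
    ```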

  • mask_rcnn_swin-t-p4-w7_fpn_1x_coco

    This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual ...

  • microsoft-beit-base-patch16-224-pt22k-ft22k

    BEiT is a vision transformer pre-trained on a large collection of images in a self-supervised, BERT-like fashion: images are split into patches, and the model learns to predict masked patches. It uses relative position embeddings and mean-pooling of patch representations to classify images, and can be used to ext...

  • microsoft-deberta-base

    DeBERTa is a version of the BERT model that has been improved through the use of disentangled attention and enhanced mask decoders. It outperforms BERT and RoBERTa on a majority of NLU tasks using 80GB of training data. It has been fine-tuned on NLU tasks and has achieved dev results on SQuAD 1.1...

  • microsoft-deberta-base-mnli

    DeBERTa is a version of the BERT model that has been improved through the use of disentangled attention and enhanced mask decoders. It outperforms BERT and RoBERTa on a majority of NLU tasks using 80GB of training data. It has been fine-tuned for NLU tasks and has achieved dev r...

  • microsoft-deberta-large

    DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves on the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. With 80GB of training data, it outperforms the BERT and RoBERTa models in many Natural Language Understanding (NLU) tasks. Key result...

  • microsoft-deberta-large-mnli

    DeBERTa is an improvement of BERT and RoBERTa using disentangled attention and enhanced mask decoder. With 80GB training data, it outperforms BERT and RoBERTa on the majority of NLU tasks. The fine-tuned DeBERTa with MNLI task results in the best performance on SQuAD 1.1/2.0 and GLUE benchmark ta...
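
    Because this checkpoint is fine-tuned on MNLI (natural language inference), it can back the zero-shot classification pipeline. A minimal sketch, assuming the Hub id `microsoft/deberta-large-mnli`; the text and candidate labels are illustrative:

    ```python
    from transformers import pipeline

    zero_shot = pipeline("zero-shot-classification", model="microsoft/deberta-large-mnli")
    result = zero_shot(
        "Quarterly revenue grew 20% year over year.",
        candidate_labels=["finance", "sports", "politics"],
    )
    print(result["labels"][0], result["scores"][0])  # best label and its score
    ```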

  • microsoft-deberta-xlarge

    DeBERTa is a model that improves on the BERT and RoBERTa models by using disentangled attention and an enhanced mask decoder. It performs better than RoBERTa on several NLU tasks with 80GB of training data. The DeBERTa XLarge model has 48 layers and a hidden size of 1024, with 750 million parameters...

  • microsoft-swinv2-base-patch4-window12-192-22k

    The Swin Transformer is a type of Vision Transformer used in both image classification and dense recognition tasks. It builds hierarchical feature maps by merging image patches in deeper layers, and has linear computational complexity with respect to input image size because self-attention is computed only wit...

  • mmd-3x-deformable-detr_refine_twostage_r50_16xb2-50e_coco

    deformable-detr_refine_twostage_r50_16xb2-50e_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/deformable_d...)

  • mmd-3x-mask-rcnn_swin-t-p4-w7_fpn_1x_coco

    This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual ...

  • mmd-3x-sparse-rcnn_r101_fpn_300-proposals_crop-ms-480-800-3x_coco

    sparse-rcnn_r101_fpn_300-proposals_crop-ms-480-800-3x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/spar...)

  • mmd-3x-sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco

    sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/spars...)

  • mmd-3x-vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco

    vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/vfnet/metafile.yml#L46)...

  • mmd-3x-vfnet_x101-64x4d-mdconv-c3-c5_fpn_ms-2x_coco

    vfnet_x101-64x4d-mdconv-c3-c5_fpn_ms-2x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/vfnet/metafile.yml...)

  • mmd-3x-yolof_r50_c5_8x8_1x_coco

    yolof_r50_c5_8x8_1x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/yolof/metafile.yml#L21)...
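
    The mmd-3x-* checkpoints above come from MMDetection, so one way to run them locally is through the `mmdet` inference API. A minimal sketch, assuming mmdet 3.x is installed and that the matching config and weights have been downloaded from the MMDetection model zoo (both file paths below are placeholders):

    ```python
    from mmdet.apis import init_detector, inference_detector

    # Placeholder paths: substitute the config/checkpoint pair for the model you picked.
    config_file = "configs/yolof/yolof_r50_c5_8x8_1x_coco.py"
    checkpoint_file = "yolof_r50_c5_8x8_1x_coco.pth"

    model = init_detector(config_file, checkpoint_file, device="cpu")
    result = inference_detector(model, "demo.jpg")  # placeholder image path
    print(result)  # per-image detection results (boxes, labels, scores)
    ```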

  • mmeft

    Multimodal Early Fusion Transformer, MMEFT, is a transformer-based model tailored for processing both structured and unstructured data.

It can be used for multi-class and multi-label multimodal classification tasks, and is capable of handling datasets with features from diverse modalities, including ...

  • ocsort_yolox_x_crowdhuman_mot17-private-half

    ocsort_yolox_x_crowdhuman_mot17-private-half model is from OpenMMLab's MMTracking library. Its reported results are documented in the library's metafile (https://github.com/open-mmlab/mmtracking/blob/master/configs/mot/ocsort/metafile.yml#L24)...

  • OpenAI-CLIP-Image-Text-Embeddings-ViT-Base-Patch32

    The CLIP model was developed by OpenAI researchers to learn about what contributes to robustness in computer vision tasks and to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. The model uses a ViT-B/32 Transformer architecture as an image...

  • OpenAI-CLIP-ViT-Base-Patch32

    The CLIP model was developed by OpenAI researchers to learn about what contributes to robustness in computer vision tasks and to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. The model uses a ViT-B/32 Transformer architecture as an image...

  • OpenAI-CLIP-ViT-Large-Patch14

    The CLIP model was developed by OpenAI researchers to learn about what contributes to robustness in computer vision tasks and to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. The model uses a ViT-L/14 Transformer architecture as an image...
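
    The three CLIP entries above share the same usage pattern. Below is a minimal zero-shot image-classification sketch with the `transformers` CLIP classes, assuming the Hub id `openai/clip-vit-base-patch32` (swap in the ViT-L/14 id for the larger model); the image path and candidate captions are placeholders:

    ```python
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("example.jpg")  # placeholder local image
    captions = ["a photo of a cat", "a photo of a dog"]

    inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits_per_image = model(**inputs).logits_per_image  # image-text similarity
    print(logits_per_image.softmax(dim=-1))  # probabilities over the candidate captions
    ```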

  • openai-whisper-large

    Whisper is an OpenAI pre-trained speech recognition model with potential applications in ASR solutions for developers. However, because it was trained with weak supervision on large-scale noisy data, it should be used with caution in high-risk domains. The model has been trained on 680k hours of audio data represen...

  • openai-whisper-large-v3

    Whisper is a model that can recognize and translate speech using deep learning. It was trained on a large amount of data from different sources and languages. Whisper models can handle various tasks and domains without needing to adjust the model.

Whisper large-v3 is similar to the previous large models...
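
    A minimal transcription sketch for either Whisper entry through the `pipeline` API, assuming the Hub id `openai/whisper-large-v3`; the audio path is a placeholder:

    ```python
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
    print(asr("speech.wav")["text"])  # placeholder local audio file
    ```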
