
Models

Models in this category


  • AutoML-Image-Classification

    Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased productivity...

  • AutoML-Image-Instance-Segmentation

    Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased productivity...

  • AutoML-Image-Object-Detection

    Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased productivity...

  • AutoML-Named-Entity-Recognition

    Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased productivity...

  • AutoML-Text-Classification

    Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased productivity...

  • bytetrack_yolox_x_crowdhuman_mot17-private-half

    bytetrack_yolox_x_crowdhuman_mot17-private-half model is from OpenMMLab's MMTracking library. Its reported results are documented in the library's metafile (https://github.com/open-mmlab/mmtracking/blob/master/configs/mot/bytetrack/metafile.yml#L24)...

  • compvis-stable-diffusion-v1-4

    CompVis/stable-diffusion-v1-4 is a latent text-to-image diffusion model known for generating highly realistic images from textual input. This model incorporates a fixed, pretrained text encoder (CLIP ViT-L/14) as suggested in the Imagen paper...

  • deformable_detr_twostage_refine_r50_16x2_50e_coco

    deformable_detr_twostage_refine_r50_16x2_50e_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/e9cae2d0787cd5c2fc6165a6...)

  • distilbert-base-cased

    The DistilBERT model is a smaller, faster Transformer-based language model distilled from BERT, with 40% fewer parameters and 60% faster run time while retaining 95% of BERT's performance on the GLUE language understanding benchmark. This English language question answering model has ...

  • distilbert-base-uncased

    The DistilBERT base model (uncased) is a distilled version of the BERT base model that is smaller and faster than BERT. It was introduced in the paper "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter", and the code for creating the model is available in the Hugging Face Transformers repository. The model is uncased, so it doesn't differentiate between lower and upper case...

  • distilbert-base-uncased-distilled-squad

    The DistilBERT model is a distilled version of the BERT language model with 40% fewer parameters and 60% faster run time, while retaining 95% of BERT's performance. It is trained for question answering and has an F1 score of 87.1 on SQuAD V1.1. The model is licensed under the Apache 2.0 license and is devel...
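
    A minimal sketch of extractive question answering with this checkpoint through the Hugging Face `pipeline` API, assuming the model is pulled from the Hub under the id `distilbert-base-uncased-distilled-squad`; the question and context strings are illustrative:

    ```python
    from transformers import pipeline

    # Extractive QA: the model selects an answer span from the supplied context.
    qa = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")
    result = qa(
        question="How much smaller is DistilBERT than BERT?",
        context="DistilBERT has 40% fewer parameters than BERT and runs 60% faster, "
                "while retaining 95% of BERT's performance on GLUE.",
    )
    print(result["answer"], result["score"])  # extracted span plus confidence
    ```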

  • distilbert-base-uncased-finetuned-sst-2-english

    This is a fine-tuned version of DistilBERT-base-uncased, trained on SST-2, which reached 91.3% accuracy on the dev set. Developed by Hugging Face, it's mainly intended to be used for topic classification and can be fine-tuned on downstream tasks, but it's important to keep in mind that it has ce...

  • distilroberta-base

    DistilRoBERTa base is a distilled version of the RoBERTa-base model. With 6 layers, a hidden dimension of 768, 12 attention heads, and 82M parameters, it is faster than RoBERTa-base. The model is primarily intended for fine-tuning on whole sentence-based tasks such as sequence classification, token classification,...

  • facebook-bart-large-cnn

    The BART model is a transformer encoder-decoder model trained on English language data, and fine-tuned on CNN Daily Mail. It is used for text summarization and has been trained to reconstruct text that has been corrupted using an arbitrary noising function. The model is effective for text generat...
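
    A minimal summarization sketch via the Hugging Face `pipeline` API, assuming the Hub id `facebook/bart-large-cnn`; the article text and length limits are illustrative:

    ```python
    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    article = (
        "The tower is 324 metres tall, about the same height as an 81-storey building. "
        "It was the first structure to reach a height of 300 metres, and it is the "
        "second tallest free-standing structure in France after the Millau Viaduct."
    )
    summary = summarizer(article, max_length=60, min_length=15, do_sample=False)
    print(summary[0]["summary_text"])
    ```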

  • facebook-deit-base-patch16-224

    This model is a more efficiently trained Vision Transformer (ViT): a transformer encoder model that is pre-trained and fine-tuned on a large collection of images in a supervised fashion. It is presented with images as sequences of fixed-size patches, which are linearly embedded...

  • facebook-sam-vit-base

    The Segment Anything Model (SAM) is an innovative image segmentation tool capable of creating high-quality object masks from simple input prompts. Trained on a massive dataset comprising 11 million images and 1.1 billion masks, SAM demonstrates strong zero-shot capabilities, effectively adapting ...

  • facebook-sam-vit-huge

    The Segment Anything Model (SAM) is an innovative image segmentation tool capable of creating high-quality object masks from simple input prompts. Trained on a massive dataset comprising 11 million images and 1.1 billion masks, SAM demonstrates strong zero-shot capabilities, effectively adapting ...

  • facebook-sam-vit-large

    The Segment Anything Model (SAM) is an innovative image segmentation tool capable of creating high-quality object masks from simple input prompts. Trained on a massive dataset comprising 11 million images and 1.1 billion masks, SAM demonstrates strong zero-shot capabilities, effectively adapting ...
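
    All three SAM variants above (base, large, huge) expose the same interface. Below is a minimal sketch of point-prompted mask generation with the `transformers` SAM classes, assuming the Hub id `facebook/sam-vit-base`; the image path and prompt coordinates are placeholders:

    ```python
    import torch
    from PIL import Image
    from transformers import SamModel, SamProcessor

    model = SamModel.from_pretrained("facebook/sam-vit-base")
    processor = SamProcessor.from_pretrained("facebook/sam-vit-base")

    image = Image.open("example.jpg").convert("RGB")  # placeholder local image
    input_points = [[[450, 600]]]  # one (x, y) point prompt on the object of interest

    inputs = processor(image, input_points=input_points, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Resize the predicted low-resolution masks back to the original image size.
    masks = processor.image_processor.post_process_masks(
        outputs.pred_masks.cpu(),
        inputs["original_sizes"].cpu(),
        inputs["reshaped_input_sizes"].cpu(),
    )
    print(masks[0].shape)  # (num_prompts, num_masks_per_prompt, H, W)
    ```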

  • finiteautomata-bertweet-base-sentiment-analysis

    The pysentimiento library is an open-source tool for non-commercial use and scientific research purposes, used for sentiment analysis and social NLP tasks. The model was trained on about 40k tweets from the SemEval 2017 corpus using BERTweet, a RoBERTa model trained on English tweets, and processes...

  • google-vit-base-patch16-224

    The Vision Transformer (ViT) is a BERT-like transformer encoder model pretrained in a supervised fashion on a large collection of images, namely ImageNet-21k. The model was then fine-tuned on ImageNet, a dataset comprising 1 million images and 1,000 classes, at a resolution of 224x224. ...
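
    A minimal image-classification sketch with this checkpoint through the `pipeline` API, assuming the Hub id `google/vit-base-patch16-224`; the image path is a placeholder:

    ```python
    from transformers import pipeline

    classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
    predictions = classifier("example.jpg")  # placeholder local image path
    print(predictions[0])  # top ImageNet prediction: {'label': ..., 'score': ...}
    ```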

  • Jean-Baptiste-camembert-ner

    Summary: camembert-ner is a NER model fine-tuned from camemBERT on the Wikiner-fr dataset and was validated on email/chat data. It shows better performance on entities that do not start with an uppercase letter. The model recognizes four entity classes (MISC, PER, ORG, LOC) plus the O tag. The model can be loaded using Hugging Face's transformers pipeline, as sketched below.
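
    A minimal sketch of loading the model through the Hugging Face `pipeline` API, assuming the Hub id `Jean-Baptiste/camembert-ner`; the French input sentence is illustrative (the model is trained on French text):

    ```python
    from transformers import pipeline

    # aggregation_strategy="simple" groups sub-word tokens into whole entities.
    ner = pipeline("ner", model="Jean-Baptiste/camembert-ner", aggregation_strategy="simple")
    print(ner("Apple est créée le 1er avril 1976 dans le garage de Steve Jobs"))
    ```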

  • mask_rcnn_swin-t-p4-w7_fpn_1x_coco

    This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual ...

  • microsoft-beit-base-patch16-224-pt22k-ft22k

    BEiT is a vision transformer pre-trained on a large collection of images in a self-supervised, BERT-like fashion: images are split into patches, and the model learns to predict masked patches. It uses relative position embeddings and mean-pooling of patch representations to classify images, and can be used to ext...

  • microsoft-deberta-base

    DeBERTa is a version of the BERT model that has been improved through the use of disentangled attention and enhanced mask decoders. It outperforms BERT and RoBERTa on a majority of NLU tasks using 80GB of training data. It has been fine-tuned on NLU tasks and has achieved dev results on SQuAD 1.1...

  • microsoft-deberta-base-mnli

    DeBERTa is a version of the BERT model that has been improved through the use of disentangled attention and enhanced mask decoders. It outperforms BERT and RoBERTa on a majority of NLU tasks using 80GB of training data. It has been fine-tuned for NLU tasks and has achieved dev r...

  • microsoft-deberta-large

    DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves on the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. With 80GB of training data, it outperforms the BERT and RoBERTa models in many Natural Language Understanding (NLU) tasks. Key result...

  • microsoft-deberta-large-mnli

    DeBERTa is an improvement of BERT and RoBERTa using disentangled attention and enhanced mask decoder. With 80GB training data, it outperforms BERT and RoBERTa on the majority of NLU tasks. The fine-tuned DeBERTa with MNLI task results in the best performance on SQuAD 1.1/2.0 and GLUE benchmark ta...
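
    Because this checkpoint is fine-tuned on MNLI (natural language inference), it can back the zero-shot classification pipeline. A minimal sketch, assuming the Hub id `microsoft/deberta-large-mnli`; the text and candidate labels are illustrative:

    ```python
    from transformers import pipeline

    zero_shot = pipeline("zero-shot-classification", model="microsoft/deberta-large-mnli")
    result = zero_shot(
        "Quarterly revenue grew 20% year over year.",
        candidate_labels=["finance", "sports", "politics"],
    )
    print(result["labels"][0], result["scores"][0])  # best label and its score
    ```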

  • microsoft-deberta-xlarge

    DeBERTa is a model that improves on the BERT and RoBERTa models by using disentangled attention and an enhanced mask decoder. It performs better than RoBERTa on several NLU tasks with 80GB of training data. The DeBERTa XLarge model has 48 layers and a hidden size of 1024, with 750 million parameters...

  • microsoft-swinv2-base-patch4-window12-192-22k

    The Swin Transformer is a type of Vision Transformer used in both image classification and dense recognition tasks. It builds hierarchical feature maps by merging image patches in deeper layers, and has linear computational complexity with respect to input image size because self-attention is computed only wit...

  • mmd-3x-deformable-detr_refine_twostage_r50_16xb2-50e_coco

    deformable-detr_refine_twostage_r50_16xb2-50e_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/deformable_d...)

  • mmd-3x-mask-rcnn_swin-t-p4-w7_fpn_1x_coco

    This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual ...

  • mmd-3x-sparse-rcnn_r101_fpn_300-proposals_crop-ms-480-800-3x_coco

    sparse-rcnn_r101_fpn_300-proposals_crop-ms-480-800-3x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/spar...)

  • mmd-3x-sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco

    sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/spars...)

  • mmd-3x-vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco

    vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/vfnet/metafile.yml#L46)...

  • mmd-3x-vfnet_x101-64x4d-mdconv-c3-c5_fpn_ms-2x_coco

    vfnet_x101-64x4d-mdconv-c3-c5_fpn_ms-2x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/vfnet/metafile.yml...)

  • mmd-3x-yolof_r50_c5_8x8_1x_coco

    yolof_r50_c5_8x8_1x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the results documented in its metafile (https://github.com/open-mmlab/mmdetection/blob/3.x/configs/yolof/metafile.yml#L21)...
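
    The mmd-3x-* checkpoints above come from MMDetection, so one way to run them locally is through the `mmdet` inference API. A minimal sketch, assuming mmdet 3.x is installed and that the matching config and weights have been downloaded from the MMDetection model zoo (both file paths below are placeholders):

    ```python
    from mmdet.apis import init_detector, inference_detector

    # Placeholder paths: substitute the config/checkpoint pair for the model you picked.
    config_file = "configs/yolof/yolof_r50_c5_8x8_1x_coco.py"
    checkpoint_file = "yolof_r50_c5_8x8_1x_coco.pth"

    model = init_detector(config_file, checkpoint_file, device="cpu")
    result = inference_detector(model, "demo.jpg")  # placeholder image path
    print(result)  # per-image detection results (boxes, labels, scores)
    ```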

  • mmeft

    Multimodal Early Fusion Transformer, MMEFT, is a transformer-based model tailored for processing both structured and unstructured data.

It can be used for multi-class and multi-label multimodal classification tasks, and is capable of handling datasets with features from diverse modalities, including ...

  • ocsort_yolox_x_crowdhuman_mot17-private-half

    ocsort_yolox_x_crowdhuman_mot17-private-half model is from OpenMMLab's MMTracking library. Its reported results are documented in the library's metafile (https://github.com/open-mmlab/mmtracking/blob/master/configs/mot/ocsort/metafile.yml#L24)...

  • OpenAI-CLIP-Image-Text-Embeddings-ViT-Base-Patch32

    The CLIP model was developed by OpenAI researchers to learn about what contributes to robustness in computer vision tasks and to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. The model uses a ViT-B/32 Transformer architecture as an image...

  • OpenAI-CLIP-ViT-Base-Patch32

    The CLIP model was developed by OpenAI researchers to learn about what contributes to robustness in computer vision tasks and to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. The model uses a ViT-B/32 Transformer architecture as an image...

  • OpenAI-CLIP-ViT-Large-Patch14

    The CLIP model was developed by OpenAI researchers to learn about what contributes to robustness in computer vision tasks and to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. The model uses a ViT-L/14 Transformer architecture as an image...
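
    The three CLIP entries above share the same usage pattern. Below is a minimal zero-shot image-classification sketch with the `transformers` CLIP classes, assuming the Hub id `openai/clip-vit-base-patch32` (swap in the ViT-L/14 id for the larger model); the image path and candidate captions are placeholders:

    ```python
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("example.jpg")  # placeholder local image
    captions = ["a photo of a cat", "a photo of a dog"]

    inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits_per_image = model(**inputs).logits_per_image  # image-text similarity
    print(logits_per_image.softmax(dim=-1))  # probabilities over the candidate captions
    ```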

  • openai-whisper-large

    Whisper is an OpenAI pre-trained speech recognition model with potential applications in ASR solutions for developers. However, because it was trained with weak supervision on large-scale noisy data, it should be used with caution in high-risk domains. The model has been trained on 680k hours of audio data represen...

  • openai-whisper-large-v3

    Whisper is a model that can recognize and translate speech using deep learning. It was trained on a large amount of data from different sources and languages. Whisper models can handle various tasks and domains without needing to adjust the model.

Whisper large-v3 is similar to the previous large models...
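
    A minimal transcription sketch for either Whisper entry through the `pipeline` API, assuming the Hub id `openai/whisper-large-v3`; the audio path is a placeholder:

    ```python
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
    print(asr("speech.wav")["text"])  # placeholder local audio file
    ```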
