README.md: 16 additions & 16 deletions
@@ -41,12 +41,12 @@ This collection of models take images as input, then classifies the major object
41
41
|<b>[GoogleNet](vision/classification/inception_and_googlenet/googlenet)</b>|[Szegedy et al.](https://arxiv.org/pdf/1409.4842.pdf)|Deep CNN model (up to 22 layers). Comparatively smaller and faster than VGG and more accurate in detailing than AlexNet. <br> Top-5 error from paper - ~6.7%|[](https://huggingface.co/spaces/onnx/GoogleNet)|
42
42
|<b>[CaffeNet](vision/classification/caffenet)</b>|[Krizhevsky et al.](https://ucb-icsi-vision-group.github.io/caffe-paper/caffe.pdf)|Deep CNN variation of AlexNet for Image Classification in Caffe where the max pooling precedes the local response normalization (LRN) so that the LRN takes less compute and memory.|[](https://huggingface.co/spaces/onnx/CaffeNet)|
43
43
|<b>[RCNN_ILSVRC13](vision/classification/rcnn_ilsvrc13)</b>|[Girshick et al.](https://arxiv.org/abs/1311.2524)|Pure Caffe implementation of R-CNN for image classification. This model uses localization of regions to classify and extract features from images.|
44
-
|<b>[DenseNet-121](vision/classification/densenet-121)</b>|[Huang et al.](https://arxiv.org/abs/1608.06993)|Model that has every layer connected to every other layer and passes on its own feature providing strong gradient flow and more diversified features.|
44
+
|<b>[DenseNet-121](vision/classification/densenet-121)</b>|[Huang et al.](https://arxiv.org/abs/1608.06993)|Model that has every layer connected to every other layer and passes on its own feature providing strong gradient flow and more diversified features.|[](https://huggingface.co/spaces/onnx/DenseNet-121)|
45
45
|<b>[Inception_V1](vision/classification/inception_and_googlenet/inception_v1)</b>|[Szegedy et al.](https://arxiv.org/abs/1409.4842)|This model is the same as GoogLeNet, implemented through Caffe2. It has improved utilization of the computing resources inside the network and helps with the vanishing gradient problem. <br> Top-5 error from paper - ~6.7%|[](https://huggingface.co/spaces/onnx/Inception_v1)|
46
46
|<b>[Inception_V2](vision/classification/inception_and_googlenet/inception_v2)</b>|[Szegedy et al.](https://arxiv.org/abs/1512.00567)|Deep CNN model for Image Classification as an adaptation to Inception v1 with batch normalization. This model has reduced computational cost and improved image resolution compared to Inception v1. <br> Top-5 error from paper ~4.82%|
47
47
|<b>[ShuffleNet_V1](vision/classification/shufflenet)</b>|[Zhang et al.](https://arxiv.org/abs/1707.01083)|Extremely computation-efficient CNN model that is designed specifically for mobile devices. This model greatly reduces the computational cost and provides a ~13x speedup over AlexNet on ARM-based mobile devices. Compared to MobileNet, ShuffleNet achieves superior performance by a significant margin due to its efficient structure. <br> Top-1 error from paper - ~32.6%|
48
48
|<b>[ShuffleNet_V2](vision/classification/shufflenet)</b>|[Zhang et al.](https://arxiv.org/abs/1807.11164)|Extremely computation-efficient CNN model that is designed specifically for mobile devices. This network architecture design considers direct metrics such as speed, instead of indirect metrics like FLOPs. <br> Top-1 error from paper - ~30.6%|
49
-
|<b>[ZFNet-512](vision/classification/zfnet-512)</b>|[Zeiler et al.](https://arxiv.org/abs/1311.2901)|Deep CNN model (up to 8 layers) that increased the number of features that the network is capable of detecting that helps to pick image features at a finer level of resolution. <br> Top-5 error from paper - ~14.3%|
49
+
|<b>[ZFNet-512](vision/classification/zfnet-512)</b>|[Zeiler et al.](https://arxiv.org/abs/1311.2901)|Deep CNN model (up to 8 layers) that increased the number of features that the network is capable of detecting that helps to pick image features at a finer level of resolution. <br> Top-5 error from paper - ~14.3%|[](https://huggingface.co/spaces/onnx/ZFNet-512)|
50
50
|<b>[EfficientNet-Lite4](vision/classification/efficientnet-lite4)</b>|[Tan et al.](https://arxiv.org/abs/1905.11946)|CNN model with an order of magnitude fewer computations and parameters, while still achieving state-of-the-art accuracy and better efficiency than previous ConvNets. <br> Top-5 error from paper - ~2.9%|[](https://huggingface.co/spaces/onnx/EfficientNet-Lite4)|
51
51
<hr>
52
52
@@ -80,21 +80,21 @@ Object detection models detect the presence of multiple objects in an image and
80
80
### Body, Face & Gesture Analysis <a name="body_analysis"/>
81
81
Face detection models identify and/or recognize human faces and emotions in given images. Body and Gesture Analysis models identify gender and age in a given image.
82
82
83
-
|Model Class |Reference |Description |
84
-
|-|-|-|
85
-
|<b>[ArcFace](vision/body_analysis/arcface)</b>|[Deng et al.](https://arxiv.org/abs/1801.07698)|A CNN based model for face recognition which learns discriminative features of faces and produces embeddings for input face images.|
86
-
|<b>[UltraFace](vision/body_analysis/ultraface)</b>|[Ultra-lightweight face detection model](https://github.com/Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB)|This model is a lightweight face detection model designed for edge computing devices.|
83
+
|Model Class |Reference |Description |Hugging Face Spaces |
84
+
|-|-|-|-|
85
+
|<b>[ArcFace](vision/body_analysis/arcface)</b>|[Deng et al.](https://arxiv.org/abs/1801.07698)|A CNN based model for face recognition which learns discriminative features of faces and produces embeddings for input face images.|[](https://huggingface.co/spaces/onnx/ArcFace)|
86
+
|<b>[UltraFace](vision/body_analysis/ultraface)</b>|[Ultra-lightweight face detection model](https://github.com/Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB)|This model is a lightweight face detection model designed for edge computing devices.|[](https://huggingface.co/spaces/onnx/ultraface)|
87
87
|<b>[Emotion FerPlus](vision/body_analysis/emotion_ferplus)</b> |[Barsoum et al.](https://arxiv.org/abs/1608.01041)| Deep CNN for emotion recognition trained on images of faces.|
88
88
|<b>[Age and Gender Classification using Convolutional Neural Networks](vision/body_analysis/age_gender)</b>|[Rothe et al.](https://data.vision.ee.ethz.ch/cvl/publications/papers/proceedings/eth_biwi_01229.pdf)|This model accurately classifies gender and age even when the amount of learning data is limited.|
Image manipulation models use neural networks to transform input images to modified output images. Some popular models in this category involve style transfer or enhancing images by increasing resolution.
93
93
94
-
|Model Class |Reference |Description |
95
-
|-|-|-|
94
+
|Model Class |Reference |Description |Hugging Face Spaces |
95
+
|-|-|-|-|
96
96
|Unpaired Image to Image Translation using Cycle consistent Adversarial Network|[Zhu et al.](https://arxiv.org/abs/1703.10593)|The model uses learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. <br>[contribute](contribute.md)|
97
-
|<b>[Super Resolution with sub-pixel CNN](vision/super_resolution/sub_pixel_cnn_2016)</b> |[Shi et al.](https://arxiv.org/abs/1609.05158)|A deep CNN that uses sub-pixel convolution layers to upscale the input image. |
97
+
|<b>[Super Resolution with sub-pixel CNN](vision/super_resolution/sub_pixel_cnn_2016)</b> |[Shi et al.](https://arxiv.org/abs/1609.05158)|A deep CNN that uses sub-pixel convolution layers to upscale the input image. |[](https://huggingface.co/spaces/onnx/sub_pixel_cnn_2016)|
98
98
|<b>[Fast Neural Style Transfer](vision/style_transfer/fast_neural_style)</b> |[Johnson et al.](https://arxiv.org/abs/1603.08155)|This method uses a loss network pretrained for image classification to define perceptual loss functions that measure perceptual differences in content and style between images. The loss network remains fixed during the training process.|
99
99
<hr>
100
100
@@ -111,13 +111,13 @@ This class of models uses audio data to train models that can identify voice, ge
This subset of natural language processing models answers questions about a given context paragraph.
113
113
114
-
|Model Class |Reference |Description |
115
-
|-|-|-|
116
-
|<b>[Bidirectional Attention Flow](text/machine_comprehension/bidirectional_attention_flow)</b>|[Seo et al.](https://arxiv.org/pdf/1611.01603)|A model that answers a query about a given context paragraph.|
117
-
|<b>[BERT-Squad](text/machine_comprehension/bert-squad)</b>|[Devlin et al.](https://arxiv.org/pdf/1810.04805.pdf)|This model answers questions based on the context of the given input paragraph. |
118
-
|<b>[RoBERTa](text/machine_comprehension/roberta)</b>|[Liu et al.](https://arxiv.org/pdf/1907.11692.pdf)|A large transformer-based model that predicts sentiment based on given input text.|
119
-
|<b>[GPT-2](text/machine_comprehension/gpt-2)</b>|[Radford et al.](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)|A large transformer-based language model that given a sequence of words within some text, predicts the next word. |
120
-
|<b>[T5](text/machine_comprehension/t5)</b>|[Raffel et al.](https://arxiv.org/abs/1910.10683)|A large transformer-based language model trained on multiple tasks at once to achieve better semantic understanding of the prompt, capable of sentiment-analysis, question-answering, similarity-detection, translation, summarization, etc. |
114
+
|Model Class |Reference |Description |Hugging Face Spaces|
115
+
|-|-|-|-|
116
+
|<b>[Bidirectional Attention Flow](text/machine_comprehension/bidirectional_attention_flow)</b>|[Seo et al.](https://arxiv.org/pdf/1611.01603)|A model that answers a query about a given context paragraph.|[](https://huggingface.co/spaces/onnx/BiDAF)|
117
+
|<b>[BERT-Squad](text/machine_comprehension/bert-squad)</b>|[Devlin et al.](https://arxiv.org/pdf/1810.04805.pdf)|This model answers questions based on the context of the given input paragraph. |[](https://huggingface.co/spaces/onnx/BERT-Squad)|
118
+
|<b>[RoBERTa](text/machine_comprehension/roberta)</b>|[Liu et al.](https://arxiv.org/pdf/1907.11692.pdf)|A large transformer-based model that predicts sentiment based on given input text.|[](https://huggingface.co/spaces/onnx/RoBERTa)|
119
+
|<b>[GPT-2](text/machine_comprehension/gpt-2)</b>|[Radford et al.](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)|A large transformer-based language model that given a sequence of words within some text, predicts the next word. |[](https://huggingface.co/spaces/onnx/GPT-2)|
120
+
|<b>[T5](text/machine_comprehension/t5)</b>|[Raffel et al.](https://arxiv.org/abs/1910.10683)|A large transformer-based language model trained on multiple tasks at once to achieve better semantic understanding of the prompt, capable of sentiment-analysis, question-answering, similarity-detection, translation, summarization, etc. |[](https://huggingface.co/spaces/onnx/T5)|
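
Several classification rows in this diff quote a "Top-5 error" or "Top-1 error" from the corresponding paper. As a rough illustration of what that metric measures, here is a minimal Python sketch. The function name `top_k_error` and the sample scores are hypothetical, made up for this example, and are not taken from the repository or from any of the listed models.

```python
def top_k_error(scores, labels, k=5):
    """Return the fraction of samples whose true label is NOT among
    the k highest-scoring classes (hypothetical helper for illustration)."""
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Made-up scores for 3 samples over 3 classes (illustrative only).
scores = [
    [0.10, 0.90, 0.00],
    [0.80, 0.15, 0.05],
    [0.20, 0.30, 0.50],
]
labels = [1, 1, 0]

print(top_k_error(scores, labels, k=1))
print(top_k_error(scores, labels, k=2))
```

A larger `k` can only keep the error the same or lower it, which is why Top-5 figures in the tables are much smaller than Top-1 figures.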