Which architectures should be included? #52

LukeWood · 2022-01-24T07:05:22Z

Hey KerasCV contributors,

I'm starting this discussion thread as a place to throw out ideas for which model architectures should or shouldn't be included in KerasCV. The answer for some architectures feels more obvious: RegNets, ResNets, and WideResNet, for example, all feel like strong includes. Others are much less obvious.

Feel free to post links, ideas, or thoughts relating to model architectures in this discussion.

bhack · 2022-01-24T11:37:55Z

Even if it is probably too fresh, it could be interesting to collect some strong self-supervised performers like ReLICv2.

I also want to figure out where we will place common reusable components shared between keras-cv and keras-nlp, given the emerging thread of multimodal/unified models like:

https://ai.facebook.com/blog/the-first-high-performance-self-supervised-algorithm-that-works-for-speech-vision-and-text/

Or

https://blog.google/products/search/introducing-mum/

0 replies

innat · 2022-01-25T08:03:22Z

(IMO) Some architectures that should be included are listed below (none of them are currently included in keras.applications).

Image Classification

ResNeSt - ResNeXt - SE-Net, SE-ResNeXt - SE-ResNet - NFNets - ResNet-RS - GhostNet - MixNet - ConvNeXt
Swin-Transformer - DeiT - ViT

Object Detection

EfficientDet - YOLO (v3, v4, R, X), Faster-RCNN, DETR, RetinaNet, SSD
Pix-Seq: A general framework for turning RGB pixels into semantically meaningful sequences

Semantic Segmentation (backbone: image classification model)

UNet, XNet, DeepLab (v3, v3++), DANet, CCNet, SegFormer, SETR, PointRend

Instance and Panoptic Segmentation

1 reply

DavidLandup0 Aug 3, 2022

Great list! keras.applications is lagging behind torchvision, so keras_cv.models would be a great contender. torchvision has a few out-of-the-box object detection classes as well, including RetinaNet and Faster R-CNN that produce results in ~25 lines of code. Their support is pretty flimsy though, and it's still in beta, so there's a clear void for OD model support in official libraries. Perhaps KerasCV would be the first one to bring OD to the masses? :)

sebastian-sz · 2022-01-25T08:12:31Z

Apart from standard architectures, it would be great to see some smaller architectures that are more suitable for mobile as well:

ResNet18 and ResNet34 - these are unavailable in the Keras ecosystem as far as I know.
EfficientNetLite - not mentioned in the paper, but available in the original repository and via TFLite Model Maker.

2 replies

LukeWood Jan 27, 2022
Author

Yeah, we have a contributor working on getting ResNet18 added already. ResNet34, noted. I'll open an issue. This is definitely an add.

How does EfficientNet-Lite differ from the EfficientNets offered in Keras applications?

innat Jan 27, 2022

@LukeWood cc. @sebastian-sz
Check this page. It mainly states:

Due to the requirements from edge devices, we mainly made the following changes based on the original EfficientNets.

Remove squeeze-and-excite (SE): SE are not well supported for some mobile accelerators.
Replace all swish with RELU6: for easier post-quantization.
Fix the stem and head while scaling models up: for keeping models small and fast.
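
For anyone skimming the thread, the first two changes can be sketched in plain Python. This is only an illustration, not the EfficientNet-Lite implementation: `BlockConfig` and `to_lite` are hypothetical names made up for this example, and the third change (keeping the stem and head fixed while scaling) is not shown.

```python
import math
from dataclasses import dataclass, replace
from typing import Optional


def swish(x: float) -> float:
    """Swish / SiLU: x * sigmoid(x). Smooth and unbounded above."""
    return x / (1.0 + math.exp(-x))


def relu6(x: float) -> float:
    """ReLU6: clip the input to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


@dataclass(frozen=True)
class BlockConfig:
    # Hypothetical per-block settings, not an actual Keras or KerasCV structure.
    filters: int
    activation: str = "swish"
    se_ratio: Optional[float] = 0.25  # None means "no squeeze-and-excite"


def to_lite(block: BlockConfig) -> BlockConfig:
    """Apply the first two quoted changes: drop SE and swap swish for ReLU6."""
    return replace(block, activation="relu6", se_ratio=None)


original = BlockConfig(filters=16)
lite = to_lite(original)
print(lite)

# ReLU6 saturates at 6, while swish keeps growing with its input.
print(swish(10.0), relu6(10.0))
```

The bounded output range of ReLU6 is the property the quoted text ties to easier post-quantization.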

innat · 2022-01-25T12:52:05Z

A pointer to a few OCR (text detection and recognition) models.

0 replies

innat · 2022-01-25T12:58:49Z

@LukeWood Is there any possibility of supporting 3D models? They are an obvious need in, for example, deep learning for medicine. FYI, there is some unofficial support: segmentation_models_3D and classification_models_3D. There is also a nice toolbox for PyTorch, MONAI.

1 reply

LukeWood Jan 27, 2022
Author

For now, probably not. It would be really cool though. We have limited resources, and the first tasks we are targeting are:

image classification
object detection
image segmentation

bhack · 2022-01-26T19:26:24Z

Some modern mixer style networks could be interesting:

https://github.com/locuslab/convmixer

https://discuss.tensorflow.org/t/research-mlp-mixer-an-all-mlp-architecture-for-vision/1849/7

1 reply

innat Jan 27, 2022

https://keras.io/examples/vision/convmixer/

chjort · 2022-01-27T12:51:54Z

It would be great to include models from each architectural pillar in deep learning CV. Examples include, but are not limited to:

Convolution-based architectures
Transformer-based architectures
MLP-based architectures
Hybrid architectures

To begin with, I think it would be good to identify what corresponds to ResNet for each pillar, i.e. the best-known or most widely used model within that type of architecture, just as ResNet is for convolution-based architectures. I think it would be great to implement these "core" models first, and then only implement other variants that have a significant level of incremental change or performance increase.

2 replies

bhack Jan 27, 2022

I think that, other than a global citation threshold, we need to add the time dimension. We need to find a metric for citations in the last one or two years.
E.g. the ReLU activation paper has 996 total citations: 36 in 2022 and 552 in 2021 (Google Scholar).
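
One possible way to put a number on that time dimension is the share of a paper's citations that fall in recent years. The sketch below is only an assumption about what such a metric could look like: `recent_citation_share` is a hypothetical helper, and the counts are the ReLU figures quoted above.

```python
from typing import Iterable, Mapping


def recent_citation_share(
    total: int, per_year: Mapping[int, int], years: Iterable[int]
) -> float:
    """Return the fraction of all citations that fall in the given years."""
    if total <= 0:
        raise ValueError("total citations must be positive")
    recent = sum(per_year.get(year, 0) for year in years)
    return recent / total


# Counts quoted in the comment above (Google Scholar).
relu_per_year = {2021: 552, 2022: 36}
share = recent_citation_share(996, relu_per_year, years=(2021, 2022))
print(f"{share:.1%}")  # 588 of 996 citations -> 59.0%
```

A high share would suggest the work is still actively used, whereas a raw total alone cannot distinguish a currently relevant paper from one that was popular years ago.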

Resources are limited. Even if we choose, as a policy, to scale the repo maintainership across the community and partially rely on its code-ownership/PR-review capabilities, research output in this domain is still growing.

I think it is also a little bit harder here, as we are oriented not just toward network reproduction but toward identifying layers, ops/utils, losses, metrics, optimizers, and any other kind of component unit that could be reused across many experiments/network flavors.

So, in theory, components that are used in only one or a few models, with only a marginal improvement in performance (computational, dataset metrics, or both), could potentially have the lowest value/cost ratio when a library allocates review and maintainership resources over time.

chjort Jan 27, 2022

I think it is also a little bit harder here, as we are oriented not just toward network reproduction but toward identifying layers, ops/utils, losses, metrics, optimizers, and any other kind of component unit that could be reused across many experiments/network flavors.

I agree that we need to decide how, and to what degree, the sub-components of a model should be reusable. Making a component highly generic and reusable requires well-thought-out abstractions and can lead to high cognitive load for the user. This is also why the Hugging Face Transformers library specifically does not aim to be modular (see here). They instead optimize for making each model implementation explicit and easily accessible. Personally, I would like more modularity and abstraction than Hugging Face offers, but not too much either. We need to strike a balance.

So, in theory, components that are used in only one or a few models, with only a marginal improvement in performance (computational, dataset metrics, or both), could potentially have the lowest value/cost ratio when a library allocates review and maintainership resources over time.

This is definitely a metric we need to consider when deciding on the level of abstraction and modularity of a component.

innat · 2022-02-02T02:24:24Z

@LukeWood wdyt?

Introducing TorchVision’s New Multi-Weight Support API
Discussion.

1 reply

innat Feb 3, 2022

Pretrained EfficientNet Checkpoints

innat · 2022-06-28T17:57:42Z

If a project doesn't have a peer-reviewed paper but is consistently making a strong impact on the community, will it be considered for inclusion? For example, models like yolo-v5, yolo-v6, and yolo-v7 don't have papers but are very impactful.

(I wonder why such a practice even gets accepted by the community; they should have a paper first, IMO.)

2 replies

LukeWood Jun 29, 2022
Author

If a project doesn't have a peer-reviewed paper but is consistently making a strong impact on the community, will it be considered for inclusion? For example, models like yolo-v5, yolo-v6, and yolo-v7 don't have papers but are very impactful.

(I wonder why such a practice even gets accepted by the community; they should have a paper first, IMO.)

Yes absolutely.

zhiqwang Jul 14, 2022

Maybe YOLOv7 should be this one: https://github.com/WongKinYiu/yolov7. The other one does not seem to be accepted by the community.

Just IMO, YOLOv5 is great; you can find many places in YOLO {R, X} that are influenced by YOLOv5.

innat · 2022-07-14T05:53:20Z

There is a nicely maintained code repository from Google Research for dense pixel labeling tasks. It would be great if it were moved over here into keras_cv.

Supported methods / models:

cc. @csrhddlam @yucornetto @YknZhu

0 replies

innat · 2022-07-26T05:48:28Z

[Info]

Towards Grand Unification of Object Tracking
https://github.com/masterbin-iiau/unicorn

0 replies

bhack · 2022-08-23T14:09:51Z

https://arxiv.org/abs/2208.10442

@LukeWood I think we also need to talk with Keras-NLP about where/how to handle this trend in models.

0 replies

Replies: 12 comments · 10 replies
