Merged (changes from 7 commits)
62 changes: 34 additions & 28 deletions README.md
@@ -4,36 +4,28 @@
![Tensorflow](https://img.shields.io/badge/tensorflow-v2.9.0+-success.svg)
[![Contributions Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/keras-team/keras-cv/issues)

# Vision
A computer vision library dedicated to auto-driving, robotics, and on-device applications.

# Mission

KerasCV is a layered repository consisting of core components and modeling components.
KerasCV is a library of modular, computer-vision-oriented Keras components.
Member:
It might be cool if we could align the first sentences of KerasCV's and KerasNLP's mission. For KerasNLP we have -> "KerasNLP is a natural language processing library that supports users through their entire development cycle." Def open to changing that if we find some language we think will be exciting for readers and applicable for both projects.

cc @jbischof

Contributor Author:

Open to feedback. I think the CV selling point is a bit different in that it's based on modularity, but open to feedback.

Member:

"Modular" only makes it into our second sentence currently :P, but I would be totally down to bump it up. Totally fine to leave both as is if this is feeling too tricky; I was just thinking it might be a good way to show some alignment between the packages.

These components consist of models, layers, metrics, losses, callbacks, and utility functions.

On the core side, KerasCV provides modular building blocks (ops, functions, layers, metrics, losses, callbacks) that standardize APIs for computer vision concepts such as data-augmentation pipelines, bounding boxes, keypoints, point clouds, and feature pyramid networks. Applied computer vision engineers can leverage these to quickly assemble production-grade, state-of-the-art
training and inference pipelines for common tasks such as image classification, object detection and segmentation, and image data augmentation.
The goal of the library is to provide standardized, Keras-native APIs for common computer vision tasks such as data augmentation, classification, object detection, image generation, and more.
Applied computer vision engineers can leverage KerasCV to quickly assemble production-grade, state-of-the-art training and inference pipelines for all of these common tasks.
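
To make the standardized bounding box APIs mentioned above concrete, here is a minimal sketch of converting boxes between formats with `keras_cv.bounding_box.convert_format`; the box values are arbitrary, and the exact format strings and argument names should be checked against the installed KerasCV version.

```python
import tensorflow as tf
import keras_cv

# Three boxes in "xywh" format (x, y, width, height); values are illustrative.
boxes = tf.constant(
    [[10.0, 20.0, 100.0, 50.0],
     [0.0, 0.0, 64.0, 64.0],
     [30.0, 30.0, 10.0, 40.0]]
)

# Convert to "xyxy" (left, top, right, bottom) using KerasCV's standardized
# bounding box utilities.
converted = keras_cv.bounding_box.convert_format(boxes, source="xywh", target="xyxy")
print(converted.numpy())
```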

On the modeling side, KerasCV provides the most widely used models for each task, such as the ResNet family, the MobileNet family, transformer-based models, anchor-based and anchor-free meta architectures, and U-Net models. These are built on top of the core components, are highly composable, and are compatible with the Keras trainer (`model.fit`). The library aims to provide pre-built models that are mixed-precision compatible, QAT compatible, and XLA compilable during training, as well as generic model optimization tools for deployment on devices such as onboard GPUs, mobile devices, and edge chips.

KerasCV provides the following value to users (illustrated in the sketch after this list):
- modular mid-level APIs and composable meta architectures
- mixed-precision and XLA-enabled components
- highly optimized, quantization-aware training (QAT) enabled models, compatible with both GPUs and TPUs
- reproducible training results and a leaderboard
- useful tools for evaluation, visualization, and explanation
- a source for inference conversion (TFLite, edge devices, TensorRT, etc.) and model-level optimization
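
As a rough illustration of the mixed-precision and XLA points above, the sketch below enables the global mixed-precision policy and requests XLA compilation through `jit_compile=True`; the tiny model and random data are placeholders rather than KerasCV components.

```python
import tensorflow as tf
from tensorflow import keras

# Train everything under the mixed_float16 policy.
keras.mixed_precision.set_global_policy("mixed_float16")

model = keras.Sequential([
    keras.layers.Conv2D(8, 3, activation="relu", input_shape=(64, 64, 3)),
    keras.layers.GlobalAveragePooling2D(),
    # Keep the final layer in float32 for numerical stability under mixed precision.
    keras.layers.Dense(10, dtype="float32"),
])

# jit_compile=True asks Keras to compile the training step with XLA.
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    jit_compile=True,
)

x = tf.random.uniform((16, 64, 64, 3))
y = tf.random.uniform((16,), maxval=10, dtype=tf.int32)
model.fit(x, y, epochs=1, verbose=0)
```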

KerasCV can be understood as a horizontal extension of the Keras API: the components are new first-party
Keras objects (layers, metrics, etc.) that are too specialized to be added to core Keras, but that receive
the same level of polish and backwards compatibility guarantees as the rest of the Keras API and that
are maintained by the Keras team itself.

In addition to API consistency, KerasCV components aim to be mixed-precision compatible, QAT compatible, XLA compilable, and TPU compatible.
In the near term, we aim to provide pre-trained models for common tasks such as on-device object detection and NSFW classification.
We also aim to provide generic model optimization tools for deployment on devices such as onboard GPUs, mobile, edge chips.

KerasCV's primary goal is to provide a coherent, elegant, and pleasant API to train state-of-the-art computer vision models.
Users should be able to train state-of-the-art models using only `Keras`, `KerasCV`, and TensorFlow core (i.e. `tf.data`) components.
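
A minimal sketch of that workflow, assuming only `tf.data`, Keras, and an installed `keras_cv`: a KerasCV augmentation layer inside a `tf.data` pipeline feeding a plain Keras model trained with `model.fit`. The toy data, the tiny model, and the `RandAugment` arguments are illustrative assumptions, not a prescribed recipe.

```python
import tensorflow as tf
from tensorflow import keras
import keras_cv

# Toy data standing in for a real image classification dataset.
images = tf.random.uniform((32, 224, 224, 3), maxval=255)
labels = tf.random.uniform((32,), maxval=10, dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(8)

# A KerasCV preprocessing layer applied inside the tf.data pipeline.
augmenter = keras_cv.layers.RandAugment(value_range=(0, 255))
dataset = dataset.map(
    lambda x, y: (augmenter(x), y), num_parallel_calls=tf.data.AUTOTUNE
)

# Any Keras model works here; a KerasCV model could be dropped in instead.
model = keras.Sequential([
    keras.layers.Rescaling(1.0 / 255),
    keras.layers.Conv2D(16, 3, activation="relu"),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(dataset, epochs=1)
```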

Different from Keras IO, this product focuses on meta architectures and training scripts to help users reproduce results on open datasets.

To learn more about the future project direction, please check the [roadmap](.github/ROADMAP.md).

## Quick Links
@@ -52,7 +44,7 @@ but also for active development for feature delivery. To achieve this, here is the
process for how to contribute to this repository:

1) Contributors are always welcome to help us fix an issue, add tests, or improve documentation.
2) If contributors would like to create a backbone, we usually require a pre-trained weight
2) If contributors would like to create a backbone, we usually require a pre-trained weight set
with the model for one dataset as the first PR, and a training script as a follow-up. The training script should preferably help us reproduce the results claimed in the paper. The backbone should be generic, but the training script can contain paper-specific parameters such as learning rate schedules and weight decays. The training script will be used to produce leaderboard results.
Exceptions apply to large transformer-based models which are difficult to train. If this is the case,
contributors should let us know so the team can help in training the model or providing GCP resources.
@@ -67,14 +59,27 @@ Thank you to all of our wonderful contributors!
</a>

## Pretrained Weights
Many models in KerasCV come with pre-trained weights. With the exception of StableDiffusion,
all of these weights are trained using Keras and KerasCV components and training scripts in this
repository. Models may not be trained with the same parameters or preprocessing pipeline
described in their original papers. Performance metrics for pre-trained weights can be found
in the training history for each task. For example, see ImageNet classification training
history for backbone models [here](examples/training/classification/imagenet/training_history.json).
All results are reproducible using the training scripts in this repository. Pre-trained weights
operate on images that have been rescaled using a simple `1/255` rescaling layer.
Many models in KerasCV come with pre-trained weights.
With the exception of StableDiffusion, all of these weights are trained using Keras and
KerasCV components and training scripts in this repository.
While some models are not trained with the same parameters or preprocessing pipeline
as defined in their original publications, KerasCV still ensures strong performance.
Performance metrics for the provided pre-trained weights can be found
in the training history for each documented task.
An example of this can be found in the ImageNet classification training
[history for backbone models](examples/training/classification/imagenet/training_history.json).
All results are reproducible using the training scripts in this repository.

Historically, many models have been trained on image datasets rescaled via manually
crafted normalization schemes.
The most common such scheme is subtraction of the ImageNet mean pixel followed by
normalization based on the ImageNet pixel standard deviation.
This scheme is an artifact of the days of manual feature engineering, but it is no longer
required to achieve state-of-the-art results with modern deep learning architectures.
Due to this, KerasCV is standardized to operate on images that have been rescaled using
a simple `1/255` rescaling layer.
This can be seen in all KerasCV training pipelines and code examples.
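
For concreteness, here is a minimal sketch of that convention using the standard `keras.layers.Rescaling` layer; the random batch of images is only a stand-in for real data.

```python
import tensorflow as tf
from tensorflow import keras

# Pixel values in [0, 255] are mapped to [0, 1] by a single Rescaling layer;
# no dataset-specific mean/std normalization is applied.
rescale = keras.layers.Rescaling(scale=1.0 / 255)

raw_images = tf.random.uniform((4, 224, 224, 3), maxval=255)  # stand-in batch
scaled = rescale(raw_images)  # values now in [0, 1]
```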

## Custom Ops
Note that in some of the 3D Object Detection layers, custom TF ops are used. The
@@ -85,8 +90,8 @@ If you'd like to use these custom ops, you can install from source using the
instructions below.

### Installing KerasCV with Custom Ops from Source
Installing from source requires the [Bazel](https://bazel.build/) build system
(version >= 5.4.0).
Installing custom ops from source requires the [Bazel](https://bazel.build/) build
system (version >= 5.4.0).

```
git clone https://github.com/keras-team/keras-cv.git
```
@@ -111,7 +116,8 @@ and Windows.
KerasCV provides access to pre-trained models via the `keras_cv.models` API.
These pre-trained models are provided on an "as is" basis, without warranties
or conditions of any kind.
The following underlying models are provided by third parties, and subject to separate
licenses:
StableDiffusion
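
As an illustration of accessing one of these pre-trained models through `keras_cv.models`, the sketch below follows the StableDiffusion text-to-image API documented on keras.io; the prompt, batch size, and output filename are arbitrary, and the call should be verified against the installed KerasCV version.

```python
import keras_cv
from PIL import Image  # only used to save the generated image

model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)
images = model.text_to_image(
    "a photograph of an astronaut riding a horse", batch_size=1
)
Image.fromarray(images[0]).save("astronaut.png")
```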

## Citing KerasCV