
Commit 04f59db

handle different drop rates for EfficientNet, add timm-tf_efficientnet_lite0-lite4 (#314)
* handle different drop rates for EfficientNet, add timm-tf_efficientnet_lite0-lite4; it's not clear which dataset lite0-4 were pretrained with, I set it to 'imagenet' but I've noticed the mean=(0.5, 0.5, 0.5)
* readme
* minor typos
* correct weights from https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/lite
* fix lite3/4 models
* Update encoders.rst
1 parent 5f5f639 commit 04f59db
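
The new encoders plug into the library's existing high-level API. A minimal usage sketch, assuming the encoder is selected by the registry key added in `timm_efficientnet.py` below (input size and class count are illustrative):

```python
import torch
import segmentation_models_pytorch as smp

# Assumption: the new lite encoders are addressed by the registry keys added in
# this commit, e.g. "timm-tf_efficientnet_lite0", with 'imagenet' weights.
model = smp.Unet(
    encoder_name="timm-tf_efficientnet_lite0",
    encoder_weights="imagenet",
    in_channels=3,
    classes=2,
)

mask = model(torch.ones([1, 3, 224, 224]))  # -> [1, 2, 224, 224]
```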

File tree

3 files changed: 168 additions, 21 deletions


README.md

Lines changed: 15 additions & 10 deletions
@@ -10,9 +10,9 @@ Segmentation based on [PyTorch](https://pytorch.org/).**
 
 The main features of this library are:
 
-- High level API (just two lines to create neural network)
+- High level API (just two lines to create a neural network)
 - 9 models architectures for binary and multi class segmentation (including legendary Unet)
-- 99 available encoders
+- 104 available encoders
 - All encoders have pre-trained weights for faster and better convergence
 
 ### [📚 Project Documentation 📚](http://smp.readthedocs.io/)
@@ -23,7 +23,7 @@ Visit [Read The Docs Project Page](https://smp.readthedocs.io/) or read followin
 1. [Quick start](#start)
 2. [Examples](#examples)
 3. [Models](#models)
-    1. [Architectures](#architectires)
+    1. [Architectures](#architectures)
     2. [Encoders](#encoders)
 4. [Models API](#api)
     1. [Input channels](#input-channels)
@@ -46,13 +46,13 @@ import segmentation_models_pytorch as smp
 
 model = smp.Unet(
     encoder_name="resnet34",        # choose encoder, e.g. mobilenet_v2 or efficientnet-b7
-    encoder_weights="imagenet",     # use `imagenet` pretreined weights for encoder initialization
+    encoder_weights="imagenet",     # use `imagenet` pretrained weights for encoder initialization
     in_channels=1,                  # model input channels (1 for grayscale images, 3 for RGB, etc.)
     classes=3,                      # model output channels (number of classes in your dataset)
 )
 ```
 - see [table](#architectires) with available model architectures
-- see [table](#encoders) with avaliable encoders and its corresponding weights
+- see [table](#encoders) with available encoders and their corresponding weights
 
 #### 2. Configure data preprocessing
 
@@ -73,7 +73,7 @@ Congratulations! You are done! Now you can train your model with your favorite f
 
 ### 📦 Models <a name="models"></a>
 
-#### Architectures <a name="architectires"></a>
+#### Architectures <a name="architectures"></a>
 - Unet [[paper](https://arxiv.org/abs/1505.04597)] [[docs](https://smp.readthedocs.io/en/latest/models.html#unet)]
 - Unet++ [[paper](https://arxiv.org/pdf/1807.10165.pdf)] [[docs](https://smp.readthedocs.io/en/latest/models.html#id2)]
 - MAnet [[paper](https://ieeexplore.ieee.org/abstract/document/9201310)] [[docs](https://smp.readthedocs.io/en/latest/models.html#manet)]
@@ -268,6 +268,11 @@ The following is a list of supported encoders in the SMP. Select the appropriate
 |timm-efficientnet-b7 |imagenet / advprop / noisy-student|63M |
 |timm-efficientnet-b8 |imagenet / advprop |84M |
 |timm-efficientnet-l2 |noisy-student |474M |
+|timm-efficientnet-lite0 |imagenet |4M |
+|timm-efficientnet-lite1 |imagenet |5M |
+|timm-efficientnet-lite2 |imagenet |6M |
+|timm-efficientnet-lite3 |imagenet |8M |
+|timm-efficientnet-lite4 |imagenet |13M |
 
 </div>
 </details>
@@ -330,7 +335,7 @@ The following is a list of supported encoders in the SMP. Select the appropriate
 - `model.forward(x)` - sequentially pass `x` through model\`s encoder, decoder and segmentation head (and classification head if specified)
 
 ##### Input channels
-Input channels parameter allow you to create models, which process tensors with arbitrary number of channels.
+Input channels parameter allows you to create models, which process tensors with arbitrary number of channels.
 If you use pretrained weights from imagenet - weights of first convolution will be reused for
 1- or 2- channels inputs, for input channels > 4 weights of first convolution will be initialized randomly.
 ```python
@@ -340,9 +345,9 @@ mask = model(torch.ones([1, 1, 64, 64]))
 
 ##### Auxiliary classification output
 All models support `aux_params` parameters, which is default set to `None`.
-If `aux_params = None` than classification auxiliary output is not created, else
+If `aux_params = None` then classification auxiliary output is not created, else
 model produce not only `mask`, but also `label` output with shape `NC`.
-Classification head consist of GlobalPooling->Dropout(optional)->Linear->Activation(optional) layers, which can be
+Classification head consists of GlobalPooling->Dropout(optional)->Linear->Activation(optional) layers, which can be
 configured by `aux_params` as follows:
 ```python
 aux_params=dict(
@@ -357,7 +362,7 @@ mask, label = model(x)
 
 ##### Depth
 Depth parameter specify a number of downsampling operations in encoder, so you can make
-your model lighted if specify smaller `depth`.
+your model lighter if specify smaller `depth`.
 ```python
 model = smp.Unet('resnet34', encoder_depth=4)
 ```
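
For the `aux_params` passage touched above, a short illustrative snippet; the key names follow the GlobalPooling->Dropout(optional)->Linear->Activation(optional) head described in the README, and the values are examples only:

```python
import torch
import segmentation_models_pytorch as smp

# Illustrative aux_params; each key corresponds to one stage of the
# classification head described above.
aux_params = dict(
    pooling="avg",         # global pooling: "avg" or "max"
    dropout=0.5,           # optional dropout before the linear layer
    activation="sigmoid",  # optional activation on the label output
    classes=4,             # number of classification outputs
)
model = smp.Unet("resnet34", classes=4, aux_params=aux_params)
mask, label = model(torch.ones([1, 3, 64, 64]))  # label shape: [1, 4]
```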

docs/encoders.rst

Lines changed: 10 additions & 0 deletions
@@ -238,6 +238,16 @@ EfficientNet
 +------------------------+--------------------------------------+-------------+
 | timm-efficientnet-l2   | noisy-student                        | 474M        |
 +------------------------+--------------------------------------+-------------+
+| timm-efficientnet-lite0| imagenet                             | 4M          |
++------------------------+--------------------------------------+-------------+
+| timm-efficientnet-lite1| imagenet                             | 4M          |
++------------------------+--------------------------------------+-------------+
+| timm-efficientnet-lite2| imagenet                             | 6M          |
++------------------------+--------------------------------------+-------------+
+| timm-efficientnet-lite3| imagenet                             | 8M          |
++------------------------+--------------------------------------+-------------+
+| timm-efficientnet-lite4| imagenet                             | 13M         |
++------------------------+--------------------------------------+-------------+
 
 MobileNet
 ~~~~~~~~~

segmentation_models_pytorch/encoders/timm_efficientnet.py

Lines changed: 143 additions & 11 deletions
@@ -8,7 +8,7 @@
 from ._base import EncoderMixin
 
 
-def get_efficientnet_kwargs(channel_multiplier=1.0, depth_multiplier=1.0):
+def get_efficientnet_kwargs(channel_multiplier=1.0, depth_multiplier=1.0, drop_rate=0.2):
     """Creates an EfficientNet model.
     Ref impl: https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/efficientnet_model.py
     Paper: https://arxiv.org/abs/1905.11946
@@ -44,24 +44,62 @@ def get_efficientnet_kwargs(channel_multiplier=1.0, depth_multiplier=1.0):
         channel_multiplier=channel_multiplier,
         act_layer=Swish,
         norm_kwargs={},  # TODO: check
-        drop_rate=0.2,
+        drop_rate=drop_rate,
         drop_path_rate=0.2,
     )
     return model_kwargs
 
+def gen_efficientnet_lite_kwargs(channel_multiplier=1.0, depth_multiplier=1.0, drop_rate=0.2):
+    """Creates an EfficientNet-Lite model.
 
-class EfficientNetEncoder(EfficientNet, EncoderMixin):
+    Ref impl: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/lite
+    Paper: https://arxiv.org/abs/1905.11946
+
+    EfficientNet params
+    name: (channel_multiplier, depth_multiplier, resolution, dropout_rate)
+      'efficientnet-lite0': (1.0, 1.0, 224, 0.2),
+      'efficientnet-lite1': (1.0, 1.1, 240, 0.2),
+      'efficientnet-lite2': (1.1, 1.2, 260, 0.3),
+      'efficientnet-lite3': (1.2, 1.4, 280, 0.3),
+      'efficientnet-lite4': (1.4, 1.8, 300, 0.3),
+
+    Args:
+      channel_multiplier: multiplier to number of channels per layer
+      depth_multiplier: multiplier to number of repeats per stage
+    """
+    arch_def = [
+        ['ds_r1_k3_s1_e1_c16'],
+        ['ir_r2_k3_s2_e6_c24'],
+        ['ir_r2_k5_s2_e6_c40'],
+        ['ir_r3_k3_s2_e6_c80'],
+        ['ir_r3_k5_s1_e6_c112'],
+        ['ir_r4_k5_s2_e6_c192'],
+        ['ir_r1_k3_s1_e6_c320'],
+    ]
+    model_kwargs = dict(
+        block_args=decode_arch_def(arch_def, depth_multiplier, fix_first_last=True),
+        num_features=1280,
+        stem_size=32,
+        fix_stem=True,
+        channel_multiplier=channel_multiplier,
+        act_layer=nn.ReLU6,
+        norm_kwargs={},
+        drop_rate=drop_rate,
+        drop_path_rate=0.2,
+    )
+    return model_kwargs
+
+class EfficientNetBaseEncoder(EfficientNet, EncoderMixin):
 
-    def __init__(self, stage_idxs, out_channels, depth=5, channel_multiplier=1.0, depth_multiplier=1.0):
-        kwargs = get_efficientnet_kwargs(channel_multiplier, depth_multiplier)
-        super().__init__(**kwargs)
+    def __init__(self, stage_idxs, out_channels, depth=5, **kwargs):
+        super().__init__(**kwargs)
 
-        self._stage_idxs = stage_idxs
-        self._out_channels = out_channels
-        self._depth = depth
-        self._in_channels = 3
+        self._stage_idxs = stage_idxs
+        self._out_channels = out_channels
+        self._depth = depth
+        self._in_channels = 3
 
-        del self.classifier
+        del self.classifier
 
     def get_stages(self):
         return [
@@ -89,6 +127,20 @@ def load_state_dict(self, state_dict, **kwargs):
         super().load_state_dict(state_dict, **kwargs)
 
 
+class EfficientNetEncoder(EfficientNetBaseEncoder):
+
+    def __init__(self, stage_idxs, out_channels, depth=5, channel_multiplier=1.0, depth_multiplier=1.0, drop_rate=0.2):
+        kwargs = get_efficientnet_kwargs(channel_multiplier, depth_multiplier, drop_rate)
+        super().__init__(stage_idxs, out_channels, depth, **kwargs)
+
+
+class EfficientNetLiteEncoder(EfficientNetBaseEncoder):
+
+    def __init__(self, stage_idxs, out_channels, depth=5, channel_multiplier=1.0, depth_multiplier=1.0, drop_rate=0.2):
+        kwargs = gen_efficientnet_lite_kwargs(channel_multiplier, depth_multiplier, drop_rate)
+        super().__init__(stage_idxs, out_channels, depth, **kwargs)
+
+
 def prepare_settings(settings):
     return {
         "mean": settings["mean"],
@@ -113,6 +165,7 @@ def prepare_settings(settings):
             "stage_idxs": (2, 3, 5),
             "channel_multiplier": 1.0,
             "depth_multiplier": 1.0,
+            "drop_rate": 0.2,
         },
     },
 
@@ -128,6 +181,7 @@ def prepare_settings(settings):
             "stage_idxs": (2, 3, 5),
             "channel_multiplier": 1.0,
             "depth_multiplier": 1.1,
+            "drop_rate": 0.2,
         },
     },
 
@@ -143,6 +197,7 @@ def prepare_settings(settings):
             "stage_idxs": (2, 3, 5),
             "channel_multiplier": 1.1,
             "depth_multiplier": 1.2,
+            "drop_rate": 0.3,
         },
     },
 
@@ -158,6 +213,7 @@ def prepare_settings(settings):
             "stage_idxs": (2, 3, 5),
             "channel_multiplier": 1.2,
             "depth_multiplier": 1.4,
+            "drop_rate": 0.3,
         },
     },
 
@@ -173,6 +229,7 @@ def prepare_settings(settings):
             "stage_idxs": (2, 3, 5),
             "channel_multiplier": 1.4,
             "depth_multiplier": 1.8,
+            "drop_rate": 0.4,
         },
     },
 
@@ -188,6 +245,7 @@ def prepare_settings(settings):
             "stage_idxs": (2, 3, 5),
             "channel_multiplier": 1.6,
             "depth_multiplier": 2.2,
+            "drop_rate": 0.4,
         },
     },
 
@@ -203,6 +261,7 @@ def prepare_settings(settings):
             "stage_idxs": (2, 3, 5),
             "channel_multiplier": 1.8,
             "depth_multiplier": 2.6,
+            "drop_rate": 0.5,
         },
     },
 
@@ -218,6 +277,7 @@ def prepare_settings(settings):
             "stage_idxs": (2, 3, 5),
             "channel_multiplier": 2.0,
             "depth_multiplier": 3.1,
+            "drop_rate": 0.5,
         },
     },
 
@@ -232,6 +292,7 @@ def prepare_settings(settings):
             "stage_idxs": (2, 3, 5),
             "channel_multiplier": 2.2,
             "depth_multiplier": 3.6,
+            "drop_rate": 0.5,
         },
     },
 
@@ -245,6 +306,77 @@ def prepare_settings(settings):
             "stage_idxs": (2, 3, 5),
             "channel_multiplier": 4.3,
             "depth_multiplier": 5.3,
+            "drop_rate": 0.5,
+        },
+    },
+
+    "timm-tf_efficientnet_lite0": {
+        "encoder": EfficientNetLiteEncoder,
+        "pretrained_settings": {
+            "imagenet": prepare_settings(default_cfgs["tf_efficientnet_lite0"]),
+        },
+        "params": {
+            "out_channels": (3, 32, 24, 40, 112, 320),
+            "stage_idxs": (2, 3, 5),
+            "channel_multiplier": 1.0,
+            "depth_multiplier": 1.0,
+            "drop_rate": 0.2,
+        },
+    },
+
+    "timm-tf_efficientnet_lite1": {
+        "encoder": EfficientNetLiteEncoder,
+        "pretrained_settings": {
+            "imagenet": prepare_settings(default_cfgs["tf_efficientnet_lite1"]),
+        },
+        "params": {
+            "out_channels": (3, 32, 24, 40, 112, 320),
+            "stage_idxs": (2, 3, 5),
+            "channel_multiplier": 1.0,
+            "depth_multiplier": 1.1,
+            "drop_rate": 0.2,
+        },
+    },
+
+    "timm-tf_efficientnet_lite2": {
+        "encoder": EfficientNetLiteEncoder,
+        "pretrained_settings": {
+            "imagenet": prepare_settings(default_cfgs["tf_efficientnet_lite2"]),
+        },
+        "params": {
+            "out_channels": (3, 32, 24, 48, 120, 352),
+            "stage_idxs": (2, 3, 5),
+            "channel_multiplier": 1.1,
+            "depth_multiplier": 1.2,
+            "drop_rate": 0.3,
+        },
+    },
+
+    "timm-tf_efficientnet_lite3": {
+        "encoder": EfficientNetLiteEncoder,
+        "pretrained_settings": {
+            "imagenet": prepare_settings(default_cfgs["tf_efficientnet_lite3"]),
+        },
+        "params": {
+            "out_channels": (3, 32, 32, 48, 136, 384),
+            "stage_idxs": (2, 3, 5),
+            "channel_multiplier": 1.2,
+            "depth_multiplier": 1.4,
+            "drop_rate": 0.3,
+        },
+    },
+
+    "timm-tf_efficientnet_lite4": {
+        "encoder": EfficientNetLiteEncoder,
+        "pretrained_settings": {
+            "imagenet": prepare_settings(default_cfgs["tf_efficientnet_lite4"]),
+        },
+        "params": {
+            "out_channels": (3, 32, 32, 56, 160, 448),
+            "stage_idxs": (2, 3, 5),
+            "channel_multiplier": 1.4,
+            "depth_multiplier": 1.8,
+            "drop_rate": 0.4,
         },
     },
 }
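
To make the new entries concrete, a rough construction sketch using the `timm-tf_efficientnet_lite0` parameters registered above (pretrained weights are not loaded here, and the per-stage feature list returned by forward is how the library's encoders generally behave, noted below as an assumption):

```python
import torch

# Direct construction with the lite0 parameters from the registry entry above.
encoder = EfficientNetLiteEncoder(
    stage_idxs=(2, 3, 5),
    out_channels=(3, 32, 24, 40, 112, 320),
    channel_multiplier=1.0,
    depth_multiplier=1.0,
    drop_rate=0.2,
)

# Assumption: like the other encoders in this package, forward returns one
# feature map per stage, with channel counts matching out_channels.
features = encoder(torch.ones([1, 3, 224, 224]))
```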
