
Commit 13ece2a

Merge pull request #162 from CoinCheung/dev
add cocostuff dataset
2 parents afe86f7 + 4179bce commit 13ece2a

20 files changed: +342 -156 lines changed

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -114,5 +114,5 @@ tensorrt/build/*
 datasets/coco/train.txt
 datasets/coco/val.txt
 pretrained/*
-lib/coco.py
+dist_train.sh

README.md

Lines changed: 65 additions & 24 deletions
@@ -3,17 +3,25 @@
 My implementation of [BiSeNetV1](https://arxiv.org/abs/1808.00897) and [BiSeNetV2](https://arxiv.org/abs/1808.00897).


-The mIOU evaluation result of the models trained and evaluated on cityscapes train/val set is:
+mIOUs and fps on cityscapes val set:
 | none | ss | ssc | msf | mscf | fps(fp16/fp32) | link |
 |------|:--:|:---:|:---:|:----:|:---:|:----:|
-| bisenetv1 | 75.55 | 76.90 | 77.40 | 78.91 | 60/19 | [download](https://drive.google.com/file/d/140MBBAt49N1z1wsKueoFA6HB_QuYud8i/view?usp=sharing) |
-| bisenetv2 | 74.12 | 74.18 | 75.89 | 75.87 | 50/16 | [download](https://drive.google.com/file/d/1qq38u9JT4pp1ubecGLTCHHtqwntH0FCY/view?usp=sharing) |
+| bisenetv1 | 75.10 | 76.90 | 77.22 | 78.73 | 60/19 | [download](https://github.com/CoinCheung/BiSeNet/releases/download/0.0.0/model_final_v1_city.pth) |
+| bisenetv2 | 74.95 | 75.58 | 76.53 | 77.08 | 50/16 | [download](https://github.com/CoinCheung/BiSeNet/releases/download/0.0.0/model_final_v2_city.pth) |

-> Where **ss** means single scale evaluation, **ssc** means single scale crop evaluation, **msf** means multi-scale evaluation with flip augment, and **mscf** means multi-scale crop evaluation with flip evaluation. The eval scales of multi-scales evaluation are `[0.5, 0.75, 1.0, 1.25, 1.5, 1.75]`, and the crop size of crop evaluation is `[1024, 1024]`.
+mIOUs on cocostuff val2017 set:
+| none | ss | ssc | msf | mscf | link |
+|------|:--:|:---:|:---:|:----:|:----:|
+| bisenetv1 | 31.89 | 31.62 | 32.81 | 32.72 | [download](https://github.com/CoinCheung/BiSeNet/releases/download/0.0.0/model_final_v1_coco.pth) |
+| bisenetv2 | 30.49 | 30.55 | 31.81 | 31.73 | [download](https://github.com/CoinCheung/BiSeNet/releases/download/0.0.0/model_final_v2_coco.pth) |
+
+> Where **ss** means single scale evaluation, **ssc** means single scale crop evaluation, **msf** means multi-scale evaluation with flip augment, and **mscf** means multi-scale crop evaluation with flip augment. The eval scales and crop size of multi-scale evaluation can be found in [configs](./configs/).

 > The fps is tested in different way from the paper. For more information, please see [here](./tensorrt).

-Note that the model has a big variance, which means that the results of training for many times would vary within a relatively big margin. For example, if you train bisenetv2 for many times, you will observe that the result of **ss** evaluation of bisenetv2 varies between 72.1-74.4.
+> For the cocostuff dataset: the authors of the `bisenetv2` paper used the "old split" of 9k train and 1k val images, while I used the "new split" of 118k train and 5k val images, so the above results on cocostuff do not match the paper. The authors of bisenetv1 did not report results on cocostuff, so here I simply provide a "make it work" result. Following the tradition of object detection, I used the "1x"(90k) and "2x"(180k) schedules to train bisenetv1(1x) and bisenetv2(2x) respectively. You may get better results by picking the hyper-parameters more carefully.
+
+Note that the model has a big variance, which means that results from repeated training runs vary within a relatively big margin. For example, if you train bisenetv2 many times, you will observe that its **ss** result varies between 73.1-75.1.


 ## platform
@@ -22,8 +30,8 @@ My platform is like this:
 * nvidia Tesla T4 gpu, driver 450.51.05
 * cuda 10.2
 * cudnn 7
-* miniconda python 3.6.9
-* pytorch 1.6.0
+* miniconda python 3.8.8
+* pytorch 1.8.1


 ## get start
@@ -47,7 +55,24 @@ $ unzip leftImg8bit_trainvaltest.zip
 $ unzip gtFine_trainvaltest.zip
 ```

-2.custom dataset
+2.cocostuff
+
+Download `train2017.zip`, `val2017.zip` and `stuffthingmaps_trainval2017.zip` from the official [website](https://cocodataset.org/#download), then do as follows:
+```
+$ unzip train2017.zip
+$ unzip val2017.zip
+$ mv train2017/ /path/to/BiSeNet/datasets/coco/images
+$ mv val2017/ /path/to/BiSeNet/datasets/coco/images
+
+$ unzip stuffthingmaps_trainval2017.zip
+$ mv train2017/ /path/to/BiSeNet/datasets/coco/labels
+$ mv val2017/ /path/to/BiSeNet/datasets/coco/labels
+
+$ cd /path/to/BiSeNet
+$ python tools/gen_coco_annos.py
+```
+
+3.custom dataset

 If you want to train on your own dataset, you should generate annotation files first with the format like this:
 ```
@@ -56,30 +81,46 @@ frankfurt_000001_079206_leftImg8bit.png,frankfurt_000001_079206_gtFine_labelIds.
 ...
 ```
 Each line is a pair of training sample and ground truth image path, which are separated by a single comma `,`.
-Then you need to change the field of `im_root` and `train/val_im_anns` in the configuration files.
+Then you need to change the `im_root` and `train/val_im_anns` fields in the configuration files. If what is shown in `cityscapes_cv2.py` is not clear, you can also refer to `coco.py`.

-## train
-In order to train the model, you can run command like this:
-```
-$ export CUDA_VISIBLE_DEVICES=0,1

-# if you want to train with apex
-$ python -m torch.distributed.launch --nproc_per_node=2 tools/train.py --config configs/bisenetv2_city.py # or bisenetv1
-
-# if you want to train with pytorch fp16 feature from torch 1.6
-$ python -m torch.distributed.launch --nproc_per_node=2 tools/train_amp.py --config configs/bisenetv2_city.py # or bisenetv1
+## train
+I used the following commands to train the models:
+```bash
+# bisenetv1 cityscapes
+export CUDA_VISIBLE_DEVICES=0,1
+cfg_file=configs/bisenetv1_city.py
+NGPUS=2
+python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file
+
+# bisenetv2 cityscapes
+export CUDA_VISIBLE_DEVICES=0,1
+cfg_file=configs/bisenetv2_city.py
+NGPUS=2
+python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file
+
+# bisenetv1 cocostuff
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+cfg_file=configs/bisenetv1_coco.py
+NGPUS=4
+python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file
+
+# bisenetv2 cocostuff
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+cfg_file=configs/bisenetv2_coco.py
+NGPUS=8
+python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file
 ```

-Note that though `bisenetv2` has fewer flops, it requires much more training iterations. The the training time of `bisenetv1` is shorter.
+Note:
+1. Though `bisenetv2` has fewer flops, it requires many more training iterations, so the training time of `bisenetv1` is shorter.
+2. I used an overall batch size of 16 to train all models. Since cocostuff has 171 categories, it requires more memory, so I split the 16 images across more gpus (4 or 8, as in the commands above) instead of the 2 used for cityscapes.


 ## finetune from trained model
-You can also load the trained model weights and finetune from it:
+You can also load the trained model weights and finetune from it, like this:
 ```
 $ export CUDA_VISIBLE_DEVICES=0,1
-$ python -m torch.distributed.launch --nproc_per_node=2 tools/train.py --finetune-from ./res/model_final.pth --config ./configs/bisenetv2_city.py # or bisenetv1
-
-# same with pytorch fp16 feature
 $ python -m torch.distributed.launch --nproc_per_node=2 tools/train_amp.py --finetune-from ./res/model_final.pth --config ./configs/bisenetv2_city.py # or bisenetv1
 ```

@@ -94,6 +135,6 @@ $ python tools/evaluate.py --config configs/bisenetv1_city.py --weight-path /pat
 You can go to [tensorrt](./tensorrt) For details.


-### Be aware that this is the refactored version of the original codebase. You can go to the `old` directory for original implementation.
+### Be aware that this is the refactored version of the original codebase. You can go to the `old` directory for the original implementation if you need it, though I believe you will not need it.


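The custom dataset section of the README above only specifies the annotation format: one `image_path,label_path` pair per line, presumably relative to `im_root`. As a rough, hypothetical sketch of how such a file could be generated for a layout like the coco folders prepared above (the `build_ann_file` helper, folder layout and `.png` label suffix are assumptions for illustration; the repo's own `tools/gen_coco_annos.py` is the authoritative tool):

```python
# Hypothetical annotation-file builder; adapt the layout and suffix to your dataset.
import os
import os.path as osp


def build_ann_file(root, im_sub, lb_sub, out_path):
    lines = []
    for name in sorted(os.listdir(osp.join(root, im_sub))):
        stem = osp.splitext(name)[0]
        im_rel = osp.join(im_sub, name)
        lb_rel = osp.join(lb_sub, stem + '.png')  # assumed label naming scheme
        if osp.exists(osp.join(root, lb_rel)):
            lines.append(im_rel + ',' + lb_rel)
    with open(out_path, 'w') as fw:
        fw.write('\n'.join(lines))


if __name__ == '__main__':
    build_ann_file('./datasets/coco', 'images/train2017',
                   'labels/train2017', './datasets/coco/train.txt')
```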
configs/bisenetv1_city.py

Lines changed: 3 additions & 0 deletions
@@ -1,6 +1,7 @@
 
 cfg = dict(
     model_type='bisenetv1',
+    n_cats=19,
     num_aux_heads=2,
     lr_start=1e-2,
     weight_decay=5e-4,
@@ -12,6 +13,8 @@
     val_im_anns='./datasets/cityscapes/val.txt',
     scales=[0.75, 2.],
     cropsize=[1024, 1024],
+    eval_crop=[1024, 1024],
+    eval_scales=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
     ims_per_gpu=8,
     eval_ims_per_gpu=2,
     use_fp16=True,

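The added `eval_crop` and `eval_scales` fields move the evaluation settings that the old README hard-coded into each config. As a hedged illustration of how multi-scale flip evaluation (the README's **msf** protocol) typically consumes such fields, the sketch below averages softmax probabilities over rescaled and flipped copies of an image; it assumes `model(x)` returns logits of shape `(1, n_cats, h, w)` and is not the repo's `tools/evaluate.py`.

```python
# Illustrative multi-scale + flip ("msf") evaluation driven by eval_scales.
# Assumes model(x) -> logits of shape (1, n_cats, h, w); details may differ
# from the repo's actual evaluator.
import torch
import torch.nn.functional as F


@torch.no_grad()
def msf_probs(model, im, eval_scales=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75)):
    _, _, H, W = im.shape  # im: (1, 3, H, W) normalized image tensor
    prob = 0.
    for s in eval_scales:
        size = (int(H * s), int(W * s))
        x = F.interpolate(im, size=size, mode='bilinear', align_corners=False)
        for flip in (False, True):
            xin = torch.flip(x, dims=(3,)) if flip else x
            logits = model(xin)
            if flip:
                logits = torch.flip(logits, dims=(3,))
            logits = F.interpolate(logits, size=(H, W),
                                   mode='bilinear', align_corners=False)
            prob = prob + torch.softmax(logits, dim=1)
    return prob / (2 * len(eval_scales))
```

The crop variants (**ssc**/**mscf**) typically slide an `eval_crop`-sized window over the image and stitch the per-window probabilities back together before taking the argmax.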
configs/bisenetv1_coco.py

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+
+cfg = dict(
+    model_type='bisenetv1',
+    n_cats=171,
+    num_aux_heads=2,
+    lr_start=1e-2,
+    weight_decay=1e-4,
+    warmup_iters=1000,
+    max_iter=90000,
+    dataset='CocoStuff',
+    im_root='./datasets/coco',
+    train_im_anns='./datasets/coco/train.txt',
+    val_im_anns='./datasets/coco/val.txt',
+    scales=[0.5, 2.],
+    cropsize=[512, 512],
+    eval_crop=[512, 512],
+    eval_scales=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
+    ims_per_gpu=4,
+    eval_ims_per_gpu=1,
+    use_fp16=True,
+    use_sync_bn=True,
+    respth='./res',
+)

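Each config is a plain Python file exposing a `cfg` dict, which the training and evaluation scripts receive via `--config`. A minimal sketch of one way such a file can be loaded follows; the `load_cfg` helper is an assumption for illustration, and the repo's actual config handling may differ.

```python
# Hypothetical loader for a .py config defining a `cfg` dict, such as
# configs/bisenetv1_coco.py above; not necessarily how the repo resolves configs.
import importlib.util


def load_cfg(cfg_path):
    spec = importlib.util.spec_from_file_location('cfg_module', cfg_path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod.cfg


if __name__ == '__main__':
    cfg = load_cfg('configs/bisenetv1_coco.py')
    print(cfg['n_cats'], cfg['max_iter'])  # 171 90000
```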
configs/bisenetv2_city.py

Lines changed: 6 additions & 3 deletions
@@ -2,17 +2,20 @@
 ## bisenetv2
 cfg = dict(
     model_type='bisenetv2',
+    n_cats=19,
     num_aux_heads=4,
-    lr_start = 5e-3,
+    lr_start=5e-3,
     weight_decay=5e-4,
-    warmup_iters = 1000,
-    max_iter = 150000,
+    warmup_iters=1000,
+    max_iter=150000,
     dataset='CityScapes',
     im_root='./datasets/cityscapes',
     train_im_anns='./datasets/cityscapes/train.txt',
     val_im_anns='./datasets/cityscapes/val.txt',
     scales=[0.25, 2.],
     cropsize=[512, 1024],
+    eval_crop=[1024, 1024],
+    eval_scales=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
     ims_per_gpu=8,
     eval_ims_per_gpu=2,
     use_fp16=True,

configs/bisenetv2_coco.py

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+
+## bisenetv2
+cfg = dict(
+    model_type='bisenetv2',
+    n_cats=171,
+    num_aux_heads=4,
+    lr_start=5e-3,
+    weight_decay=1e-4,
+    warmup_iters=1000,
+    max_iter=180000,
+    dataset='CocoStuff',
+    im_root='./datasets/coco',
+    train_im_anns='./datasets/coco/train.txt',
+    val_im_anns='./datasets/coco/val.txt',
+    scales=[0.75, 2.],
+    cropsize=[640, 640],
+    eval_crop=[640, 640],
+    eval_scales=[0.5, 0.75, 1, 1.25, 1.5, 1.75],
+    ims_per_gpu=2,
+    eval_ims_per_gpu=1,
+    use_fp16=True,
+    use_sync_bn=True,
+    respth='./res',
+)

dist_train.sh

Lines changed: 18 additions & 4 deletions
@@ -1,9 +1,23 @@
 
-export CUDA_VISIBLE_DEVICES=2,3
-PORT=52330
+export CUDA_VISIBLE_DEVICES=0,1
+PORT=52332
 NGPUS=2
-cfg_file=configs/bisenetv1_city.py
+# cfg_file=configs/bisenetv1_city.py
+# cfg_file=configs/bisenetv1_coco.py
+# cfg_file=configs/bisenetv2_city.py
+cfg_file=configs/bisenetv2_coco.py
+
+# python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file --port $PORT
+python -m torch.distributed.launch --nproc_per_node=2 tools/train_amp.py --finetune-from ./res/modelzoo/model_final_v2_city.pth --config ./configs/bisenetv2_city.py # or bisenetv1
+
+## train, use run
+# python -m torch.distributed.run --nnode=1 --rdzv_backend=c10d --rdzv_id=001 --rdzv_endpoint=127.0.0.1:$PORT --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file --port $PORT
+
+

-python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file --port $PORT

 # python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train.py --config $cfg_file --port $PORT
+
+# python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/evaluate.py --config $cfg_file --port $PORT --weight-path res/modelzoo/model_final_v2_coco.pth
+
23+

lib/base_dataset.py

Lines changed: 5 additions & 27 deletions
@@ -11,7 +11,6 @@
 import cv2
 import numpy as np

-import lib.transform_cv2 as T
 from lib.sampler import RepeatedDistSampler


@@ -40,7 +39,7 @@ def __init__(self, dataroot, annpath, trans_func=None, mode='train'):

     def __getitem__(self, idx):
         impth, lbpth = self.img_paths[idx], self.lb_paths[idx]
-        img, label = cv2.imread(impth)[:, :, ::-1], cv2.imread(lbpth, 0)
+        img, label = self.get_image(impth, lbpth)
         if not self.lb_map is None:
             label = self.lb_map[label]
         im_lb = dict(im=img, lb=label)
@@ -50,35 +49,14 @@ def __getitem__(self, idx):
         img, label = im_lb['im'], im_lb['lb']
         return img.detach(), label.unsqueeze(0).detach()

+    def get_image(self, impth, lbpth):
+        img, label = cv2.imread(impth)[:, :, ::-1], cv2.imread(lbpth, 0)
+        return img, label
+
     def __len__(self):
         return self.len


-class TransformationTrain(object):
-
-    def __init__(self, scales, cropsize):
-        self.trans_func = T.Compose([
-            T.RandomResizedCrop(scales, cropsize),
-            T.RandomHorizontalFlip(),
-            T.ColorJitter(
-                brightness=0.4,
-                contrast=0.4,
-                saturation=0.4
-            ),
-        ])
-
-    def __call__(self, im_lb):
-        im_lb = self.trans_func(im_lb)
-        return im_lb
-
-
-class TransformationVal(object):
-
-    def __call__(self, im_lb):
-        im, lb = im_lb['im'], im_lb['lb']
-        return dict(im=im, lb=lb)
-
-
 if __name__ == "__main__":
     from tqdm import tqdm
     from torch.utils.data import DataLoader

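Moving the `cv2.imread` calls out of `__getitem__` into a `get_image` method turns image loading into an overridable hook, so subclasses can change how samples are read without touching the label mapping, augmentation, and tensor conversion that follow. A hypothetical subclass (not part of this commit) might use it like this:

```python
# Hypothetical subclass illustrating the new get_image hook; this dataset does
# not exist in the repo and is only meant to show the extension point.
import cv2
import numpy as np

from lib.base_dataset import BaseDataset


class GrayImageDataset(BaseDataset):
    '''Dataset whose images are stored as single-channel files; they are stacked
    to three channels so the rest of the BaseDataset pipeline stays unchanged.'''

    def get_image(self, impth, lbpth):
        gray = cv2.imread(impth, cv2.IMREAD_GRAYSCALE)
        img = np.stack([gray] * 3, axis=2)  # HxWx3, like the default RGB output
        label = cv2.imread(lbpth, 0)
        return img, label
```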
lib/coco.py

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+#!/usr/bin/python
+# -*- encoding: utf-8 -*-
+
+import os
+import os.path as osp
+import json
+
+import torch
+from torch.utils.data import Dataset, DataLoader
+import torch.distributed as dist
+import cv2
+import numpy as np
+
+import lib.transform_cv2 as T
+from lib.sampler import RepeatedDistSampler
+from lib.base_dataset import BaseDataset
+
+'''
+91(thing) + 91(stuff) = 182 classes, label proportions are:
+[0.0901445377, 0.00157896236, 0.00611962763, 0.00494526505, 0.00335260064, 0.00765355955, 0.00772972804, 0.00631509744,
+0.00270457286, 0.000697793344, 0.00114085574, 0.0, 0.00114084131, 0.000705729068, 0.00359758029, 0.00162208938, 0.00598373796,
+0.00440213609, 0.00362085441, 0.00193052224, 0.00271001196, 0.00492864603, 0.00186985393, 0.00332902228, 0.00334420294, 0.0,
+0.000922751106, 0.00298028204, 0.0, 0.0, 0.0010437561, 0.000285608411, 0.00318569535, 0.000314216755, 0.000313060076, 0.000364755975,
+0.000135920434, 0.000678980469, 0.000145436185, 0.000187677684, 0.000640885889, 0.00121345742, 0.000586313048, 0.00160106929, 0.0,
+0.000887093272, 0.00252332669, 0.000283407598, 0.000423017189, 0.000247005886, 0.00607086751, 0.002264644, 0.00108296684, 0.00299262899,
+0.0013542901, 0.0018255991, 0.000719220519, 0.00127748254, 0.00743539745, 0.0018222117, 0.00368625641, 0.00644224839, 0.00576837542,
+0.00234158491, 0.0102560197, 0.0, 0.0310601945, 0.0, 0.0, 0.00321417022, 0.0, 0.00343909654, 0.00366968441, 0.000223077284,
+0.000549851977, 0.00142833996, 0.000976368198, 0.000932849475, 0.00367802183, 6.33631941e-05, 0.00179415878, 0.00384408865, 0.0,
+0.00178728429, 0.00131955324, 0.00172710316, 0.000355333114, 0.00323052075, 3.45024606e-05, 0.000159319051, 0.0, 0.00233498927,
+0.00115535012, 0.00216354199, 0.00122636929, 0.0297802789, 0.00599919161, 0.00792527951, 0.00446247753, 0.00229155615,
+0.00481623284, 0.00928416394, 0.000292110971, 0.00100709844, 0.0036950065, 0.0238653594, 0.00318962423, 0.000957967243, 0.00491549702,
+0.00305316147, 0.0142686986, 0.00667806178, 0.00940045853, 0.000994700392, 0.00697502858, 0.00163056828, 0.00655119369, 0.00599044442,
+0.00200317424, 0.00546109479, 0.00496814246, 0.00128356119, 0.00893122042, 0.0423373213, 0.00275267517, 0.00730936505, 0.00231434982,
+0.00435102045, 0.00276966794, 0.00141028174, 0.000251683147, 0.00878006131, 0.00357672108, 0.000183633027, 0.00514584856,
+0.000848967739, 0.000662099529, 0.00186883821, 0.00417270686, 0.0224302911, 0.000551947753, 0.00799009014, 0.00379765772,
+0.00226731642, 0.0181341982, 0.000835227067, 0.00287355753, 0.00546769461, 0.0242787139, 0.00318951861, 0.00147349686,
+0.00167046288, 0.000520877717, 0.0101631583, 0.0234788756, 0.00283978366, 0.0624405778, 0.00258472693, 0.0204314774, 0.000550128266,
+0.00112924659, 0.001457768, 0.00190406757, 0.00173232644, 0.0116980759, 0.000850599027, 0.00565381261, 0.000787379463, 0.0577763754,
+0.00214883711, 0.00553984356, 0.0443605019, 0.0218570174, 0.0027310644, 0.00225446528, 0.00903008323, 0.00644298871, 0.00442167269,
+0.000129279566, 0.00176047379, 0.0101637834, 0.00255549522]
+
+11 thing classes has no annos, proportions are 0:
+[11, 25, 28, 29, 44, 65, 67, 68, 70, 82, 90]
+'''
+
+
+
+class CocoStuff(BaseDataset):
+
+    def __init__(self, dataroot, annpath, trans_func=None, mode='train'):
+        super(CocoStuff, self).__init__(dataroot, annpath, trans_func, mode)
+        self.n_cats = 171 # 91 stuff, 91 thing, 11 of thing have no annos
+        self.lb_ignore = 255
+
+        ## label mapping, remove non-existing labels
+        missing = [11, 25, 28, 29, 44, 65, 67, 68, 70, 82, 90]
+        remain = [ind for ind in range(182) if not ind in missing]
+        self.lb_map = np.arange(256)
+        for ind in remain:
+            self.lb_map[ind] = remain.index(ind)
+
+        self.to_tensor = T.ToTensor(
+            mean=(0.46962251, 0.4464104, 0.40718787), # coco, rgb
+            std=(0.27469736, 0.27012361, 0.28515933),
+        )
+
+

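The `lb_map` built in `CocoStuff.__init__` compresses the 182 raw stuffthingmaps ids into 171 contiguous training ids by skipping the 11 thing classes that have no annotations, while leaving 255 untouched as the ignore label. A small stand-alone check of that remapping (same logic as the constructor above, applied to a toy label patch for illustration):

```python
# Stand-alone illustration of the CocoStuff lb_map remapping.
import numpy as np

missing = [11, 25, 28, 29, 44, 65, 67, 68, 70, 82, 90]
remain = [ind for ind in range(182) if ind not in missing]

lb_map = np.arange(256)
for ind in remain:
    lb_map[ind] = remain.index(ind)

# ids below the first missing id (11) keep their value; every later id shifts
# down by the number of missing ids before it; 255 stays the ignore label.
label = np.array([[0, 10, 12], [91, 181, 255]], dtype=np.uint8)  # toy patch
print(lb_map[label])
# [[  0  10  11]
#  [ 80 170 255]]
```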