
Commit 13ece2a

Merge pull request #162 from CoinCheung/dev
add cocostuff dataset
2 parents afe86f7 + 4179bce commit 13ece2a

20 files changed: +342 -156 lines changed

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -114,5 +114,5 @@ tensorrt/build/*
 datasets/coco/train.txt
 datasets/coco/val.txt
 pretrained/*
-lib/coco.py
+dist_train.sh

README.md

Lines changed: 65 additions & 24 deletions
@@ -3,17 +3,25 @@
 My implementation of [BiSeNetV1](https://arxiv.org/abs/1808.00897) and [BiSeNetV2](https://arxiv.org/abs/1808.00897).


-The mIOU evaluation result of the models trained and evaluated on cityscapes train/val set is:
+mIOUs and fps on cityscapes val set:
 | none | ss | ssc | msf | mscf | fps(fp16/fp32) | link |
 |------|:--:|:---:|:---:|:----:|:---:|:----:|
-| bisenetv1 | 75.55 | 76.90 | 77.40 | 78.91 | 60/19 | [download](https://drive.google.com/file/d/140MBBAt49N1z1wsKueoFA6HB_QuYud8i/view?usp=sharing) |
-| bisenetv2 | 74.12 | 74.18 | 75.89 | 75.87 | 50/16 | [download](https://drive.google.com/file/d/1qq38u9JT4pp1ubecGLTCHHtqwntH0FCY/view?usp=sharing) |
+| bisenetv1 | 75.10 | 76.90 | 77.22 | 78.73 | 60/19 | [download](https://github.com/CoinCheung/BiSeNet/releases/download/0.0.0/model_final_v1_city.pth) |
+| bisenetv2 | 74.95 | 75.58 | 76.53 | 77.08 | 50/16 | [download](https://github.com/CoinCheung/BiSeNet/releases/download/0.0.0/model_final_v2_city.pth) |

-> Where **ss** means single scale evaluation, **ssc** means single scale crop evaluation, **msf** means multi-scale evaluation with flip augment, and **mscf** means multi-scale crop evaluation with flip evaluation. The eval scales of multi-scales evaluation are `[0.5, 0.75, 1.0, 1.25, 1.5, 1.75]`, and the crop size of crop evaluation is `[1024, 1024]`.
+mIOUs on cocostuff val2017 set:
+| none | ss | ssc | msf | mscf | link |
+|------|:--:|:---:|:---:|:----:|:----:|
+| bisenetv1 | 31.89 | 31.62 | 32.81 | 32.72 | [download](https://github.com/CoinCheung/BiSeNet/releases/download/0.0.0/model_final_v1_coco.pth) |
+| bisenetv2 | 30.49 | 30.55 | 31.81 | 31.73 | [download](https://github.com/CoinCheung/BiSeNet/releases/download/0.0.0/model_final_v2_coco.pth) |
+
+> Where **ss** means single scale evaluation, **ssc** means single scale crop evaluation, **msf** means multi-scale evaluation with flip augment, and **mscf** means multi-scale crop evaluation with flip augment. The eval scales and crop size of multi-scale evaluation can be found in [configs](./configs/).

 > The fps is tested in different way from the paper. For more information, please see [here](./tensorrt).

-Note that the model has a big variance, which means that the results of training for many times would vary within a relatively big margin. For example, if you train bisenetv2 for many times, you will observe that the result of **ss** evaluation of bisenetv2 varies between 72.1-74.4.
+> For the cocostuff dataset: the authors of the `bisenetv2` paper used the "old split" of 9k train and 1k val images, while I used the "new split" of 118k train and 5k val images, so the above results on cocostuff do not match the paper. The authors of bisenetv1 did not report results on cocostuff, so here I simply provide a "make it work" result. Following the tradition of object detection, I used the "1x"(90k) and "2x"(180k) schedules to train bisenetv1(1x) and bisenetv2(2x) respectively. You may get better results by picking the hyper-parameters more carefully.
+
+Note that the model has a big variance, which means that results from repeated training runs vary within a relatively big margin. For example, if you train bisenetv2 many times, you will observe that its **ss** result varies between 73.1-75.1.


 ## platform
@@ -22,8 +30,8 @@ My platform is like this:
 * nvidia Tesla T4 gpu, driver 450.51.05
 * cuda 10.2
 * cudnn 7
-* miniconda python 3.6.9
-* pytorch 1.6.0
+* miniconda python 3.8.8
+* pytorch 1.8.1


 ## get start
@@ -47,7 +55,24 @@ $ unzip leftImg8bit_trainvaltest.zip
 $ unzip gtFine_trainvaltest.zip
 ```

-2.custom dataset
+2.cocostuff
+
+Download `train2017.zip`, `val2017.zip` and `stuffthingmaps_trainval2017.zip` from the official [website](https://cocodataset.org/#download), then do as follows:
+```
+$ unzip train2017.zip
+$ unzip val2017.zip
+$ mv train2017/ /path/to/BiSeNet/datasets/coco/images
+$ mv val2017/ /path/to/BiSeNet/datasets/coco/images
+
+$ unzip stuffthingmaps_trainval2017.zip
+$ mv train2017/ /path/to/BiSeNet/datasets/coco/labels
+$ mv val2017/ /path/to/BiSeNet/datasets/coco/labels
+
+$ cd /path/to/BiSeNet
+$ python tools/gen_coco_annos.py
+```
+
+3.custom dataset

 If you want to train on your own dataset, you should generate annotation files first with the format like this:
 ```
@@ -56,30 +81,46 @@ frankfurt_000001_079206_leftImg8bit.png,frankfurt_000001_079206_gtFine_labelIds.
 ...
 ```
 Each line is a pair of training sample and ground truth image path, which are separated by a single comma `,`.
-Then you need to change the field of `im_root` and `train/val_im_anns` in the configuration files.
+Then you need to change the `im_root` and `train/val_im_anns` fields in the configuration files. If what is shown in `cityscapes_cv2.py` is not clear, you can also refer to `coco.py`.

-## train
-In order to train the model, you can run command like this:
-```
-$ export CUDA_VISIBLE_DEVICES=0,1

-# if you want to train with apex
-$ python -m torch.distributed.launch --nproc_per_node=2 tools/train.py --config configs/bisenetv2_city.py # or bisenetv1
-
-# if you want to train with pytorch fp16 feature from torch 1.6
-$ python -m torch.distributed.launch --nproc_per_node=2 tools/train_amp.py --config configs/bisenetv2_city.py # or bisenetv1
+## train
+I used the following commands to train the models:
+```bash
+# bisenetv1 cityscapes
+export CUDA_VISIBLE_DEVICES=0,1
+cfg_file=configs/bisenetv1_city.py
+NGPUS=2
+python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file
+
+# bisenetv2 cityscapes
+export CUDA_VISIBLE_DEVICES=0,1
+cfg_file=configs/bisenetv2_city.py
+NGPUS=2
+python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file
+
+# bisenetv1 cocostuff
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+cfg_file=configs/bisenetv1_coco.py
+NGPUS=4
+python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file
+
+# bisenetv2 cocostuff
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+cfg_file=configs/bisenetv2_coco.py
+NGPUS=8
+python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file
 ```

-Note that though `bisenetv2` has fewer flops, it requires much more training iterations. The the training time of `bisenetv1` is shorter.
+Note:
+1. Though `bisenetv2` has fewer flops, it requires many more training iterations, so the training time of `bisenetv1` is shorter.
+2. I used an overall batch size of 16 to train all models. Since cocostuff has 171 categories, it requires more memory, so I split the 16 images across more gpus (4 or 8, as in the commands above) instead of the 2 used for cityscapes.


 ## finetune from trained model
-You can also load the trained model weights and finetune from it:
+You can also load the trained model weights and finetune from it, like this:
 ```
 $ export CUDA_VISIBLE_DEVICES=0,1
-$ python -m torch.distributed.launch --nproc_per_node=2 tools/train.py --finetune-from ./res/model_final.pth --config ./configs/bisenetv2_city.py # or bisenetv1
-
-# same with pytorch fp16 feature
 $ python -m torch.distributed.launch --nproc_per_node=2 tools/train_amp.py --finetune-from ./res/model_final.pth --config ./configs/bisenetv2_city.py # or bisenetv1
 ```

@@ -94,6 +135,6 @@ $ python tools/evaluate.py --config configs/bisenetv1_city.py --weight-path /pat
 You can go to [tensorrt](./tensorrt) For details.


-### Be aware that this is the refactored version of the original codebase. You can go to the `old` directory for original implementation.
+### Be aware that this is the refactored version of the original codebase. You can go to the `old` directory for the original implementation if you need it, though I believe you will not need it.


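The custom dataset section of the README above only specifies the annotation format: one `image_path,label_path` pair per line, presumably relative to `im_root`. As a rough, hypothetical sketch of how such a file could be generated for a layout like the coco folders prepared above (the `build_ann_file` helper, folder layout and `.png` label suffix are assumptions for illustration; the repo's own `tools/gen_coco_annos.py` is the authoritative tool):

```python
# Hypothetical annotation-file builder; adapt the layout and suffix to your dataset.
import os
import os.path as osp


def build_ann_file(root, im_sub, lb_sub, out_path):
    lines = []
    for name in sorted(os.listdir(osp.join(root, im_sub))):
        stem = osp.splitext(name)[0]
        im_rel = osp.join(im_sub, name)
        lb_rel = osp.join(lb_sub, stem + '.png')  # assumed label naming scheme
        if osp.exists(osp.join(root, lb_rel)):
            lines.append(im_rel + ',' + lb_rel)
    with open(out_path, 'w') as fw:
        fw.write('\n'.join(lines))


if __name__ == '__main__':
    build_ann_file('./datasets/coco', 'images/train2017',
                   'labels/train2017', './datasets/coco/train.txt')
```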
configs/bisenetv1_city.py

Lines changed: 3 additions & 0 deletions
@@ -1,6 +1,7 @@
 
 cfg = dict(
     model_type='bisenetv1',
+    n_cats=19,
     num_aux_heads=2,
     lr_start=1e-2,
     weight_decay=5e-4,
@@ -12,6 +13,8 @@
     val_im_anns='./datasets/cityscapes/val.txt',
     scales=[0.75, 2.],
     cropsize=[1024, 1024],
+    eval_crop=[1024, 1024],
+    eval_scales=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
     ims_per_gpu=8,
     eval_ims_per_gpu=2,
     use_fp16=True,

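The added `eval_crop` and `eval_scales` fields move the evaluation settings that the old README hard-coded into each config. As a hedged illustration of how multi-scale flip evaluation (the README's **msf** protocol) typically consumes such fields, the sketch below averages softmax probabilities over rescaled and flipped copies of an image; it assumes `model(x)` returns logits of shape `(1, n_cats, h, w)` and is not the repo's `tools/evaluate.py`.

```python
# Illustrative multi-scale + flip ("msf") evaluation driven by eval_scales.
# Assumes model(x) -> logits of shape (1, n_cats, h, w); details may differ
# from the repo's actual evaluator.
import torch
import torch.nn.functional as F


@torch.no_grad()
def msf_probs(model, im, eval_scales=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75)):
    _, _, H, W = im.shape  # im: (1, 3, H, W) normalized image tensor
    prob = 0.
    for s in eval_scales:
        size = (int(H * s), int(W * s))
        x = F.interpolate(im, size=size, mode='bilinear', align_corners=False)
        for flip in (False, True):
            xin = torch.flip(x, dims=(3,)) if flip else x
            logits = model(xin)
            if flip:
                logits = torch.flip(logits, dims=(3,))
            logits = F.interpolate(logits, size=(H, W),
                                   mode='bilinear', align_corners=False)
            prob = prob + torch.softmax(logits, dim=1)
    return prob / (2 * len(eval_scales))
```

The crop variants (**ssc**/**mscf**) typically slide an `eval_crop`-sized window over the image and stitch the per-window probabilities back together before taking the argmax.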
configs/bisenetv1_coco.py

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+
+cfg = dict(
+    model_type='bisenetv1',
+    n_cats=171,
+    num_aux_heads=2,
+    lr_start=1e-2,
+    weight_decay=1e-4,
+    warmup_iters=1000,
+    max_iter=90000,
+    dataset='CocoStuff',
+    im_root='./datasets/coco',
+    train_im_anns='./datasets/coco/train.txt',
+    val_im_anns='./datasets/coco/val.txt',
+    scales=[0.5, 2.],
+    cropsize=[512, 512],
+    eval_crop=[512, 512],
+    eval_scales=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
+    ims_per_gpu=4,
+    eval_ims_per_gpu=1,
+    use_fp16=True,
+    use_sync_bn=True,
+    respth='./res',
+)

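Each config is a plain Python file exposing a `cfg` dict, which the training and evaluation scripts receive via `--config`. A minimal sketch of one way such a file can be loaded follows; the `load_cfg` helper is an assumption for illustration, and the repo's actual config handling may differ.

```python
# Hypothetical loader for a .py config defining a `cfg` dict, such as
# configs/bisenetv1_coco.py above; not necessarily how the repo resolves configs.
import importlib.util


def load_cfg(cfg_path):
    spec = importlib.util.spec_from_file_location('cfg_module', cfg_path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod.cfg


if __name__ == '__main__':
    cfg = load_cfg('configs/bisenetv1_coco.py')
    print(cfg['n_cats'], cfg['max_iter'])  # 171 90000
```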
configs/bisenetv2_city.py

Lines changed: 6 additions & 3 deletions
@@ -2,17 +2,20 @@
 ## bisenetv2
 cfg = dict(
     model_type='bisenetv2',
+    n_cats=19,
     num_aux_heads=4,
-    lr_start = 5e-3,
+    lr_start=5e-3,
     weight_decay=5e-4,
-    warmup_iters = 1000,
-    max_iter = 150000,
+    warmup_iters=1000,
+    max_iter=150000,
     dataset='CityScapes',
     im_root='./datasets/cityscapes',
     train_im_anns='./datasets/cityscapes/train.txt',
     val_im_anns='./datasets/cityscapes/val.txt',
     scales=[0.25, 2.],
     cropsize=[512, 1024],
+    eval_crop=[1024, 1024],
+    eval_scales=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
     ims_per_gpu=8,
     eval_ims_per_gpu=2,
     use_fp16=True,

configs/bisenetv2_coco.py

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+
+## bisenetv2
+cfg = dict(
+    model_type='bisenetv2',
+    n_cats=171,
+    num_aux_heads=4,
+    lr_start=5e-3,
+    weight_decay=1e-4,
+    warmup_iters=1000,
+    max_iter=180000,
+    dataset='CocoStuff',
+    im_root='./datasets/coco',
+    train_im_anns='./datasets/coco/train.txt',
+    val_im_anns='./datasets/coco/val.txt',
+    scales=[0.75, 2.],
+    cropsize=[640, 640],
+    eval_crop=[640, 640],
+    eval_scales=[0.5, 0.75, 1, 1.25, 1.5, 1.75],
+    ims_per_gpu=2,
+    eval_ims_per_gpu=1,
+    use_fp16=True,
+    use_sync_bn=True,
+    respth='./res',
+)

dist_train.sh

Lines changed: 18 additions & 4 deletions
@@ -1,9 +1,23 @@
 
-export CUDA_VISIBLE_DEVICES=2,3
-PORT=52330
+export CUDA_VISIBLE_DEVICES=0,1
+PORT=52332
 NGPUS=2
-cfg_file=configs/bisenetv1_city.py
+# cfg_file=configs/bisenetv1_city.py
+# cfg_file=configs/bisenetv1_coco.py
+# cfg_file=configs/bisenetv2_city.py
+cfg_file=configs/bisenetv2_coco.py
+
+# python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file --port $PORT
+python -m torch.distributed.launch --nproc_per_node=2 tools/train_amp.py --finetune-from ./res/modelzoo/model_final_v2_city.pth --config ./configs/bisenetv2_city.py # or bisenetv1
+
+## train, use run
+# python -m torch.distributed.run --nnode=1 --rdzv_backend=c10d --rdzv_id=001 --rdzv_endpoint=127.0.0.1:$PORT --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file --port $PORT
+
+

-python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file --port $PORT

 # python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train.py --config $cfg_file --port $PORT
+
+# python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/evaluate.py --config $cfg_file --port $PORT --weight-path res/modelzoo/model_final_v2_coco.pth
+
23+

lib/base_dataset.py

Lines changed: 5 additions & 27 deletions
@@ -11,7 +11,6 @@
 import cv2
 import numpy as np

-import lib.transform_cv2 as T
 from lib.sampler import RepeatedDistSampler


@@ -40,7 +39,7 @@ def __init__(self, dataroot, annpath, trans_func=None, mode='train'):

     def __getitem__(self, idx):
         impth, lbpth = self.img_paths[idx], self.lb_paths[idx]
-        img, label = cv2.imread(impth)[:, :, ::-1], cv2.imread(lbpth, 0)
+        img, label = self.get_image(impth, lbpth)
         if not self.lb_map is None:
             label = self.lb_map[label]
         im_lb = dict(im=img, lb=label)
@@ -50,35 +49,14 @@ def __getitem__(self, idx):
         img, label = im_lb['im'], im_lb['lb']
         return img.detach(), label.unsqueeze(0).detach()

+    def get_image(self, impth, lbpth):
+        img, label = cv2.imread(impth)[:, :, ::-1], cv2.imread(lbpth, 0)
+        return img, label
+
     def __len__(self):
         return self.len


-class TransformationTrain(object):
-
-    def __init__(self, scales, cropsize):
-        self.trans_func = T.Compose([
-            T.RandomResizedCrop(scales, cropsize),
-            T.RandomHorizontalFlip(),
-            T.ColorJitter(
-                brightness=0.4,
-                contrast=0.4,
-                saturation=0.4
-            ),
-        ])
-
-    def __call__(self, im_lb):
-        im_lb = self.trans_func(im_lb)
-        return im_lb
-
-
-class TransformationVal(object):
-
-    def __call__(self, im_lb):
-        im, lb = im_lb['im'], im_lb['lb']
-        return dict(im=im, lb=lb)
-
-
 if __name__ == "__main__":
     from tqdm import tqdm
     from torch.utils.data import DataLoader

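Moving the `cv2.imread` calls out of `__getitem__` into a `get_image` method turns image loading into an overridable hook, so subclasses can change how samples are read without touching the label mapping, augmentation, and tensor conversion that follow. A hypothetical subclass (not part of this commit) might use it like this:

```python
# Hypothetical subclass illustrating the new get_image hook; this dataset does
# not exist in the repo and is only meant to show the extension point.
import cv2
import numpy as np

from lib.base_dataset import BaseDataset


class GrayImageDataset(BaseDataset):
    '''Dataset whose images are stored as single-channel files; they are stacked
    to three channels so the rest of the BaseDataset pipeline stays unchanged.'''

    def get_image(self, impth, lbpth):
        gray = cv2.imread(impth, cv2.IMREAD_GRAYSCALE)
        img = np.stack([gray] * 3, axis=2)  # HxWx3, like the default RGB output
        label = cv2.imread(lbpth, 0)
        return img, label
```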
lib/coco.py

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+#!/usr/bin/python
+# -*- encoding: utf-8 -*-
+
+import os
+import os.path as osp
+import json
+
+import torch
+from torch.utils.data import Dataset, DataLoader
+import torch.distributed as dist
+import cv2
+import numpy as np
+
+import lib.transform_cv2 as T
+from lib.sampler import RepeatedDistSampler
+from lib.base_dataset import BaseDataset
+
+'''
+91(thing) + 91(stuff) = 182 classes, label proportions are:
+[0.0901445377, 0.00157896236, 0.00611962763, 0.00494526505, 0.00335260064, 0.00765355955, 0.00772972804, 0.00631509744,
+0.00270457286, 0.000697793344, 0.00114085574, 0.0, 0.00114084131, 0.000705729068, 0.00359758029, 0.00162208938, 0.00598373796,
+0.00440213609, 0.00362085441, 0.00193052224, 0.00271001196, 0.00492864603, 0.00186985393, 0.00332902228, 0.00334420294, 0.0,
+0.000922751106, 0.00298028204, 0.0, 0.0, 0.0010437561, 0.000285608411, 0.00318569535, 0.000314216755, 0.000313060076, 0.000364755975,
+0.000135920434, 0.000678980469, 0.000145436185, 0.000187677684, 0.000640885889, 0.00121345742, 0.000586313048, 0.00160106929, 0.0,
+0.000887093272, 0.00252332669, 0.000283407598, 0.000423017189, 0.000247005886, 0.00607086751, 0.002264644, 0.00108296684, 0.00299262899,
+0.0013542901, 0.0018255991, 0.000719220519, 0.00127748254, 0.00743539745, 0.0018222117, 0.00368625641, 0.00644224839, 0.00576837542,
+0.00234158491, 0.0102560197, 0.0, 0.0310601945, 0.0, 0.0, 0.00321417022, 0.0, 0.00343909654, 0.00366968441, 0.000223077284,
+0.000549851977, 0.00142833996, 0.000976368198, 0.000932849475, 0.00367802183, 6.33631941e-05, 0.00179415878, 0.00384408865, 0.0,
+0.00178728429, 0.00131955324, 0.00172710316, 0.000355333114, 0.00323052075, 3.45024606e-05, 0.000159319051, 0.0, 0.00233498927,
+0.00115535012, 0.00216354199, 0.00122636929, 0.0297802789, 0.00599919161, 0.00792527951, 0.00446247753, 0.00229155615,
+0.00481623284, 0.00928416394, 0.000292110971, 0.00100709844, 0.0036950065, 0.0238653594, 0.00318962423, 0.000957967243, 0.00491549702,
+0.00305316147, 0.0142686986, 0.00667806178, 0.00940045853, 0.000994700392, 0.00697502858, 0.00163056828, 0.00655119369, 0.00599044442,
+0.00200317424, 0.00546109479, 0.00496814246, 0.00128356119, 0.00893122042, 0.0423373213, 0.00275267517, 0.00730936505, 0.00231434982,
+0.00435102045, 0.00276966794, 0.00141028174, 0.000251683147, 0.00878006131, 0.00357672108, 0.000183633027, 0.00514584856,
+0.000848967739, 0.000662099529, 0.00186883821, 0.00417270686, 0.0224302911, 0.000551947753, 0.00799009014, 0.00379765772,
+0.00226731642, 0.0181341982, 0.000835227067, 0.00287355753, 0.00546769461, 0.0242787139, 0.00318951861, 0.00147349686,
+0.00167046288, 0.000520877717, 0.0101631583, 0.0234788756, 0.00283978366, 0.0624405778, 0.00258472693, 0.0204314774, 0.000550128266,
+0.00112924659, 0.001457768, 0.00190406757, 0.00173232644, 0.0116980759, 0.000850599027, 0.00565381261, 0.000787379463, 0.0577763754,
+0.00214883711, 0.00553984356, 0.0443605019, 0.0218570174, 0.0027310644, 0.00225446528, 0.00903008323, 0.00644298871, 0.00442167269,
+0.000129279566, 0.00176047379, 0.0101637834, 0.00255549522]
+
+11 thing classes has no annos, proportions are 0:
+[11, 25, 28, 29, 44, 65, 67, 68, 70, 82, 90]
+'''
+
+
+
+class CocoStuff(BaseDataset):
+
+    def __init__(self, dataroot, annpath, trans_func=None, mode='train'):
+        super(CocoStuff, self).__init__(dataroot, annpath, trans_func, mode)
+        self.n_cats = 171 # 91 stuff, 91 thing, 11 of thing have no annos
+        self.lb_ignore = 255
+
+        ## label mapping, remove non-existing labels
+        missing = [11, 25, 28, 29, 44, 65, 67, 68, 70, 82, 90]
+        remain = [ind for ind in range(182) if not ind in missing]
+        self.lb_map = np.arange(256)
+        for ind in remain:
+            self.lb_map[ind] = remain.index(ind)
+
+        self.to_tensor = T.ToTensor(
+            mean=(0.46962251, 0.4464104, 0.40718787), # coco, rgb
+            std=(0.27469736, 0.27012361, 0.28515933),
+        )
+
+

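The `lb_map` built in `CocoStuff.__init__` compresses the 182 raw stuffthingmaps ids into 171 contiguous training ids by skipping the 11 thing classes that have no annotations, while leaving 255 untouched as the ignore label. A small stand-alone check of that remapping (same logic as the constructor above, applied to a toy label patch for illustration):

```python
# Stand-alone illustration of the CocoStuff lb_map remapping.
import numpy as np

missing = [11, 25, 28, 29, 44, 65, 67, 68, 70, 82, 90]
remain = [ind for ind in range(182) if ind not in missing]

lb_map = np.arange(256)
for ind in remain:
    lb_map[ind] = remain.index(ind)

# ids below the first missing id (11) keep their value; every later id shifts
# down by the number of missing ids before it; 255 stays the ignore label.
label = np.array([[0, 10, 12], [91, 181, 255]], dtype=np.uint8)  # toy patch
print(lb_map[label])
# [[  0  10  11]
#  [ 80 170 255]]
```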