1. Why does our implementation always seem a bit better than other implementations in general?

e.g. DeepLab on VOC, ERFNet on Cityscapes, CULane, etc. For all methods, the memory saved by mixed precision training allows us to use a relatively large batch size (at least 4 or 8) and a large learning rate, and we do not freeze BN parameters. We also checkpoint the best model for segmentation.
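As a minimal sketch of what this means in PyTorch (the model, data loader, and hyperparameters below are placeholders, not the repo's actual settings): mixed precision via `torch.cuda.amp` frees enough memory for batch sizes of 4 or 8, BN layers stay trainable, and the best checkpoint is kept according to the validation metric.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholder model and optimizer; the real experiments use DeepLab/ERFNet/etc.
model = torch.nn.Conv2d(3, 21, kernel_size=1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # relatively large lr
scaler = GradScaler()

def train_one_epoch(loader):
    model.train()  # BatchNorm layers stay in training mode; their parameters are NOT frozen
    for images, targets in loader:
        images, targets = images.cuda(), targets.cuda()
        optimizer.zero_grad()
        with autocast():  # fp16 forward/backward saves memory, enabling batch size >= 4 or 8
            loss = torch.nn.functional.cross_entropy(model(images), targets, ignore_index=255)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

# Checkpointing for segmentation: keep the weights with the best validation mIoU so far.
best_miou = 0.0
def maybe_save(val_miou, path='best.pth'):
    global best_miou
    if val_miou > best_miou:
        best_miou = val_miou
        torch.save(model.state_dict(), path)
```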

2. Why is our SCNN implementation better than the original, yet appears worse on TuSimple?

If you look at the VGG16-SCNN results on CULane, our re-implementation is without doubt far superior (over 1.5% improvement), mainly because the original SCNN is implemented in the old Torch7 framework, while we build on the modern torchvision backbone implementations and ImageNet pre-training. We also did a simple grid search for the learning rate on the validation set (sketched below). However, the original SCNN is often reported to achieve 96.53% on TuSimple in the literature, much higher than ours. That figure comes from a competition entry; there is no way we (using only the train set) can compete with that. Note that the original SCNN paper never claimed that a TuSimple experiment without competition tricks can reach that performance.
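The grid search mentioned above is nothing fancy; conceptually it is just the loop below. The candidate learning rates and the `run_experiment` helper are hypothetical placeholders, not the actual values or scripts used in this repo.

```python
# Hypothetical learning-rate grid search on the validation set.
candidate_lrs = [0.01, 0.02, 0.05, 0.1]  # illustrative values only

def run_experiment(lr):
    """Stand-in for a full training run; should return the validation metric
    (e.g. F1 on the CULane val split) obtained with this learning rate."""
    return 0.0  # replace with an actual training + validation run

best_lr, best_metric = None, float('-inf')
for lr in candidate_lrs:
    metric = run_experiment(lr)
    if metric > best_metric:
        best_lr, best_metric = lr, metric
print(f'selected lr = {best_lr}, val metric = {best_metric:.2f}')
```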

3. Why is our ENet implementation better?

Same as question 1, we have better implementations. But the improvement should mainly be attributed to using the same advanced augmentation scheme and learning rate scheduler from ERFNet for ENet as well; a sketch of such a scheduler is given below.
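For reference, an ERFNet-style learning rate schedule is commonly expressed in PyTorch as a polynomial decay over epochs via `LambdaLR`, as in the sketch below. The exponent, epoch count, and base learning rate here are assumptions for illustration; check the training configs for the values actually used.

```python
import torch

# Placeholder parameters; the real experiments use the full ENet/ERFNet networks.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=5e-4)  # base lr is an assumed example value

num_epochs = 150  # assumed example value
# Poly-style decay: the lr multiplier shrinks from 1 towards 0 as training progresses.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: (1 - epoch / num_epochs) ** 0.9)

for epoch in range(num_epochs):
    # ... one epoch of training with the shared (augmented) data pipeline ...
    scheduler.step()
```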
