Notes
1. Why does our implementation generally seem a bit better than other implementations?
e.g. DeepLab on VOC, ERFNet on Cityscapes, CULane, etc. For all methods, thanks to the memory saved by mixed precision training, we use a relatively large batch size (at least 4 or 8) and a large learning rate, and we do not freeze BN parameters. Checkpointing of the best model is also applied for segmentation; a minimal sketch of these ingredients follows below.
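A minimal sketch (not this repo's actual training loop) of the ingredients above: mixed precision via torch.cuda.amp, BN parameters left trainable, and checkpointing the best model by a validation metric. The `evaluate` helper and the hyper-parameters are hypothetical.

```python
import torch

def train(model, train_loader, val_loader, epochs, lr, device="cuda"):
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    scaler = torch.cuda.amp.GradScaler()          # loss scaling for mixed precision
    criterion = torch.nn.CrossEntropyLoss(ignore_index=255)
    best_metric = 0.0

    for epoch in range(epochs):
        model.train()                             # BN runs in train mode, parameters not frozen
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            with torch.cuda.amp.autocast():       # fp16 forward/backward where safe
                loss = criterion(model(images), labels)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()

        metric = evaluate(model, val_loader, device)  # hypothetical validation helper (e.g. mIoU)
        if metric > best_metric:                      # keep only the best checkpoint
            best_metric = metric
            torch.save(model.state_dict(), "best.pt")
```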
2. Why does our code seem to run faster?
We mostly employ mixed precision training, which is fast. We also try to optimize GPU-to-CPU transfers and I/O wherever we can so that they do not become a bottleneck, e.g. in lane detection testing, and we support arbitrary batch sizes in test inference (see the sketch below).
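A minimal sketch, assuming a `model` and a `test_loader` that yields image batches, of batched no-grad inference that keeps post-processing on the GPU and performs a single GPU-to-CPU transfer per batch. This is not the repo's exact testing code.

```python
import torch

@torch.no_grad()
def predict(model, test_loader, device="cuda"):
    model = model.to(device).eval()
    results = []
    for images in test_loader:                    # arbitrary batch size is fine
        images = images.to(device, non_blocking=True)
        with torch.cuda.amp.autocast():           # mixed precision speeds up inference too
            logits = model(images)
        preds = logits.argmax(dim=1)              # post-process on the GPU
        results.append(preds.cpu())               # one transfer per batch, not per sample
    return torch.cat(results)
```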
3. Why is our SCNN implementation better than the original but appears worse on TuSimple?
If you look at the VGG16-SCNN results on CULane, our re-implementation is without doubt far superior (over 1.5% improvement), mainly because the original SCNN is written in the old torch7 framework, while we build on modern torchvision backbone implementations and ImageNet pre-training. We also did a simple grid search for the learning rate on the validation set (sketched below). However, the original SCNN is often reported in the literature to achieve 96.53% on TuSimple, much higher than ours. That is mainly because it was a competition entry; there is no way we (using only the train set) can compete with that. Note that the original SCNN paper never claims that a TuSimple experiment without competition tricks can reach that performance.
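A minimal sketch of the kind of learning rate grid search mentioned above; `build_model`, `train_on_train_split`, and `evaluate_on_val` are hypothetical helpers, and the candidate learning rates are illustrative only.

```python
def grid_search_lr(build_model, candidates=(0.01, 0.02, 0.05, 0.1)):
    best_lr, best_score = None, float("-inf")
    for lr in candidates:
        model = build_model()
        train_on_train_split(model, lr=lr)     # hypothetical: train on the train set only
        score = evaluate_on_val(model)         # hypothetical: validation metric, e.g. F1
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr, best_score
```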
4. Why is our ENet implementation ~4% better?
Same as in 1, we have better implementations. But the improvement should mainly be attributed to using the same advanced augmentation scheme and learning rate scheduler from ERFNet for ENet; a sketch of such a scheduler is shown below.
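A minimal sketch, under the assumption that the shared scheduler is a polynomial ("poly") decay of the kind commonly used in ERFNet-style training; the exact schedule and augmentations used in this repo may differ.

```python
import torch

def poly_lr_scheduler(optimizer, total_iters, power=0.9):
    # lr_t = base_lr * (1 - t / total_iters) ** power, applied per iteration
    return torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda t: max(0.0, 1 - t / total_iters) ** power
    )

# Usage: build it once, then call scheduler.step() after every optimizer step.
# optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
# scheduler = poly_lr_scheduler(optimizer, total_iters=len(train_loader) * epochs)
```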