Notes
1. Why does our implementation always seem a bit better than other implementations in general?
e.g., DeepLab on VOC, ERFNet on Cityscapes, CULane, etc. For all methods, thanks to the memory saved by mixed precision training, we use relatively large batch sizes (at least 4 or 8) and large learning rates, and we do not freeze BN parameters. Checkpointing for the best model is also applied for segmentation.
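For reference, here is a minimal sketch of this training recipe with PyTorch's native AMP. The tiny model, dummy data, and hyper-parameter values are placeholders, not the actual configurations used in this repo:

```python
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

# Placeholder model; BN layers stay trainable (not frozen).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16),
    nn.ReLU(), nn.Conv2d(16, 2, 1),
).cuda().train()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scaler = GradScaler()
best_score = -1.0

for epoch in range(2):
    for _ in range(10):  # dummy batches standing in for a real loader
        images = torch.randn(8, 3, 64, 64, device='cuda')       # batch size >= 8
        targets = torch.randint(0, 2, (8, 64, 64), device='cuda')
        optimizer.zero_grad()
        with autocast():  # fp16 forward/backward saves memory
            loss = criterion(model(images), targets)
        scaler.scale(loss).backward()  # loss scaling avoids fp16 underflow
        scaler.step(optimizer)
        scaler.update()

    score = torch.rand(1).item()  # stand-in for validation mIoU
    if score > best_score:        # keep only the best checkpoint
        best_score = score
        torch.save(model.state_dict(), 'best.pt')
```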
2. Why does our code seem to run faster?
We mostly employ mixed precision training, which is fast. We also try to optimize GPU-to-CPU conversions and I/O wherever we can, so that they do not become a bottleneck, e.g., in lane detection testing. And we support arbitrary batch sizes in test inference.
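To illustrate the I/O point, here is a hedged sketch of batched test inference that keeps predictions on the GPU inside the loop and performs a single device-to-host copy at the end (the stand-in model, batch size, and shapes are made up):

```python
import torch
from torch import nn

model = nn.Conv2d(3, 2, 1).cuda().eval()  # stand-in for a real network
all_preds = []

with torch.no_grad():
    for _ in range(5):  # dummy test batches; any batch size works here
        images = torch.randn(16, 3, 64, 64).cuda(non_blocking=True)
        preds = model(images).argmax(dim=1)
        all_preds.append(preds)  # stay on GPU inside the loop

# A single GPU-to-CPU conversion for all results, instead of one per batch.
all_preds = torch.cat(all_preds).cpu().numpy()
```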
3. Why is our SCNN implementation better than the original, yet appears worse on TuSimple?
If you look at the VGG16-SCNN results on CULane, our re-implementation is without doubt far superior (over 1.5% improvement), mainly because the original SCNN is implemented in the old Torch7, while ours is based on modern torchvision backbone implementations and ImageNet pre-training. We also ran a simple grid search for the learning rate on the validation set. However, the original SCNN is often reported in the literature to achieve 96.53% on TuSimple, much higher than ours. That is mainly because it was a competition entry; there is no way we (using only the train set) can compete with that. Note that the original SCNN paper never claimed that a TuSimple experiment without competition tricks could reach that performance.
4. Why do our Baselines in lane detection have surprisingly good performance?
As methods become more advanced, they are less dependent on hyper-parameters (such as learning rate or backbone). For instance, our re-implemented SCNN achieves similarly good results across VGG, ResNets, and ERFNet. Baselines, on the other hand, are the most sensitive to hyper-parameters. Unlike other researchers, who tend not to tune hyper-parameters on Baselines (we can't really blame them; their resources are focused on their own methods), we conduct the same learning-rate grid search on Baselines (see the sketch below). As a result, we find that larger learning rates bring much better Baseline performance.
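The grid search itself is nothing fancy; a sketch of the idea, with hypothetical stand-ins for the repo's model builder, training loop, and validation metric:

```python
import random

def build_model():            # stand-in for the actual model builder
    return object()

def train(model, lr):         # stand-in training run on the train set
    pass

def evaluate(model, split):   # stand-in metric on the validation set
    return random.random()

candidate_lrs = [0.01, 0.02, 0.05, 0.1]  # hypothetical grid
best_lr, best_val = None, -1.0
for lr in candidate_lrs:
    model = build_model()
    train(model, lr=lr)
    val = evaluate(model, split='val')
    if val > best_val:
        best_val, best_lr = val, lr
print(f'best lr = {best_lr} (val metric = {best_val:.3f})')
```

Model selection uses only the validation set, so the test set stays untouched.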
5. ResNet backbone in lane detection
In this field, we find the seminal SCNN paper provides the most precise evaluation of ResNet baselines. For ResNet backbones, however, we follow RESA and reduce the channel number to 128, so that adding SCNN/RESA remains computationally feasible. The SCNN paper did not reduce channels and further applied dropout on ResNet-101, while we find the RESA configuration brings similarly good performance at a lower compute budget (see the sketch below). We also support the modified light-weight ResNet18 from LSTR, which is called ResNet18s in this repo.
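A minimal sketch of the RESA-style channel reduction, assuming a torchvision ResNet backbone; the 1x1 reduction conv and layer slicing here are illustrative, not the exact modules in this repo:

```python
import torch
from torch import nn
from torchvision.models import resnet50

# weights=None keeps the sketch offline; in practice ImageNet
# pre-trained weights would be loaded.
backbone = resnet50(weights=None)
features = nn.Sequential(*list(backbone.children())[:-2])  # keep up to layer4

# Reduce the 2048-channel ResNet output to 128 channels before
# attaching a message-passing module such as SCNN/RESA.
channel_reducer = nn.Sequential(
    nn.Conv2d(2048, 128, kernel_size=1, bias=False),
    nn.BatchNorm2d(128),
    nn.ReLU(),
)

x = torch.randn(1, 3, 288, 800)        # a common lane detection input size
feat = channel_reducer(features(x))    # -> (1, 128, 9, 25)
print(feat.shape)
```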