- Channels: RGB
- Oversampling
- External data: http://v18.proteinatlas.org
- Resize, Rotate, RandomRotate90, HorizontalFlip, RandomBrightnessContrast, Normalize
- Backbone: Resnet50 pretrained on ImageNet
- Head: 2 linear layers with batch normalization and dropout
- Binary Cross Entropy Loss
- 5-fold CV
- Optimizer: Adam
- Different learning rates for different layers
- Head fine-tuning with frozen backbone (1 epoch)
- Scheduler: Cyclical Learning Rates
Stage 1:
- Image size: 256
- Batch size: 128
- Epochs: 16
Stage 2:
- Image size: 512
- Batch size: 32
- Epochs: 6
- TTA: 8
- TTA augmentation: Resize, Rotate, RandomRotate90, HorizontalFlip, Normalize
- The mean of the TTA predictions is used
- Threshold: 0.2
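The inference bullets above reduce to a simple averaging-and-thresholding step. A minimal NumPy sketch (the array shapes and the random probabilities are hypothetical; in the real pipeline they would be sigmoid outputs from the model):

```python
import numpy as np

# Hypothetical sigmoid outputs for 8 TTA variants of each image:
# shape (n_tta, n_images, n_classes)
rng = np.random.default_rng(0)
tta_probs = rng.random((8, 4, 28))

# The mean of the 8 TTA predictions gives the final per-class probability
mean_probs = tta_probs.mean(axis=0)   # shape (4, 28)

# Threshold: 0.2 — a class is predicted when its mean probability exceeds it
predicted = mean_probs > 0.2          # boolean matrix (4, 28)
```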
- Training takes ~35 hours on a Tesla V100
- Public LB: 0.595
- Private LB: 0.523
- Mixed precision works poorly
- External data helps a lot
- BCE Loss with oversampling is much better than Focal Loss
- Resnet50 outperforms Resnet18 and Resnet34
- 5 folds improve the score by 0.024
- TTA helps too
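One way to implement the oversampling mentioned above is to weight each image by its rarest class, so that samples containing rare proteins are drawn more often. This weighting scheme is an assumption for illustration, not necessarily the one used in the repository:

```python
import numpy as np

# Toy multi-label target matrix: rows are samples, columns are classes
labels = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],   # the only sample with the rare third class
])

# Inverse class frequency: rarer classes get larger weights
class_freq = labels.sum(axis=0)       # occurrences per class
class_weight = 1.0 / class_freq

# A sample's weight is driven by its rarest class
sample_weight = (labels * class_weight).max(axis=1)
```

In PyTorch, `sample_weight` would be passed to `torch.utils.data.WeightedRandomSampler` so the DataLoader oversamples the rare-class images during training.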
First, clone the repository:
git clone https://github.com/rebryk/kaggle.git
cd kaggle/human-protein
Second, install the requirements:
pip install -r requirements.txt
Third, install apex:
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext
And last but not least, update config files in the configs folder to match your preferences!
Use scripts/external_data.py and scripts/convert_data.py to download and convert external data.
# Stage 1
cp configs/train256.py config.py
python train.py
# Stage 2
cp configs/train512.py config.py
python train.py
cp configs/test.py config.py
python test.py
Submissions are saved in the submissions folder.