- Check training with EMA
- Revisit other loss functions for empty blocks
- Implement domain adaptation / semi-supervised learning
- Implement pseudo-labeler that applies the confidence mask from the first channel (= fg/bg prediction) to all channels
  (this is just a mental note for me)
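
Rough sketch of the pseudo-labeler idea from the last item, to have something concrete to start from. Everything here is an assumption, not existing code: the name `pseudo_label`, the `(C, H, W)` layout with per-channel sigmoid probabilities, channel 0 being the fg/bg prediction, and the `low`/`high` thresholds are all placeholders to be tuned.

```python
import numpy as np


def pseudo_label(probs, low=0.1, high=0.9):
    """Turn predictions into hard pseudo-labels plus a confidence mask.

    probs: array of shape (C, H, W) with probabilities in [0, 1].
           Channel 0 is assumed to be the fg/bg prediction.
    low, high: hypothetical thresholds; a pixel counts as confident when
           its fg/bg probability is <= low or >= high.

    Returns (labels, mask), both of shape (C, H, W). The mask is computed
    from channel 0 only and then reused for every channel.
    """
    probs = np.asarray(probs, dtype=float)

    # Confidence comes from the fg/bg channel alone.
    fg = probs[0]
    confident = (fg <= low) | (fg >= high)  # shape (H, W), bool

    # Hard labels for every channel by thresholding at 0.5.
    labels = (probs >= 0.5).astype(float)

    # Broadcast the single-channel mask to all channels.
    mask = np.broadcast_to(confident, probs.shape).copy()
    return labels, mask


def masked_mean(loss_map, mask):
    """Average a per-pixel loss over confident pixels only."""
    mask = np.asarray(mask, dtype=float)
    denom = mask.sum()
    if denom == 0:
        return 0.0  # no confident pixels in this sample
    return float((np.asarray(loss_map, dtype=float) * mask).sum() / denom)
```

The mask would then multiply the per-pixel loss on unlabeled data, so uncertain pixels contribute nothing. Whether 0.1/0.9 are sensible cutoffs still needs checking.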