Dropout recommendation for segmentation using Swin UnetR #5945

jpcenteno80 · 2023-02-06T18:47:59Z

jpcenteno80
Feb 6, 2023

Hi, I was wondering if you had any recommendations on which type of dropout to use with the Swin UnetR architecture for a segmentation task. This particular task has significantly more background pixels than foreground.

I see that dropout can be applied in three places: to the attention (attn_drop), to the output projection a few lines after the attention computation (proj_drop), and to the transformer block as a whole (drop_path). I notice the loss increasing after a couple hundred epochs and would like to experiment with dropout in this architecture to see if I can reduce the loss a bit further.

Thank you.

Answered by tangy5

Feb 7, 2023


KumoLiu · 2023-02-07T03:15:47Z

KumoLiu
Feb 7, 2023
Maintainer

Hi @tangy5, could you please help share some comments on this question? Thanks in advance!


tangy5 · 2023-02-07T05:08:40Z

tangy5
Feb 7, 2023
Collaborator

@jpcenteno80, thanks for the question. My suggestions:

drop_rate (the ordinary dropout rate) is there to keep the network from overfitting. If your dataset is large, you probably want a low drop_rate, or 0. If your dataset is small you can set it higher, but not too high; something like 0.2.

attn_drop_rate is the dropout probability applied to the attention weights. At 0 every attention weight is kept; as it approaches 1, nearly all of them are zeroed during training.

Drop path, also known as Stochastic Depth, is a technique that "deactivates" some layers at random during training. For a "small" model, a drop path rate of 0 or 0.1 is preferred; a larger model can take a larger rate, such as 0.2.
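To make the drop path part more concrete, here is a minimal pure-Python sketch, not the library's code. It assumes the usual Swin-style convention in which the drop path rate grows linearly from 0 in the first block to the configured maximum in the last block, and it shows what "deactivating" a layer means for a single sample: the residual branch is zeroed with probability `rate`, otherwise rescaled by `1 / (1 - rate)`. The helper names (`drop_path_schedule`, `drop_path`, `suggest_dropout`) and the boolean "small dataset" / "large model" flags are hypothetical, introduced only for illustration; the answer gives no value for attn_drop_rate, so it is left at 0.0 as a placeholder.

```python
import random


def drop_path_schedule(max_rate, depths):
    """Per-block drop-path rates, rising linearly from 0 to max_rate.

    Assumption: this mirrors the linear "stochastic depth decay" rule
    commonly used in Swin-style encoders; check your own implementation.
    """
    total = sum(depths)
    if total == 1:
        flat = [0.0]
    else:
        flat = [max_rate * i / (total - 1) for i in range(total)]
    stages, start = [], 0
    for depth in depths:
        stages.append(flat[start:start + depth])
        start += depth
    return stages


def drop_path(residual, rate, training=True, rng=random):
    """Stochastic depth for ONE sample's residual branch (a list of floats).

    With probability `rate` the whole branch is dropped (all zeros);
    otherwise it is scaled by 1 / (1 - rate) so the expected value is kept.
    """
    if not training or rate == 0.0:
        return list(residual)
    if rng.random() < rate:
        return [0.0] * len(residual)
    keep = 1.0 - rate
    return [value / keep for value in residual]


def suggest_dropout(small_dataset, large_model):
    """Hypothetical helper encoding the rules of thumb from the answer above."""
    return {
        "drop_rate": 0.2 if small_dataset else 0.0,
        "attn_drop_rate": 0.0,  # placeholder: no value is recommended above
        "dropout_path_rate": 0.2 if large_model else 0.1,
    }


if __name__ == "__main__":
    settings = suggest_dropout(small_dataset=True, large_model=False)
    print(settings)
    # Four stages of two blocks each, as an example encoder layout.
    for stage in drop_path_schedule(settings["dropout_path_rate"], (2, 2, 2, 2)):
        print([round(rate, 3) for rate in stage])
```

If the keyword names in your installed MONAI version match the dictionary keys (drop_rate, attn_drop_rate, dropout_path_rate), the result of `suggest_dropout` could be unpacked into the `SwinUNETR` constructor alongside the usual arguments. Treat the values as starting points and compare validation loss curves between runs rather than relying on the training loss alone.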
