The `forward` method of SwinUNETR is:

```python
def forward(self, x_in):
    hidden_states_out = self.swinViT(x_in, self.normalize)
    enc0 = self.encoder1(x_in)
    enc1 = self.encoder2(hidden_states_out[0])
    enc2 = self.encoder3(hidden_states_out[1])
    enc3 = self.encoder4(hidden_states_out[2])
    dec4 = self.encoder10(hidden_states_out[4])
    dec3 = self.decoder5(dec4, hidden_states_out[3])
    dec2 = self.decoder4(dec3, enc3)
    dec1 = self.decoder3(dec2, enc2)
    dec0 = self.decoder2(dec1, enc1)
    out = self.decoder1(dec0, enc0)
    logits = self.out(out)
    return logits
```

May I ask why hidden_states_out[3] is not passed through its own encoder block like the other hidden states?
Hi @hxhxhx88 , thanks for the question.
The 3D segmentation network is designed to follow a 4-times downsample-upsample architecture, as most segmentation networks (such as UNet) do. Four downsampling encoder stages keep the model from becoming over-complicated and keep the parameter count lower than a design with 5 encoder-decoder pairs would. However, the designed SwinUNETR produces 5 hidden output features, hidden_states_out[0] through hidden_states_out[4]. We ultimately decided to use hidden_states_out[0] as the first encoded feature for encoder2, followed by [1] for encoder3 and [2] for encoder4; [3] is skipped (it is fed directly to decoder5 as a skip connection without a dedicated encoder), and [4] is used as the bottleneck feature for encoder10. Stage 4 therefore goes to encoder10, which is the bottleneck of the network.
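
For reference, here is a minimal sketch (not from the original thread) that inspects the five hidden-state shapes and notes which block each one feeds. It assumes a MONAI release whose SwinUNETR constructor still accepts img_size (newer releases deprecate that argument), the default feature_size=48, and a (96, 96, 96) single-channel input; the shapes in the comments are what those assumptions imply.

```python
# Sketch only: map each swinViT hidden state to the block that consumes it.
# Assumes a MONAI version where SwinUNETR(img_size=...) is still accepted.
import torch
from monai.networks.nets import SwinUNETR

model = SwinUNETR(
    img_size=(96, 96, 96),  # deprecated in newer MONAI releases
    in_channels=1,
    out_channels=2,
    feature_size=48,
)
x = torch.randn(1, 1, 96, 96, 96)

with torch.no_grad():
    # Same call as inside forward(): returns hidden_states_out[0..4]
    hidden_states = model.swinViT(x, model.normalize)
    for i, h in enumerate(hidden_states):
        print(f"hidden_states_out[{i}]: {tuple(h.shape)}")

# Expected under the assumptions above:
# hidden_states_out[0]: (1, 48, 48, 48, 48)   -> encoder2
# hidden_states_out[1]: (1, 96, 24, 24, 24)   -> encoder3
# hidden_states_out[2]: (1, 192, 12, 12, 12)  -> encoder4
# hidden_states_out[3]: (1, 384, 6, 6, 6)     -> skip input to decoder5 (no dedicated encoder)
# hidden_states_out[4]: (1, 768, 3, 3, 3)     -> encoder10 (bottleneck)
```

As the shapes suggest, using hidden_states_out[3] only as a skip connection keeps the network at four downsample-upsample stages while still exposing all five transformer outputs to the decoder path.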