The `forward` method of SwinUNETR is:

```python
def forward(self, x_in):
    hidden_states_out = self.swinViT(x_in, self.normalize)
    enc0 = self.encoder1(x_in)
    enc1 = self.encoder2(hidden_states_out[0])
    enc2 = self.encoder3(hidden_states_out[1])
    enc3 = self.encoder4(hidden_states_out[2])
    dec4 = self.encoder10(hidden_states_out[4])
    dec3 = self.decoder5(dec4, hidden_states_out[3])
    dec2 = self.decoder4(dec3, enc3)
    dec1 = self.decoder3(dec2, enc2)
    dec0 = self.decoder2(dec1, enc1)
    out = self.decoder1(dec0, enc0)
    logits = self.out(out)
    return logits
```

May I ask why hidden_states_out[3] is not passed through its own encoder block like the other hidden states?
Hi @hxhxhx88 , thanks for the question.
The 3D segmentation network is designed to follow a 4-times downsample-upsample architecture, as most segmentation networks (such as UNet) do. Four downsampling encoder stages keep the model from becoming over-complicated and keep the parameter count lower than a design with 5 encoder-decoder pairs would. However, the designed SwinUNETR produces 5 hidden output features, hidden_states_out[0] through hidden_states_out[4]. We ultimately decided to use hidden_states_out[0] as the first encoded feature for encoder2, followed by [1] for encoder3 and [2] for encoder4; [3] is skipped (it is fed directly to decoder5 as a skip connection without a dedicated encoder), and [4] is used as the bottleneck feature for encoder10. Stage 4 therefore goes to encoder10, which is the bottleneck of the network.
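
For reference, here is a minimal sketch (not from the original thread) that inspects the five hidden-state shapes and notes which block each one feeds. It assumes a MONAI release whose SwinUNETR constructor still accepts img_size (newer releases deprecate that argument), the default feature_size=48, and a (96, 96, 96) single-channel input; the shapes in the comments are what those assumptions imply.

```python
# Sketch only: map each swinViT hidden state to the block that consumes it.
# Assumes a MONAI version where SwinUNETR(img_size=...) is still accepted.
import torch
from monai.networks.nets import SwinUNETR

model = SwinUNETR(
    img_size=(96, 96, 96),  # deprecated in newer MONAI releases
    in_channels=1,
    out_channels=2,
    feature_size=48,
)
x = torch.randn(1, 1, 96, 96, 96)

with torch.no_grad():
    # Same call as inside forward(): returns hidden_states_out[0..4]
    hidden_states = model.swinViT(x, model.normalize)
    for i, h in enumerate(hidden_states):
        print(f"hidden_states_out[{i}]: {tuple(h.shape)}")

# Expected under the assumptions above:
# hidden_states_out[0]: (1, 48, 48, 48, 48)   -> encoder2
# hidden_states_out[1]: (1, 96, 24, 24, 24)   -> encoder3
# hidden_states_out[2]: (1, 192, 12, 12, 12)  -> encoder4
# hidden_states_out[3]: (1, 384, 6, 6, 6)     -> skip input to decoder5 (no dedicated encoder)
# hidden_states_out[4]: (1, 768, 3, 3, 3)     -> encoder10 (bottleneck)
```

As the shapes suggest, using hidden_states_out[3] only as a skip connection keeps the network at four downsample-upsample stages while still exposing all five transformer outputs to the decoder path.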