Hey!
So the first stage is applied to each of the three images and, as you correctly said, produces a feature representation for each frame.
These feature representations, from all three frames, are then concatenated and forwarded to the second U-Net, so the information from each frame passes through both networks consecutively.

The loss is calculated using ground-truth positions from just a single frame, but the network has access to all three frames when making its prediction (each localization the network makes is a function of all three frames). This way the gradients for both U-Nets depend on all three images. I hope that makes sense; please clarify if I misunderstood your question.
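To make the data flow concrete, here is a toy numpy sketch of that wiring. The actual U-Nets are replaced by stand-in per-pixel linear maps, and all names and tensor shapes are illustrative assumptions, not the real implementation; the point is only that the first network is applied to each frame separately, the features are concatenated along the channel axis, and the loss on a single target frame still depends on all three inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_net(x, w):
    # Stand-in for the first U-Net: a per-pixel linear map plus nonlinearity.
    # x: (C_in, H, W) -> (C_feat, H, W)
    return np.tanh(np.einsum('fc,chw->fhw', w, x))

def fusion_net(feats, w):
    # Stand-in for the second U-Net, operating on concatenated features.
    # feats: (3 * C_feat, H, W) -> (1, H, W)
    return np.einsum('oc,chw->ohw', w, feats)

C_in, C_feat, H, W = 1, 4, 8, 8
w1 = rng.normal(size=(C_feat, C_in))       # first-stage weights, shared across frames
w2 = rng.normal(size=(1, 3 * C_feat))      # second-stage weights

frames = [rng.normal(size=(C_in, H, W)) for _ in range(3)]

# Stage 1: the same network (same weights) is applied to each frame independently.
feats = [frame_net(f, w1) for f in frames]

# Stage 2: concatenate along the channel axis and run the second network.
fused = np.concatenate(feats, axis=0)      # (3 * C_feat, H, W)
pred = fusion_net(fused, w2)               # (1, H, W)

# The loss compares the prediction against ground truth for a single frame only,
# yet pred (and hence the gradient of the loss) depends on all three input frames.
target = rng.normal(size=(1, H, W))
loss = np.mean((pred - target) ** 2)
```

In a real training setup an autodiff framework would backpropagate this loss through both networks, so `w1` receives gradient contributions from all three frames even though only one frame is supervised.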

Answer selected by tsuijenk