Hello, I have a question about the design of your loss function. DECODE uses a two-stage network, and both stages use a 2D U-Net, though the two U-Nets take different inputs and produce different outputs. The first stage takes a single frame and outputs a feature representation of that frame. The second stage takes three frames and outputs the final predictions. Both stages are optimized through the loss function. Correct me if I am wrong, but this should mean that the loss function has to be "sufficiently generalized" to accept a single frame for the first stage and three frames for the second stage, is that right? As far as I understand, the loss function takes a single image as input. Hence, the "generalization" that lets the loss accept multiple frames would have to take place elsewhere, perhaps somewhere encapsulated in the second-stage code? Please give me some pointers if you know anything! Thank you.
Replies: 1 comment
Hey!
So the first stage is applied to three images, and as you correctly said, produces feature representations for each frame.
These feature representations, from all three frames, are then concatenated and forwarded to the second U-Net. So the information from each frame passes through both networks consecutively.
The loss is calculated using ground truth positions from just a single frame. But the network has access to three frames to make its prediction (each localization made by the network is a function of all three frames). This way the gradients for both U-Nets will depend on all three images. I hope that makes sense; please clarify if I misunderstood your question.
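To make the data flow concrete, here is a minimal sketch in plain Python. It is not DECODE's actual code: `stage1`, `stage2`, `frame_sensitivity`, the weights `w` and `v`, and the squared-error loss are all hypothetical stand-ins chosen for illustration (the real stages are U-Nets and the real loss is different). It only shows the structure described above: one shared first stage applied to each of the three frames, concatenation, a second stage, and a loss computed against a target from a single frame.

```python
import math

def stage1(frame, w):
    """Hypothetical stand-in for the first U-Net: one frame in, features out."""
    return [math.tanh(w * p) for p in frame]

def stage2(features, v):
    """Hypothetical stand-in for the second U-Net: sees the concatenated features."""
    return sum(vi * fi for vi, fi in zip(v, features))

def loss(frames, target, w, v):
    features = []
    for frame in frames:                 # same weight w is reused for every frame
        features.extend(stage1(frame, w))  # concatenate per-frame features
    prediction = stage2(features, v)
    # The target comes from a single frame only (toy squared error).
    return (prediction - target) ** 2

def frame_sensitivity(frames, k, target, w, v, eps=1e-6):
    """Finite-difference estimate of d(loss)/d(first pixel of frame k)."""
    bumped = [list(f) for f in frames]
    bumped[k][0] += eps
    return (loss(bumped, target, w, v) - loss(frames, target, w, v)) / eps

frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # three tiny made-up "frames"
w = 0.5                                         # shared stage-1 weight
v = [0.2, -0.1, 0.4, 0.3, -0.2, 0.1]            # stage-2 weights
target = 1.0                                    # ground truth for one frame

sensitivities = [frame_sensitivity(frames, k, target, w, v) for k in range(3)]
print(sensitivities)
```

All three sensitivities come out nonzero, even though the target belongs to a single frame. So the loss itself never needs to "accept" one or three frames: it only compares the final prediction with the ground truth, and the three-frame handling lives entirely in the forward pass.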