How to do sliding window for detection? #7117
Unanswered
jhairgallardo asked this question in Q&A
-
Hi @jhairgallardo, thanks for your interest here.
-
Hello! I am new to MONAI. I followed the RetinaNet detection tutorial and wrote my own detection code based on DETR. For an input image, my network outputs a dictionary containing class probabilities and bounding boxes.
For example, with batch size 1 (a single input), my input shape is (1, 1, 96, 96, 96), i.e. (batch, channel, w, h, d). Assuming the network outputs 2 bounding boxes and there are only 2 classes, the output would be:
{"pred_logits": tensor of shape (1, 2, 2), "boxes": tensor of shape (1, 2, 6)}
Here, pred_logits has shape (batch, num_boxes, num_classes) and boxes has shape (batch, num_boxes, box_tensor_dimension).
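To merge predictions from several patches later, each box needs a score and a label. A minimal sketch for batch size 1, assuming `pred_logits` holds raw (unnormalized) logits and that there is no separate "no object" class; `unpack_detr_output` is a hypothetical helper, not a MONAI function. It uses NumPy so it runs without a model; with PyTorch tensors, call `.detach().cpu().numpy()` on them first.

```python
import numpy as np


def unpack_detr_output(output, batch_index=0):
    """Turn a DETR-style output dict into per-box (boxes, scores, labels).

    Assumes output["pred_logits"] has shape (batch, num_boxes, num_classes)
    and output["boxes"] has shape (batch, num_boxes, 6).
    """
    logits = np.asarray(output["pred_logits"][batch_index], dtype=float)  # (num_boxes, num_classes)
    boxes = np.asarray(output["boxes"][batch_index], dtype=float)  # (num_boxes, 6)
    # Numerically stable softmax over the class dimension.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    labels = probs.argmax(axis=-1)  # most likely class per box
    scores = probs.max(axis=-1)  # its probability, used as the box score
    return boxes, scores, labels
```

If the model follows the original DETR and reserves an extra "no object" class, drop that column before taking the argmax.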
I want to get predictions for a larger input image, say of shape (1, 1, 300, 300, 100), using a sliding-window approach. I tried the sliding-window method (SlidingWindowInferer) used in the RetinaNet detection tutorial, but I get an error saying that the output dimensionality of my network does not match the input's (the input is 5-dimensional, while the values in the output dictionary are 3-dimensional). The function documentation (https://docs.monai.io/en/stable/inferers.html#sliding-window-inference-function) says:
"e.g., the input patch spatial size is [128,128,128], the output (a tuple of two patches) patch sizes could be ([128,64,256], [64,32,128])."
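As far as I understand it, sliding_window_inference writes each patch output back into a full-size output buffer at the patch's position, so every output must be a spatial map it can resize and blend; a (batch, num_boxes, ...) tensor has no spatial grid, hence the dimension error. One way around this is to enumerate the patch positions yourself. A minimal sketch, assuming a fixed roi_size and a fractional overlap similar in spirit to SlidingWindowInferer's; `patch_starts` and `patch_grid` are hypothetical helpers, and their scan positions may differ from MONAI's.

```python
from itertools import product


def patch_starts(size, roi, overlap=0.25):
    """Start indices along one axis so that windows of length `roi` cover [0, size)."""
    if size <= roi:
        # Axis no longer than the window: a single patch (the caller must pad it to `roi`).
        return [0]
    step = max(1, int(roi * (1 - overlap)))
    starts = list(range(0, size - roi + 1, step))
    if starts[-1] != size - roi:
        # Add a final window that ends exactly at the volume border.
        starts.append(size - roi)
    return starts


def patch_grid(spatial_shape, roi_size, overlap=0.25):
    """All 3D patch start corners, one tuple per patch."""
    per_axis = [patch_starts(s, r, overlap) for s, r in zip(spatial_shape, roi_size)]
    return list(product(*per_axis))
```

For the (300, 300, 100) volume with 96 x 96 x 96 patches and overlap=0.25, this gives starts [0, 72, 144, 204] along the first two axes and [0, 4] along the last, i.e. 32 patches.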
Is there a way around this? I only need the sliding window to return the output for each patch together with that patch's location, so that I can adjust the bounding boxes accordingly. Is there another function in MONAI that does this?
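A minimal sketch of such a per-patch loop, under these assumptions: the volume is a channel-first NumPy array (C, X, Y, Z) without the batch axis; `predict` is a hypothetical callable you supply that runs the network on one patch (adding the batch axis itself) and returns boxes, scores and labels; and boxes are corner-format (x1, y1, z1, x2, y2, z2) voxel coordinates relative to the patch. If your DETR head predicts normalized center/size boxes, convert them to voxel corners using the patch size first. Overlapping patches will detect the same object more than once, so the merged boxes are filtered with a simple greedy class-agnostic 3D non-maximum suppression written here for illustration (not a MONAI API).

```python
import numpy as np


def shift_boxes(boxes, start):
    """Move corner-format (x1, y1, z1, x2, y2, z2) boxes from patch to volume coordinates."""
    offset = np.asarray(start, dtype=float)
    return np.asarray(boxes, dtype=float).reshape(-1, 6) + np.concatenate([offset, offset])


def _volume(boxes):
    # Volume of one box (shape (6,)) or of many boxes (shape (n, 6)).
    return np.clip(boxes[..., 3:] - boxes[..., :3], 0, None).prod(axis=-1)


def iou_3d(box, others):
    """IoU between one box and an (n, 6) array of boxes."""
    lo = np.maximum(box[:3], others[:, :3])
    hi = np.minimum(box[3:], others[:, 3:])
    inter = np.clip(hi - lo, 0, None).prod(axis=1)
    union = _volume(box) + _volume(others) - inter
    return inter / np.maximum(union, 1e-12)


def nms_3d(boxes, scores, iou_thresh=0.5):
    """Greedy class-agnostic NMS; returns indices of the boxes to keep."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(best)
        # Discard remaining boxes that overlap the kept box too much.
        order = rest[iou_3d(boxes[best], boxes[rest]) <= iou_thresh]
    return np.array(keep, dtype=int)


def sliding_window_detect(volume, predict, starts, roi_size, iou_thresh=0.5):
    """Run `predict` on every patch and merge the boxes in volume coordinates."""
    all_boxes, all_scores, all_labels = [], [], []
    for start in starts:
        window = tuple(slice(s, s + r) for s, r in zip(start, roi_size))
        patch = volume[(slice(None),) + window]  # keep the channel axis
        boxes, scores, labels = predict(patch)
        all_boxes.append(shift_boxes(boxes, start))
        all_scores.append(np.asarray(scores, dtype=float))
        all_labels.append(np.asarray(labels))
    boxes = np.concatenate(all_boxes)
    scores = np.concatenate(all_scores)
    labels = np.concatenate(all_labels)
    keep = nms_3d(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep], labels[keep]
```

Running NMS separately for each label is a common alternative when objects of different classes may legitimately overlap.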
Thank you!