View size not compatible with input tensors - Multiple object detection #15

Description

@annamalai-19

I'm trying to detect and track multiple objects across the frames of a video. I pass a different object id for each input, but I'm running into an incompatible-tensor-shape error when doing inference on a directory of images. Specifically, the error occurs when the results for a single object are combined across the frames.

Issue code

for out_frame_idx, out_obj_ids, out_mask_logits in self.predictor.propagate_in_video(self.inference_state):
    print("Masking frame number:", out_frame_idx)
    # collect the binary mask of every object id present in this frame
    self.video_segments[out_frame_idx] = {
        out_obj_id: (out_mask_logits[i] > 0.0).cpu().numpy()
        for i, out_obj_id in enumerate(out_obj_ids)
    }
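
For context, the predictor and inference state used by this loop are created roughly like this (a simplified sketch; the config/checkpoint paths are placeholders for my local ones):

    from sam2.build_sam import build_sam2_video_predictor

    # build the EdgeTAM video predictor; the config/checkpoint paths below are placeholders
    self.predictor = build_sam2_video_predictor(
        "configs/edgetam.yaml", "checkpoints/edgetam.pt", device="cuda"
    )

    # init_state scans the directory of JPEG frames (or a video file) and caches the images
    self.inference_state = self.predictor.init_state(video_path="path/to/frames_dir")

    # frame_idx -> {obj_id: binary mask}, filled by the loop above
    self.video_segments = {}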

Input to model

Currently I want to track two objects in a video/directory of images. I also tried a bbox input instead of points, but that doesn't work either. I added the objects on the first frame using the add_new_points_or_box API of the EdgeTAM model.

  _, out_obj_ids, out_mask_logits = self.predictor.add_new_points_or_box(**predictor_kwargs)

On this first frame, the model detects both objects correctly. But when propagating the prediction across all frames, the error below is thrown.

Note: this is my input before conversion to numpy arrays; I only pass the inputs after converting them to arrays.

object 1 =  {'id': 1, 'points': [(543, 924)], 'labels': [1], 'bbox': []}
object 2 = {'id': 2, 'points': [(540, 732)], 'labels': [1], 'bbox': []}
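
For reference, the prompts above are converted and passed per object roughly like this (simplified; variable names are illustrative):

    import numpy as np

    objects = [
        {'id': 1, 'points': [(543, 924)], 'labels': [1], 'bbox': []},
        {'id': 2, 'points': [(540, 732)], 'labels': [1], 'bbox': []},
    ]

    for obj in objects:
        predictor_kwargs = dict(
            inference_state=self.inference_state,
            frame_idx=0,                                       # prompts are given only on the first frame
            obj_id=obj['id'],
            points=np.array(obj['points'], dtype=np.float32),  # (N, 2) pixel coordinates (x, y)
            labels=np.array(obj['labels'], dtype=np.int32),    # 1 = positive click
        )
        _, out_obj_ids, out_mask_logits = self.predictor.add_new_points_or_box(**predictor_kwargs)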

Error screenshot

[screenshot showing the same RuntimeError as the traceback below]

Full traceback:

for out_frame_idx, out_obj_ids, out_mask_logits in self.predictor.propagate_in_video(self.inference_state):
File "C:\Users\Admin\anaconda3\envs\cicd\lib\site-packages\torch\utils_contextlib.py", line 36, in generator_context
response = gen.send(None)
File "C:\Users\Admin\Downloads\EdgeTAM\sam2\sam2_video_predictor.py", line 671, in propagate_in_video
self.propagate_in_video_preflight(inference_state)
File "C:\Users\Admin\anaconda3\envs\cicd\lib\site-packages\torch\utils_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "C:\Users\Admin\Downloads\EdgeTAM\sam2\sam2_video_predictor.py", line 619, in propagate_in_video_preflight
consolidated_out = self._consolidate_temp_output_across_obj(
File "C:\Users\Admin\Downloads\EdgeTAM\sam2\sam2_video_predictor.py", line 543, in _consolidate_temp_output_across_obj
maskmem_features, maskmem_pos_enc = self._run_memory_encoder(
File "C:\Users\Admin\Downloads\EdgeTAM\sam2\sam2_video_predictor.py", line 998, in _run_memory_encoder
File "C:\Users\Admin\Downloads\EdgeTAM\sam2\modeling\sam2_base.py", line 740, in _encode_new_memory
maskmem_features, maskmem_pos_enc[0] = self.spatial_perceiver(
File "C:\Users\Admin\anaconda3\envs\cicd\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\Admin\anaconda3\envs\cicd\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Admin\Downloads\EdgeTAM\sam2\modeling\perceiver.py", line 265, in forward
latents_2d, pos_2d = self.forward_2d(x)
File "C:\Users\Admin\Downloads\EdgeTAM\sam2\modeling\perceiver.py", line 302, in forward_2d
latents_2d = self.latents_2d.unsqueeze(0).expand(B, -1, -1).view(-1, 1, C)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Can someone help me identify the root cause of this issue?
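
For what it's worth, a minimal snippet independent of EdgeTAM reproduces the same class of error: expand() returns a non-contiguous view (stride 0 along the batch dimension), so the following .view() fails while .reshape() works. With B == 1 the expand keeps the tensor contiguous, which might be why this only shows up once more than one object is tracked.

    import torch

    B, N, C = 2, 256, 64
    latents = torch.randn(N, C)

    # expand() returns a view with stride 0 along the new batch dimension
    expanded = latents.unsqueeze(0).expand(B, -1, -1)

    try:
        expanded.view(-1, 1, C)           # raises the same RuntimeError as in the traceback
    except RuntimeError as e:
        print(e)

    out = expanded.reshape(-1, 1, C)      # reshape() copies when necessary and succeeds
    print(out.shape)                      # torch.Size([512, 1, 64])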
