View size not compatible with input tensors - Multiple object detection #15

Description

@annamalai-19

I'm trying to detect and track multiple objects across the frames of a video. I pass a different object id for each input, but I'm running into an incompatible-tensor-shape error when doing inference on a directory of images. Specifically, the error occurs when the results for a single object are combined across the frames.

Issue code

for out_frame_idx, out_obj_ids, out_mask_logits in self.predictor.propagate_in_video(self.inference_state):
    print("Masking frame number:", out_frame_idx)
    # collect the binary mask of every object id present in this frame
    self.video_segments[out_frame_idx] = {
        out_obj_id: (out_mask_logits[i] > 0.0).cpu().numpy()
        for i, out_obj_id in enumerate(out_obj_ids)
    }
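
For context, the predictor and inference state used by this loop are created roughly like this (a simplified sketch; the config/checkpoint paths are placeholders for my local ones):

    from sam2.build_sam import build_sam2_video_predictor

    # build the EdgeTAM video predictor; the config/checkpoint paths below are placeholders
    self.predictor = build_sam2_video_predictor(
        "configs/edgetam.yaml", "checkpoints/edgetam.pt", device="cuda"
    )

    # init_state scans the directory of JPEG frames (or a video file) and caches the images
    self.inference_state = self.predictor.init_state(video_path="path/to/frames_dir")

    # frame_idx -> {obj_id: binary mask}, filled by the loop above
    self.video_segments = {}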

Input to model

Currently I want to track two objects in a video/directory of images. I also tried a bbox input instead of points, but that doesn't work either. I added the objects on the first frame using the add_new_points_or_box API of the EdgeTAM model.

  _, out_obj_ids, out_mask_logits = self.predictor.add_new_points_or_box(**predictor_kwargs)

On this first frame, the model detects both objects correctly. But when propagating the prediction across all frames, the error below is thrown.

Note: this is my input before conversion to numpy arrays; I only pass the inputs after converting them to arrays.

object 1 =  {'id': 1, 'points': [(543, 924)], 'labels': [1], 'bbox': []}
object 2 = {'id': 2, 'points': [(540, 732)], 'labels': [1], 'bbox': []}
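
For reference, the prompts above are converted and passed per object roughly like this (simplified; variable names are illustrative):

    import numpy as np

    objects = [
        {'id': 1, 'points': [(543, 924)], 'labels': [1], 'bbox': []},
        {'id': 2, 'points': [(540, 732)], 'labels': [1], 'bbox': []},
    ]

    for obj in objects:
        predictor_kwargs = dict(
            inference_state=self.inference_state,
            frame_idx=0,                                       # prompts are given only on the first frame
            obj_id=obj['id'],
            points=np.array(obj['points'], dtype=np.float32),  # (N, 2) pixel coordinates (x, y)
            labels=np.array(obj['labels'], dtype=np.int32),    # 1 = positive click
        )
        _, out_obj_ids, out_mask_logits = self.predictor.add_new_points_or_box(**predictor_kwargs)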

Error screenshot

[screenshot showing the same RuntimeError as the traceback below]

Full traceback:

for out_frame_idx, out_obj_ids, out_mask_logits in self.predictor.propagate_in_video(self.inference_state):
File "C:\Users\Admin\anaconda3\envs\cicd\lib\site-packages\torch\utils_contextlib.py", line 36, in generator_context
response = gen.send(None)
File "C:\Users\Admin\Downloads\EdgeTAM\sam2\sam2_video_predictor.py", line 671, in propagate_in_video
self.propagate_in_video_preflight(inference_state)
File "C:\Users\Admin\anaconda3\envs\cicd\lib\site-packages\torch\utils_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "C:\Users\Admin\Downloads\EdgeTAM\sam2\sam2_video_predictor.py", line 619, in propagate_in_video_preflight
consolidated_out = self._consolidate_temp_output_across_obj(
File "C:\Users\Admin\Downloads\EdgeTAM\sam2\sam2_video_predictor.py", line 543, in _consolidate_temp_output_across_obj
maskmem_features, maskmem_pos_enc = self._run_memory_encoder(
File "C:\Users\Admin\Downloads\EdgeTAM\sam2\sam2_video_predictor.py", line 998, in _run_memory_encoder
File "C:\Users\Admin\Downloads\EdgeTAM\sam2\modeling\sam2_base.py", line 740, in _encode_new_memory
maskmem_features, maskmem_pos_enc[0] = self.spatial_perceiver(
File "C:\Users\Admin\anaconda3\envs\cicd\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\Admin\anaconda3\envs\cicd\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Admin\Downloads\EdgeTAM\sam2\modeling\perceiver.py", line 265, in forward
latents_2d, pos_2d = self.forward_2d(x)
File "C:\Users\Admin\Downloads\EdgeTAM\sam2\modeling\perceiver.py", line 302, in forward_2d
latents_2d = self.latents_2d.unsqueeze(0).expand(B, -1, -1).view(-1, 1, C)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Can someone help me identify the root cause of this issue?
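
For what it's worth, a minimal snippet independent of EdgeTAM reproduces the same class of error: expand() returns a non-contiguous view (stride 0 along the batch dimension), so the following .view() fails while .reshape() works. With B == 1 the expand keeps the tensor contiguous, which might be why this only shows up once more than one object is tracked.

    import torch

    B, N, C = 2, 256, 64
    latents = torch.randn(N, C)

    # expand() returns a view with stride 0 along the new batch dimension
    expanded = latents.unsqueeze(0).expand(B, -1, -1)

    try:
        expanded.view(-1, 1, C)           # raises the same RuntimeError as in the traceback
    except RuntimeError as e:
        print(e)

    out = expanded.reshape(-1, 1, C)      # reshape() copies when necessary and succeeds
    print(out.shape)                      # torch.Size([512, 1, 64])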
