TVTensor-KeyPoints with support for keypoint visibility for training Keypoint R-CNN #9281

### 🚀 The feature

@Alexandre-SCHOEPP @NicolasHug @AntoineSimoulin 

Thank you for releasing the `tv_tensors.KeyPoints` functionality. I use it in my university teaching, where students build a custom keypoint dataloader to train Keypoint R-CNN.

The `tv_tensors.KeyPoints` class correctly handles geometric transformations, but it does not yet support the [visibility flag](https://github.com/pytorch/vision/blob/617079d944b0e72632311c30ae2bbdf1168b901e/torchvision/tv_tensors/_keypoints.py#L65): it only accepts tensors of shape `(..., 2)`, i.e. `(x, y)` coordinates. [Training Keypoint R-CNN](https://github.com/pytorch/vision/blob/617079d944b0e72632311c30ae2bbdf1168b901e/torchvision/models/detection/keypoint_rcnn.py#L41), however, requires the keypoints of each target in the format `[x, y, visibility]`, i.e. a tensor of shape `[N, K, 3]`.

As a result, custom dataloaders need redundant, inefficient code that appends the visibility flag by hand after the transforms have run, as in the snippet below:
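```python
import json

import torch
from torchvision import tv_tensors
from torchvision.transforms.v2 import functional as F

# Inside the custom Dataset's __getitem__ (img has already been loaded):
with open(annotation_path) as f:
    data = json.load(f)
shapes = data["shapes"]

# Collect one (x, y) keypoint per annotated shape.
keypoints = []
for shape in shapes:
    cx, cy = shape["points"][0]
    keypoints.append([cx, cy])
keypoints = torch.tensor(keypoints, dtype=torch.float32)
keypoints = keypoints.view(-1, 1, 2)  # (N instances, K=1 keypoint, 2)

target = {}
# KeyPoints only accepts (..., 2) tensors, so the visibility flag cannot be included here.
target["keypoints"] = tv_tensors.KeyPoints(keypoints, canvas_size=F.get_size(img))

if self.transforms is not None:
    img, target = self.transforms(img, target)

## add visibility flag - redundant code
# Keypoint R-CNN expects (N, K, 3) = [x, y, visibility]; 2.0 = labeled and visible (COCO convention).
rows = target["keypoints"].shape[0]
visibility = torch.full((rows, 1, 1), 2.0, device=target["keypoints"].device, dtype=target["keypoints"].dtype)
target["keypoints"] = torch.cat([target["keypoints"], visibility], dim=2)
```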
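In the meantime, the post-transform step can at least be factored into a small reusable helper. The sketch below is a hypothetical `append_visibility` function (not part of torchvision) that takes the transformed `(N, K, 2)` keypoints and either a per-keypoint visibility tensor read from the annotation or a constant default; `2.0` mirrors the value hard-coded in the snippet above (COCO's "labeled and visible").

```python
from typing import Optional

import torch


def append_visibility(
    keypoints: torch.Tensor,
    visibility: Optional[torch.Tensor] = None,
    default: float = 2.0,
) -> torch.Tensor:
    """Hypothetical helper: turn (N, K, 2) keypoints into the (N, K, 3)
    [x, y, visibility] layout that Keypoint R-CNN expects.

    If `visibility` is None, every keypoint gets `default`.
    Otherwise `visibility` must have shape (N, K) or (N, K, 1).
    """
    if keypoints.shape[-1] != 2:
        raise ValueError(f"expected (..., 2) keypoints, got {tuple(keypoints.shape)}")
    if visibility is None:
        # Same behavior as the original snippet: one constant flag per keypoint.
        vis = torch.full(
            (*keypoints.shape[:-1], 1),
            default,
            dtype=keypoints.dtype,
            device=keypoints.device,
        )
    else:
        vis = visibility.to(dtype=keypoints.dtype, device=keypoints.device)
        if vis.dim() == keypoints.dim() - 1:
            vis = vis.unsqueeze(-1)  # (N, K) -> (N, K, 1)
    # Unwrap explicitly so the result is a plain torch.Tensor even if the
    # input is a tv_tensors.KeyPoints.
    return torch.cat([keypoints.as_subclass(torch.Tensor), vis], dim=-1)
```

With it, the last three lines of the dataloader collapse to `target["keypoints"] = append_visibility(target["keypoints"])`, or `append_visibility(target["keypoints"], vis)` when the annotation provides its own flags.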

### Motivation, pitch

It would be much simpler and more efficient to include the visibility flag directly in the keypoints list and wrap the whole `(N, K, 3)` tensor with `tv_tensors.KeyPoints`. However, this is currently not possible because `KeyPoints` requires the last dimension of the [input tensor shape](https://github.com/pytorch/vision/blob/617079d944b0e72632311c30ae2bbdf1168b901e/torchvision/tv_tensors/_keypoints.py#L65) to be 2.

A useful new feature would be for `KeyPoints` to accept a third value per keypoint and apply the transformations only to the first two values (x, y), leaving the third value (visibility) unchanged.

I’m curious to hear your thoughts on this potential feature. Thank you in advance. 

### Alternatives

_No response_

### Additional context

_No response_
