
Commit 9a8003e (parent a0883e2)

[release/0.23] Documentation cherry picks (#9145)

Authored by AntoineSimoulin, scotts, and NicolasHug
Co-authored-by: Scott Schneider <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>

File tree: 9 files changed, +272 -43 lines

docs/source/conf.py

Lines changed: 1 addition & 0 deletions

@@ -87,6 +87,7 @@ def __init__(self, src_dir):
     "plot_transforms_illustrations.py",
     "plot_transforms_e2e.py",
     "plot_cutmix_mixup.py",
+    "plot_rotated_box_transforms.py",
     "plot_custom_transforms.py",
     "plot_tv_tensors.py",
     "plot_custom_tv_tensors.py",

docs/source/transforms.rst

Lines changed: 14 additions & 8 deletions

@@ -1,14 +1,20 @@
 .. _transforms:
 
-Transforming and augmenting images
-==================================
+Transforming images, videos, boxes and more
+===========================================
 
 .. currentmodule:: torchvision.transforms
 
 Torchvision supports common computer vision transformations in the
-``torchvision.transforms`` and ``torchvision.transforms.v2`` modules. Transforms
-can be used to transform or augment data for training or inference of different
-tasks (image classification, detection, segmentation, video classification).
+``torchvision.transforms.v2`` module. Transforms can be used to transform and
+augment data, for both training and inference. The following objects are
+supported:
+
+- Images as pure tensors, :class:`~torchvision.tv_tensors.Image` or PIL image
+- Videos as :class:`~torchvision.tv_tensors.Video`
+- Axis-aligned and rotated bounding boxes as :class:`~torchvision.tv_tensors.BoundingBoxes`
+- Segmentation and detection masks as :class:`~torchvision.tv_tensors.Mask`
+- KeyPoints as :class:`~torchvision.tv_tensors.KeyPoints`.
 
 .. code:: python

@@ -111,9 +117,9 @@ In Torchvision 0.15 (March 2023), we released a new set of transforms available
 in the ``torchvision.transforms.v2`` namespace. These transforms have a lot of
 advantages compared to the v1 ones (in ``torchvision.transforms``):
 
-- They can transform images **but also** bounding boxes, masks, or videos. This
-  provides support for tasks beyond image classification: detection, segmentation,
-  video classification, etc. See
+- They can transform images **and also** bounding boxes, masks, videos and
+  keypoints. This provides support for tasks beyond image classification:
+  detection, segmentation, video classification, pose estimation, etc. See
   :ref:`sphx_glr_auto_examples_transforms_plot_transforms_getting_started.py`
   and :ref:`sphx_glr_auto_examples_transforms_plot_transforms_e2e.py`.
 - They support more transforms like :class:`~torchvision.transforms.v2.CutMix`

gallery/assets/leaning_tower.jpg

Binary file added (1.25 MB)

gallery/transforms/helpers.py

Lines changed: 8 additions & 2 deletions

@@ -2,10 +2,11 @@
 import torch
 from torchvision.utils import draw_bounding_boxes, draw_segmentation_masks
 from torchvision import tv_tensors
+from torchvision.transforms import v2
 from torchvision.transforms.v2 import functional as F
 
 
-def plot(imgs, row_title=None, **imshow_kwargs):
+def plot(imgs, row_title=None, bbox_width=3, **imshow_kwargs):
     if not isinstance(imgs[0], list):
         # Make a 2d grid even if there's just 1 row
         imgs = [imgs]

@@ -24,6 +25,11 @@ def plot(imgs, row_title=None, **imshow_kwargs):
                 masks = target.get("masks")
             elif isinstance(target, tv_tensors.BoundingBoxes):
                 boxes = target
+
+                # Conversion necessary because draw_bounding_boxes() only
+                # works with this specific format.
+                if tv_tensors.is_rotated_bounding_format(boxes.format):
+                    boxes = v2.ConvertBoundingBoxFormat("xyxyxyxy")(boxes)
             else:
                 raise ValueError(f"Unexpected target type: {type(target)}")
             img = F.to_image(img)

@@ -35,7 +41,7 @@ def plot(imgs, row_title=None, **imshow_kwargs):
 
             img = F.to_dtype(img, torch.uint8, scale=True)
             if boxes is not None:
-                img = draw_bounding_boxes(img, boxes, colors="yellow", width=3)
+                img = draw_bounding_boxes(img, boxes, colors="yellow", width=bbox_width)
             if masks is not None:
                 img = draw_segmentation_masks(img, masks.to(torch.bool), colors=["green"] * masks.shape[0], alpha=.65)

gallery/transforms/plot_rotated_box_transforms.py (new file)

Lines changed: 195 additions & 0 deletions

"""
===============================================================
Transforms on Rotated Bounding Boxes
===============================================================

This example illustrates how to define and use rotated bounding boxes.

.. note::
    Support for rotated bounding boxes was released in TorchVision 0.23 and is
    currently a BETA feature. We don't expect the API to change, but there may
    be some rare edge-cases. If you find any issues, please report them on
    our bug tracker: https://github.com/pytorch/vision/issues?q=is:open+is:issue

First, a bit of setup code:
"""

# %%
from PIL import Image
from pathlib import Path
import matplotlib.pyplot as plt


import torch
from torchvision.tv_tensors import BoundingBoxes
from torchvision.transforms import v2
from helpers import plot

plt.rcParams["figure.figsize"] = [10, 5]
plt.rcParams["savefig.bbox"] = "tight"

# If you change the seed, make sure that the randomly-applied transforms
# properly show that the image can be both transformed and *not* transformed!
torch.manual_seed(0)

# If you're trying to run this on Colab, you can download the assets and the
# helpers from https://github.com/pytorch/vision/tree/main/gallery/
orig_img = Image.open(Path('../assets') / 'leaning_tower.jpg')

# %%
# Creating a Rotated Bounding Box
# -------------------------------
# Rotated bounding boxes are created by instantiating the
# :class:`~torchvision.tv_tensors.BoundingBoxes` class. It's the ``format``
# parameter of the constructor that determines whether a bounding box is
# rotated or not. In this instance, we use the CXCYWHR
# :attr:`~torchvision.tv_tensors.BoundingBoxFormat`. The first two values are
# the X and Y coordinates of the center of the bounding box. The next two
# values are the width and height of the bounding box, and the last value is
# the rotation of the bounding box, in degrees.


orig_box = BoundingBoxes(
    [
        [860.0, 1100, 570, 1840, -7],
    ],
    format="CXCYWHR",
    canvas_size=(orig_img.size[1], orig_img.size[0]),
)

plot([(orig_img, orig_box)], bbox_width=10)
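As an aside on the CXCYWHR layout just described: the four corners of such a box can be recovered with a plain 2-D rotation. The sketch below is pure Python for illustration only; ``cxcywhr_to_corners`` is a hypothetical helper, not torchvision's conversion code, and torchvision's angle-sign convention (y grows downward in image coordinates) may differ from the mathematical convention assumed here.

```python
import math

def cxcywhr_to_corners(cx, cy, w, h, r_deg):
    # Rotate the four half-extent corners around the origin, then
    # translate them to the box center. Counter-clockwise positive
    # angles in standard mathematical axes are assumed.
    theta = math.radians(r_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_corners = [(-w / 2, -h / 2), (w / 2, -h / 2),
                    (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * cos_t - y * sin_t, cy + x * sin_t + y * cos_t)
            for x, y in half_corners]

# With no rotation, this reduces to the usual corner arithmetic:
print(cxcywhr_to_corners(10, 10, 4, 2, 0))
# [(8.0, 9.0), (12.0, 9.0), (12.0, 11.0), (8.0, 11.0)]
```

With a non-zero angle (such as the -7 degrees used above), the same helper yields the four rotated corners, which is essentially what the XYXYXYXY format stores explicitly.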
# %%
# Transforms illustrations
# ------------------------
#
# Using :class:`~torchvision.transforms.RandomRotation`:
rotater = v2.RandomRotation(degrees=(0, 180), expand=True)
rotated_imgs = [rotater((orig_img, orig_box)) for _ in range(4)]
plot([(orig_img, orig_box)] + rotated_imgs, bbox_width=10)

# %%
# Using :class:`~torchvision.transforms.Pad`:
padded_imgs_and_boxes = [
    v2.Pad(padding=padding)(orig_img, orig_box)
    for padding in (30, 50, 100, 200)
]
plot([(orig_img, orig_box)] + padded_imgs_and_boxes, bbox_width=10)

# %%
# Using :class:`~torchvision.transforms.Resize`:
resized_imgs = [
    v2.Resize(size=size)(orig_img, orig_box)
    for size in (30, 50, 100, orig_img.size)
]
plot([(orig_img, orig_box)] + resized_imgs, bbox_width=5)

# %%
# Note that the bounding box looking bigger in the smaller images is an
# artifact, not reality: the rasterised representation of the bounding box's
# boundaries merely appears bigger because we specify a fixed width for that
# rasterized line. When the image is, say, only 30 pixels wide, a line that is
# 3 pixels wide is relatively large.
#
# .. _clamping_mode_tuto:
#
# Clamping Mode, and its effect on transforms
# -------------------------------------------
#
# Some transforms, such as :class:`~torchvision.transforms.CenterCrop`, may
# result in the transformed bounding box lying partially outside of the
# transformed (cropped) image. In general, this may happen with most of the
# :ref:`geometric transforms <v2_api_ref>`.
#
# In such cases, the bounding box is clamped to the transformed image size based
# on its ``clamping_mode`` attribute. There are three values for
# ``clamping_mode``, which determine how the box is clamped after a
# transformation:
#
# - ``None``: No clamping is applied, and the bounding box may be partially
#   outside of the image.
# - ``"hard"``: The box is clamped to the image size, such that all its corners
#   are within the image canvas. This potentially results in a loss of
#   information, and it can lead to unintuitive results. But it may be necessary
#   for some applications, e.g. if the model doesn't support boxes outside of
#   their image.
# - ``"soft"``: This is an intermediate mode between ``None`` and ``"hard"``:
#   the box is clamped, but not as strictly as in ``"hard"`` mode. Some box
#   dimensions may still be outside of the image. This is the default when
#   constructing :class:`~torchvision.tv_tensors.BoundingBoxes`.
#
# .. note::
#
#     For axis-aligned bounding boxes, the ``"soft"`` and ``"hard"`` modes
#     behave the same, as the bounding box is always clamped to the image size.
#
# Let's illustrate the clamping modes with the
# :class:`~torchvision.transforms.CenterCrop` transform:

assert orig_box.clamping_mode == "soft"

box_hard_clamping = BoundingBoxes(orig_box, format=orig_box.format, canvas_size=orig_box.canvas_size, clamping_mode="hard")

box_no_clamping = BoundingBoxes(orig_box, format=orig_box.format, canvas_size=orig_box.canvas_size, clamping_mode=None)

crop_sizes = (800, 1200, 2000, orig_img.size)
soft_center_crops_and_boxes = [
    v2.CenterCrop(size=size)(orig_img, orig_box)
    for size in crop_sizes
]

hard_center_crops_and_boxes = [
    v2.CenterCrop(size=size)(orig_img, box_hard_clamping)
    for size in crop_sizes
]

no_clamping_center_crops_and_boxes = [
    v2.CenterCrop(size=size)(orig_img, box_no_clamping)
    for size in crop_sizes
]

plot([[(orig_img, box_hard_clamping)] + hard_center_crops_and_boxes,
      [(orig_img, orig_box)] + soft_center_crops_and_boxes,
      [(orig_img, box_no_clamping)] + no_clamping_center_crops_and_boxes],
     bbox_width=10)

# %%
# The plot above shows the ``"hard"`` clamping mode, then ``"soft"``, then
# ``None``, in that order. While ``"soft"`` and ``None`` result in similar
# plots, they do not lead to the exact same clamped boxes: the non-clamped
# boxes will show dimensions that are further away from the image:
print("boxes with soft clamping:")
print(soft_center_crops_and_boxes)
print()
print("boxes with no clamping:")
print(no_clamping_center_crops_and_boxes)
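To make the clamping behavior concrete in the simplest case, here is a pure-Python sketch for an axis-aligned XYXY box; ``clamp_box_xyxy`` is a hypothetical helper, not the torchvision implementation. Since, per the note above, "soft" and "hard" behave the same for axis-aligned boxes, only "hard" and ``None`` are distinguished here.

```python
def clamp_box_xyxy(box, canvas_size, clamping_mode):
    # canvas_size is (height, width), matching torchvision's convention.
    if clamping_mode is None:
        return list(box)  # leave the box untouched

    def clip(v, hi):
        return max(0, min(v, hi))

    h, w = canvas_size
    x1, y1, x2, y2 = box
    # "hard" (and, for axis-aligned boxes, "soft"): pull every
    # coordinate back inside the canvas.
    return [clip(x1, w), clip(y1, h), clip(x2, w), clip(y2, h)]

box = [-20, 10, 120, 90]  # partially outside a 100x100 canvas
print(clamp_box_xyxy(box, (100, 100), "hard"))  # [0, 10, 100, 90]
print(clamp_box_xyxy(box, (100, 100), None))    # [-20, 10, 120, 90]
```

For rotated boxes the clamped result is harder to compute (clamping a corner moves it off the rotated rectangle), which is exactly why the intermediate "soft" mode exists.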
# %%
#
# Setting the clamping mode
# -------------------------
#
# The ``clamping_mode`` attribute, which determines the clamping strategy that
# is applied to a box, can be set in different ways:
#
# - When constructing the bounding box with its
#   :class:`~torchvision.tv_tensors.BoundingBoxes` constructor, as done in the example above.
# - By directly setting the attribute on an existing instance, e.g. ``boxes.clamping_mode = "hard"``.
# - By calling the :class:`~torchvision.transforms.v2.SetClampingMode` transform.
#
# Also, remember that you can always clamp the bounding box manually by
# calling the :class:`~torchvision.transforms.v2.ClampBoundingBoxes` transform!
# Here's an example illustrating all of these options:

t = v2.Compose([
    v2.CenterCrop(size=(800,)),  # clamps according to the current clamping_mode
                                 # attribute, in this case set by the constructor
    v2.SetClampingMode(None),  # sets the clamping_mode attribute for future transforms
    v2.Pad(padding=3),  # clamps according to the current clamping_mode,
                        # i.e. ``None``
    v2.ClampBoundingBoxes(clamping_mode="soft"),  # clamps with "soft" mode
])

out_img, out_box = t(orig_img, orig_box)
plot([(orig_img, orig_box), (out_img, out_box)], bbox_width=10)

# %%
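The pipeline above hinges on ``clamping_mode`` being an attribute that travels with the box and is consulted by each transform. The toy model below mimics only that attribute-driven behavior; ``ToyBox``, ``pad``, and ``set_clamping_mode`` are illustrative stand-ins, not torchvision classes.

```python
class ToyBox:
    """Illustrative stand-in for a box carrying a clamping_mode attribute."""
    def __init__(self, coords, canvas, clamping_mode="soft"):
        self.coords = coords                # [x1, y1, x2, y2], axis-aligned
        self.canvas = canvas                # (height, width)
        self.clamping_mode = clamping_mode

def _clamp_in_place(box):
    if box.clamping_mode is None:
        return  # no clamping
    h, w = box.canvas
    x1, y1, x2, y2 = box.coords
    box.coords = [max(0, min(x1, w)), max(0, min(y1, h)),
                  max(0, min(x2, w)), max(0, min(y2, h))]

def pad(box, padding):
    # A geometric transform: shift the box, grow the canvas, then clamp
    # according to the box's *current* clamping_mode.
    box.coords = [c + padding for c in box.coords]
    h, w = box.canvas
    box.canvas = (h + 2 * padding, w + 2 * padding)
    _clamp_in_place(box)

def set_clamping_mode(box, mode):
    # Like v2.SetClampingMode: only updates the attribute, which
    # future transforms will consult.
    box.clamping_mode = mode

box = ToyBox([-10, 5, 90, 95], canvas=(100, 100))
set_clamping_mode(box, None)
pad(box, 3)        # no clamping happens, since the mode is now None
print(box.coords)  # [-7, 8, 93, 98]
```

With the default "soft" mode left in place, the same ``pad`` call would clamp the shifted x1 coordinate back to 0.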

gallery/transforms/plot_transforms_getting_started.py

Lines changed: 8 additions & 6 deletions

@@ -79,12 +79,13 @@
 # very easy: the v2 transforms are fully compatible with the v1 API, so you
 # only need to change the import!
 #
-# Detection, Segmentation, Videos
+# Videos, boxes, masks, keypoints
 # -------------------------------
 #
-# The new Torchvision transforms in the ``torchvision.transforms.v2`` namespace
-# support tasks beyond image classification: they can also transform bounding
-# boxes, segmentation / detection masks, or videos.
+# The Torchvision transforms in the ``torchvision.transforms.v2`` namespace
+# support tasks beyond image classification: they can also transform rotated or
+# axis-aligned bounding boxes, segmentation / detection masks, videos, and
+# keypoints.
 #
 # Let's briefly look at a detection example with bounding boxes.

@@ -129,8 +130,9 @@
 # TVTensors are :class:`torch.Tensor` subclasses. The available TVTensors are
 # :class:`~torchvision.tv_tensors.Image`,
 # :class:`~torchvision.tv_tensors.BoundingBoxes`,
-# :class:`~torchvision.tv_tensors.Mask`, and
-# :class:`~torchvision.tv_tensors.Video`.
+# :class:`~torchvision.tv_tensors.Mask`,
+# :class:`~torchvision.tv_tensors.Video`, and
+# :class:`~torchvision.tv_tensors.KeyPoints`.
 #
 # TVTensors look and feel just like regular tensors - they **are** tensors.
 # Everything that is supported on a plain :class:`torch.Tensor` like ``.sum()``

torchvision/transforms/v2/_meta.py

Lines changed: 12 additions & 5 deletions

@@ -27,11 +27,10 @@ def transform(self, inpt: tv_tensors.BoundingBoxes, params: dict[str, Any]) -> t
 class ClampBoundingBoxes(Transform):
     """Clamp bounding boxes to their corresponding image dimensions.
 
-    The clamping is done according to the bounding boxes' ``canvas_size`` meta-data.
-
     Args:
-        clamping_mode: TODOBB more docs. Default is None which relies on the input box' clamping_mode attribute.
-
+        clamping_mode: Default is "auto", which relies on the input box's
+            ``clamping_mode`` attribute. See :ref:`clamping_mode_tuto` for
+            more details on how to use this transform.
     """
 
     def __init__(self, clamping_mode: Union[CLAMPING_MODE_TYPE, str] = "auto") -> None:

@@ -57,7 +56,15 @@ def transform(self, inpt: tv_tensors.KeyPoints, params: dict[str, Any]) -> tv_te
 
 
 class SetClampingMode(Transform):
-    """TODOBB"""
+    """Sets the ``clamping_mode`` attribute of the bounding boxes for future transforms.
+
+    Args:
+        clamping_mode: The clamping mode to set. Possible values are: "soft",
+            "hard", or ``None``. See :ref:`clamping_mode_tuto` for more
+            details on how to use this transform.
+    """
 
     def __init__(self, clamping_mode: CLAMPING_MODE_TYPE) -> None:
         super().__init__()

torchvision/tv_tensors/_bounding_boxes.py

Lines changed: 21 additions & 12 deletions

@@ -16,17 +16,20 @@ class BoundingBoxFormat(Enum):
 
     Available formats are:
 
-    * ``XYXY``
-    * ``XYWH``
-    * ``CXCYWH``
-    * ``XYWHR``: rotated boxes represented via corner, width and height, x1, y1
-      being top left, w, h being width and height. r is rotation angle in
+    * ``XYXY``: bounding box represented via corners; x1, y1 being top left;
+      x2, y2 being bottom right.
+    * ``XYWH``: bounding box represented via corner, width and height; x1, y1
+      being top left; w, h being width and height.
+    * ``CXCYWH``: bounding box represented via center, width and height; cx,
+      cy being center of box; w, h being width and height.
+    * ``XYWHR``: rotated boxes represented via corner, width and height; x1, y1
+      being top left; w, h being width and height. r is rotation angle in
       degrees.
-    * ``CXCYWHR``: rotated boxes represented via centre, width and height, cx,
-      cy being center of box, w, h being width and height. r is rotation angle
+    * ``CXCYWHR``: rotated boxes represented via center, width and height; cx,
+      cy being center of box; w, h being width and height. r is rotation angle
       in degrees.
-    * ``XYXYXYXY``: rotated boxes represented via corners, x1, y1 being top
-      left, x2, y2 being top right, x3, y3 being bottom right, x4, y4 being
+    * ``XYXYXYXY``: rotated boxes represented via corners; x1, y1 being top
+      left; x2, y2 being top right; x3, y3 being bottom right; x4, y4 being
       bottom left.
     """

@@ -56,12 +59,17 @@ def is_rotated_bounding_format(format: BoundingBoxFormat | str) -> bool:
 # This should ideally be a Literal, but torchscript fails.
 CLAMPING_MODE_TYPE = Optional[str]
 
-# TODOBB All docs. Add any new API to rst files, add tutorial[s].
-
 
 class BoundingBoxes(TVTensor):
     """:class:`torch.Tensor` subclass for bounding boxes with shape ``[N, K]``.
 
+    .. note::
+        Support for rotated bounding boxes was released in TorchVision 0.23 and
+        is currently a BETA feature. We don't expect the API to change, but
+        there may be some rare edge-cases. If you find any issues, please report
+        them on our bug tracker:
+        https://github.com/pytorch/vision/issues?q=is:open+is:issue
+
     Where ``N`` is the number of bounding boxes
     and ``K`` is 4 for unrotated boxes, and 5 or 8 for rotated boxes.

@@ -75,7 +83,8 @@ class BoundingBoxes(TVTensor):
         data: Any data that can be turned into a tensor with :func:`torch.as_tensor`.
         format (BoundingBoxFormat, str): Format of the bounding box.
         canvas_size (two-tuple of ints): Height and width of the corresponding image or video.
-        clamping_mode: TODOBB
+        clamping_mode: The clamping mode to use when applying transforms that may result in bounding boxes
+            partially outside of the image. Possible values are: "soft", "hard", or ``None``. Read more in :ref:`clamping_mode_tuto`.
         dtype (torch.dtype, optional): Desired data type of the bounding box. If omitted, will be inferred from
             ``data``.
         device (torch.device, optional): Desired device of the bounding box. If omitted and ``data`` is a
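The axis-aligned formats documented above differ only in how the same four degrees of freedom are laid out. As a pure-Python sketch of those relationships (illustrative helpers, not torchvision's converters):

```python
def xywh_to_xyxy(box):
    # Top-left corner plus width/height -> two opposite corners.
    x, y, w, h = box
    return [x, y, x + w, y + h]

def cxcywh_to_xyxy(box):
    # Center plus width/height -> two opposite corners.
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]

# The same 30x40 box expressed in two of the formats:
print(xywh_to_xyxy([10, 20, 30, 40]))    # [10, 20, 40, 60]
print(cxcywh_to_xyxy([25, 40, 30, 40]))  # [10.0, 20.0, 40.0, 60.0]
```

The rotated formats add a fifth value (the angle r) or, in XYXYXYXY's case, spell out all four corners explicitly, which is why ``K`` in the docstring above is 4, 5, or 8.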
