
⚡️ Speed up function shear by 10%#26

Open
codeflash-ai[bot] wants to merge 2 commits into main from codeflash/optimize-shear-m8oa12js

Conversation


@codeflash-ai codeflash-ai bot commented Mar 25, 2025

📄 10% (0.10x) speedup for shear in kornia/geometry/transform/affwarp.py

⏱️ Runtime: 17.2 milliseconds → 15.6 milliseconds (best of 33 runs)

📝 Explanation and details

Optimizations Made

  1. Direct Tensor Manipulations: Avoided unnecessary tensor chunk operations in _compute_shear_matrix by directly accessing and updating elements.

  2. Dimension Methods: Replaced ndimension() with dim() for checking tensor dimensions, which is consistent with PyTorch updates and potentially more performant.

  3. Batch Size Handling: Optimized batch size handling in the affine function by retrieving tensor dimensions only once and minimizing redundant calculations.

These optimizations improve computational efficiency by minimizing unnecessary operations and restructuring logic to make better use of PyTorch operations internally. The overall logic and results of the operations remain unchanged, ensuring that the function outputs are consistent with previous versions.
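The first optimization can be sketched as follows. This is an illustrative reconstruction, not the actual diff: the chunk-based and direct constructions below are assumptions based on kornia's shear-matrix convention (`[[1, sx, 0], [sy, 1, 0]]`), and the function names are hypothetical.

```python
import torch

def compute_shear_matrix_chunked(shear: torch.Tensor) -> torch.Tensor:
    # Chunk-based construction (illustrative of the pre-optimization style):
    # split the (B, 2) shear factors, then concatenate rows back together.
    sx, sy = torch.chunk(shear, chunks=2, dim=-1)
    ones = torch.ones_like(sx)
    zeros = torch.zeros_like(sx)
    row0 = torch.cat([ones, sx, zeros], dim=-1)   # [1, sx, 0]
    row1 = torch.cat([sy, ones, zeros], dim=-1)   # [sy, 1, 0]
    return torch.stack([row0, row1], dim=-2)      # (B, 2, 3)

def compute_shear_matrix_direct(shear: torch.Tensor) -> torch.Tensor:
    # Direct construction (illustrative of the optimized style): start from an
    # identity affine matrix and write the shear factors into it in place,
    # avoiding the intermediate chunk/cat/stack tensors.
    matrix = torch.eye(2, 3, dtype=shear.dtype).repeat(shear.shape[0], 1, 1)
    matrix[:, 0, 1] = shear[:, 0]  # x-shear
    matrix[:, 1, 0] = shear[:, 1]  # y-shear
    return matrix

s = torch.tensor([[0.5, 0.2], [0.1, -0.3]])
assert torch.allclose(compute_shear_matrix_chunked(s),
                      compute_shear_matrix_direct(s))
```

Both variants produce the same `(B, 2, 3)` affine matrix; the direct version simply allocates fewer intermediates.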

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 44 Passed |
| 🌀 Generated Regression Tests | 26 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
⚙️ Existing Unit Tests Details
- geometry/transform/test_affine.py
🌀 Generated Regression Tests Details
from __future__ import annotations

from typing import Optional

# imports
import pytest  # used for our unit tests
import torch
import torch.nn.functional as F
from kornia.core import eye, tensor, zeros
from kornia.geometry.conversions import (convert_affinematrix_to_homography,
                                         normalize_homography)
from kornia.geometry.transform.affwarp import shear
from kornia.geometry.transform.imgwarp import warp_affine
from kornia.utils.helpers import _torch_inverse_cast
from kornia.utils.misc import eye_like
from torch import Tensor
# function to test
# LICENSE HEADER MANAGED BY add-license-header
#
# Copyright 2018 Kornia Team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#



__all__ = [
    "Affine",
    "Rescale",
    "Resize",
    "Rotate",
    "Scale",
    "Shear",
    "Translate",
    "affine",
    "affine3d",
    "rescale",
    "resize",
    "resize_to_be_divisible",
    "rotate",
    "rotate3d",
    "scale",
    "shear",
    "translate",
]
from kornia.geometry.transform.affwarp import shear




def eye_like(n: int, input: Tensor, shared_memory: bool = False) -> Tensor:
    r"""Return a 2-D tensor with ones on the diagonal and zeros elsewhere with the same batch size as the input.

    Args:
        n: the number of rows :math:`(N)`.
        input: image tensor that will determine the batch size of the output matrix.
          The expected shape is :math:`(B, *)`.
        shared_memory: when set, all samples in the batch will share the same memory.

    Returns:
       The identity matrix with the same batch size as the input :math:`(B, N, N)`.

    Notes:
        When the dimension to expand is of size 1, using torch.expand(...) yields the same tensor as torch.repeat(...)
        without using extra memory. Thus, when the tensor obtained by this method will be later assigned -
        use this method with shared_memory=False, otherwise, prefer using it with shared_memory=True.

    """
    if n <= 0:
        raise AssertionError(type(n), n)
    if len(input.shape) < 1:
        raise AssertionError(input.shape)

    identity = eye(n, device=input.device).type(input.dtype)

    return identity[None].expand(input.shape[0], n, n) if shared_memory else identity[None].repeat(input.shape[0], 1, 1)
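The expand-vs-repeat trade-off described in the Notes above can be made concrete with a small sketch using plain torch calls that mirror the body of `eye_like`:

```python
import torch

inp = torch.rand(4, 3, 8, 8)  # batch of 4 images
identity = torch.eye(3, device=inp.device).type(inp.dtype)

shared = identity[None].expand(inp.shape[0], 3, 3)   # views of one tensor; no copy
private = identity[None].repeat(inp.shape[0], 1, 1)  # 4 independent copies

private[0, 0, 1] = 0.5  # safe: only the first matrix in the batch changes
assert private[1, 0, 1].item() == 0.0
# An equivalent in-place write to `shared` would be unsafe, because all
# samples produced by expand() alias the same underlying memory.
```

This is why the docstring recommends `shared_memory=False` (repeat) whenever the returned matrices will later be written to.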



__all__ = [
    "get_affine_matrix2d",
    "get_affine_matrix3d",
    "get_perspective_transform",
    "get_perspective_transform3d",
    "get_projective_transform",
    "get_rotation_matrix2d",
    "get_shear_matrix2d",
    "get_shear_matrix3d",
    "get_translation_matrix2d",
    "homography_warp",
    "homography_warp3d",
    "invert_affine_transform",
    "projection_from_Rt",
    "remap",
    "warp_affine",
    "warp_affine3d",
    "warp_grid",
    "warp_grid3d",
    "warp_perspective",
    "warp_perspective3d",
]


def warp_affine(
    src: Tensor,
    M: Tensor,
    dsize: tuple[int, int],
    mode: str = "bilinear",
    padding_mode: str = "zeros",
    align_corners: bool = True,
    fill_value: Optional[Tensor] = None,  # needed for jit
) -> Tensor:
    r"""Apply an affine transformation to a tensor.

    .. image:: _static/img/warp_affine.png

    The function warp_affine transforms the source tensor using
    the specified matrix:

    .. math::
        \text{dst}(x, y) = \text{src} \left( M_{11} x + M_{12} y + M_{13} ,
        M_{21} x + M_{22} y + M_{23} \right )

    Args:
        src: input tensor of shape :math:`(B, C, H, W)`.
        M: affine transformation of shape :math:`(B, 2, 3)`.
        dsize: size of the output image (height, width).
        mode: interpolation mode to calculate output values ``'bilinear'`` | ``'nearest'``.
        padding_mode: padding mode for outside grid values ``'zeros'`` | ``'border'`` | ``'reflection'`` | ``'fill'``.
        align_corners : mode for grid_generation.
        fill_value: tensor of shape :math:`(3)` that fills the padding area. Only supported for RGB.

    Returns:
        the warped tensor with shape :math:`(B, C, H, W)`.

    .. note::
        This function is often used in conjunction with :func:`get_rotation_matrix2d`,
        :func:`get_shear_matrix2d`, :func:`get_affine_matrix2d`, :func:`invert_affine_transform`.

    .. note::
       See a working example `here <https://kornia.github.io/tutorials/nbs/rotate_affine.html>`__.

    Example:
       >>> img = torch.rand(1, 4, 5, 6)
       >>> A = torch.eye(2, 3)[None]
       >>> out = warp_affine(img, A, (4, 2), align_corners=True)
       >>> print(out.shape)
       torch.Size([1, 4, 4, 2])

    """
    if not isinstance(src, Tensor):
        raise TypeError(f"Input src type is not a Tensor. Got {type(src)}")

    if not isinstance(M, Tensor):
        raise TypeError(f"Input M type is not a Tensor. Got {type(M)}")

    if not len(src.shape) == 4:
        raise ValueError(f"Input src must be a BxCxHxW tensor. Got {src.shape}")

    if not (len(M.shape) == 3 and M.shape[-2:] == (2, 3)):
        raise ValueError(f"Input M must be a Bx2x3 tensor. Got {M.shape}")

    B, C, H, W = src.size()

    # we generate a 3x3 transformation matrix from 2x3 affine
    M_3x3: Tensor = convert_affinematrix_to_homography(M)
    dst_norm_trans_src_norm: Tensor = normalize_homography(M_3x3, (H, W), dsize)

    # src_norm_trans_dst_norm = torch.inverse(dst_norm_trans_src_norm)
    src_norm_trans_dst_norm = _torch_inverse_cast(dst_norm_trans_src_norm)

    grid = F.affine_grid(src_norm_trans_dst_norm[:, :2, :], [B, C, dsize[0], dsize[1]], align_corners=align_corners)

    if padding_mode == "fill":
        if fill_value is None:
            fill_value = zeros(3)
        return _fill_and_warp(src, grid, align_corners=align_corners, mode=mode, fill_value=fill_value)

    return F.grid_sample(src, grid, align_corners=align_corners, mode=mode, padding_mode=padding_mode)

# unit tests

def test_single_image_shearing():
    # Single image shearing with basic shear values
    tensor = torch.rand(3, 4, 4)  # CxHxW
    shear_factors = torch.tensor([[0.5, 0.0]])  # Batch size 1
    codeflash_output = shear(tensor, shear_factors)

def test_batch_image_shearing():
    # Batch image shearing with uniform shear values
    tensor = torch.rand(2, 3, 4, 4)  # BxCxHxW
    shear_factors = torch.tensor([[0.5, 0.0], [0.5, 0.0]])  # Batch size 2
    codeflash_output = shear(tensor, shear_factors)


def test_extreme_shear_values():
    # Test with extreme shear values
    tensor = torch.rand(1, 3, 4, 4)
    shear_factors = torch.tensor([[10.0, 10.0]])
    codeflash_output = shear(tensor, shear_factors)

def test_invalid_input_type():
    # Test with invalid input type
    with pytest.raises(TypeError):
        shear("not a tensor", torch.tensor([[0.5, 0.0]]))

def test_invalid_tensor_shape():
    # Test with invalid tensor shape
    with pytest.raises(ValueError):
        shear(torch.rand(3, 4), torch.tensor([[0.5, 0.0]]))


def test_interpolation_modes():
    # Test with different interpolation modes
    tensor = torch.rand(1, 3, 4, 4)
    shear_factors = torch.tensor([[0.5, 0.0]])
    codeflash_output = shear(tensor, shear_factors, mode='bilinear')
    codeflash_output = shear(tensor, shear_factors, mode='nearest')

def test_padding_modes():
    # Test with different padding modes
    tensor = torch.rand(1, 3, 4, 4)
    shear_factors = torch.tensor([[0.5, 0.0]])
    codeflash_output = shear(tensor, shear_factors, padding_mode='zeros')
    codeflash_output = shear(tensor, shear_factors, padding_mode='border')
    codeflash_output = shear(tensor, shear_factors, padding_mode='reflection')

def test_large_scale():
    # Test large scale input
    tensor = torch.rand(10, 3, 256, 256)  # 10 images, 3 channels, 256x256
    shear_factors = torch.rand(10, 2)  # Random shear factors for each image
    codeflash_output = shear(tensor, shear_factors)

def test_single_channel_image():
    # Test single channel image
    tensor = torch.rand(1, 1, 4, 4)  # Single channel
    shear_factors = torch.tensor([[0.5, 0.0]])
    codeflash_output = shear(tensor, shear_factors)

def test_non_square_images():
    # Test non-square images
    tensor = torch.rand(1, 3, 4, 8)  # Rectangular image
    shear_factors = torch.tensor([[0.5, 0.0]])
    codeflash_output = shear(tensor, shear_factors)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from typing import Optional

# imports
import pytest
import torch
import torch.nn.functional as F
from kornia.core import eye, tensor, zeros
from kornia.geometry.conversions import (convert_affinematrix_to_homography,
                                         normalize_homography)
from kornia.geometry.transform.affwarp import shear
from kornia.geometry.transform.imgwarp import warp_affine
from kornia.utils.helpers import _torch_inverse_cast
from kornia.utils.misc import eye_like
from torch import Tensor

# unit tests

def test_identity_shear():
    # Test with no shear, should return the original tensor
    img = torch.rand(1, 3, 4, 4)
    shear_factor = torch.tensor([[0.0, 0.0]])
    codeflash_output = shear(img, shear_factor)

def test_simple_shear():
    # Test with a simple shear factor
    img = torch.rand(1, 3, 4, 4)
    shear_factor = torch.tensor([[0.1, 0.2]])
    codeflash_output = shear(img, shear_factor)

def test_negative_shear():
    # Test with negative shear factors
    img = torch.rand(1, 3, 4, 4)
    shear_factor = torch.tensor([[-0.1, -0.2]])
    codeflash_output = shear(img, shear_factor)

def test_invalid_tensor_type():
    # Test with invalid tensor type
    with pytest.raises(TypeError):
        shear([1, 2, 3], torch.tensor([[0.0, 0.0]]))

def test_invalid_shear_type():
    # Test with invalid shear type
    img = torch.rand(1, 3, 4, 4)
    with pytest.raises(TypeError):
        shear(img, [0.0, 0.0])

def test_invalid_tensor_shape():
    # Test with invalid tensor shape
    img = torch.rand(4, 4)
    with pytest.raises(ValueError):
        shear(img, torch.tensor([[0.0, 0.0]]))


def test_large_batch():
    # Test with a large batch size
    img = torch.rand(100, 3, 32, 32)
    shear_factor = torch.rand(100, 2)
    codeflash_output = shear(img, shear_factor)

def test_high_resolution_image():
    # Test with a high-resolution image
    img = torch.rand(1, 3, 512, 512)
    shear_factor = torch.tensor([[0.1, 0.2]])
    codeflash_output = shear(img, shear_factor)

def test_different_interpolation_modes():
    # Test with different interpolation modes
    img = torch.rand(1, 3, 4, 4)
    shear_factor = torch.tensor([[0.1, 0.2]])
    codeflash_output = shear(img, shear_factor, mode='nearest')
    codeflash_output = shear(img, shear_factor, mode='bilinear')

def test_different_padding_modes():
    # Test with different padding modes
    img = torch.rand(1, 3, 4, 4)
    shear_factor = torch.tensor([[0.1, 0.2]])
    codeflash_output = shear(img, shear_factor, padding_mode='zeros')
    codeflash_output = shear(img, shear_factor, padding_mode='border')
    codeflash_output = shear(img, shear_factor, padding_mode='reflection')
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-shear-m8oa12js` and push updates to the branch.

Codeflash

Ubuntu and others added 2 commits March 13, 2025 00:39
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Mar 25, 2025
@codeflash-ai codeflash-ai bot requested a review from dasarchan March 25, 2025 09:11
@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

@github-actions github-actions bot added the stale label Jan 14, 2026
