@codeflash-ai codeflash-ai bot commented Jun 1, 2025

📄 38% (0.38x) speedup for get_new_h_w in src/diffusers/pipelines/kandinsky/pipeline_kandinsky.py

⏱️ Runtime : 44.4 microseconds → 32.3 microseconds (best of 486 runs)

📝 Explanation and details

Here is an optimized version of the provided program.
Optimization rationale:

  • The expression scale_factor**2 is computed multiple times. Caching this value once in a local variable avoids the repeated exponentiation.
  • The conditional increment ("add 1 if there is a remainder") is ceiling division in disguise. Since math.ceil(x / y) is equivalent to -(-x // y) for integers, the round-up can be done with pure integer arithmetic, avoiding both the math import and a float division.
  • Redundant division and modulo operations in the function body are eliminated.

All output and logic remain unchanged.
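The before/after shape of the change can be sketched as follows (the exact original body may differ slightly; the behavior matches the regression tests below):

```python
# Sketch of the original shape: scale_factor**2 is recomputed and the
# round-up is an explicit conditional increment.
def get_new_h_w_original(h, w, scale_factor=8):
    new_h = h // scale_factor**2
    if h % scale_factor**2 != 0:
        new_h += 1
    new_w = w // scale_factor**2
    if w % scale_factor**2 != 0:
        new_w += 1
    return new_h * scale_factor, new_w * scale_factor

# Sketch of the optimized shape: cache scale_factor**2 once and round up
# with the integer ceiling-division idiom -(-x // y).
def get_new_h_w_optimized(h, w, scale_factor=8):
    sf2 = scale_factor * scale_factor
    return -(-h // sf2) * scale_factor, -(-w // sf2) * scale_factor
```

Both variants agree on every input the regression tests exercise, including negatives, because Python's floor division makes `-(-x // y)` an exact integer ceiling.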

Notes:

  • The logic for "rounding up" is preserved and fully equivalent to the original, just more efficient.
  • No function signature or return values are changed.
  • The division and multiplication operations are reduced.
  • The savings compound when the function is called frequently, since the per-call overhead is what was reduced.
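The ceiling-division identity the rationale relies on is easy to sanity-check (this snippet is illustrative, not part of the PR):

```python
import math

# -(-x // y) equals math.ceil(x / y) for integer x and positive integer y,
# so the round-up needs no math import and no float division in the hot path.
for x in (-130, -1, 0, 1, 63, 64, 65, 130):
    assert -(-x // 64) == math.ceil(x / 64)
```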

Correctness verification report:

Test                           Status
⚙️ Existing Unit Tests         🔘 None Found
🌀 Generated Regression Tests  59 Passed
⏪ Replay Tests                🔘 None Found
🔎 Concolic Coverage Tests     🔘 None Found
📊 Tests Coverage              100.0%
🌀 Generated Regression Tests Details
import pytest  # used for our unit tests
from src.diffusers.pipelines.kandinsky.pipeline_kandinsky import get_new_h_w

# unit tests

# -------------------------
# Basic Test Cases
# -------------------------

def test_basic_exact_multiple():
    # Both h and w are exact multiples of scale_factor**2 (64)
    codeflash_output = get_new_h_w(128, 256, scale_factor=8)
    codeflash_output = get_new_h_w(64, 64, scale_factor=8)
    codeflash_output = get_new_h_w(256, 512, scale_factor=16)

def test_basic_non_multiple():
    # h and w are not multiples of scale_factor**2 (64)
    # h=130, w=130, scale_factor=8
    # new_h = 130//64 = 2, 130%64=2, so +1 = 3, 3*8=24
    # new_w = 3*8=24
    codeflash_output = get_new_h_w(130, 130, scale_factor=8)
    # h=65, w=127, scale_factor=8
    # 65//64=1, 65%64=1, +1=2, 2*8=16
    # 127//64=1, 127%64=63, +1=2, 2*8=16
    codeflash_output = get_new_h_w(65, 127, scale_factor=8)

def test_basic_mixed_multiple_and_non_multiple():
    # h is multiple, w is not
    codeflash_output = get_new_h_w(128, 130, scale_factor=8)
    # h is not multiple, w is
    codeflash_output = get_new_h_w(130, 128, scale_factor=8)

def test_basic_scale_factor_1():
    # scale_factor=1, should return h and w unchanged
    codeflash_output = get_new_h_w(10, 20, scale_factor=1)
    codeflash_output = get_new_h_w(0, 0, scale_factor=1)
    codeflash_output = get_new_h_w(100, 100, scale_factor=1)

def test_basic_scale_factor_2():
    # scale_factor=2, scale_factor**2=4
    # h=5, w=9
    # 5//4=1, 5%4=1, +1=2, 2*2=4
    # 9//4=2, 9%4=1, +1=3, 3*2=6
    codeflash_output = get_new_h_w(5, 9, scale_factor=2)
    # h=8, w=8, both exact multiples
    codeflash_output = get_new_h_w(8, 8, scale_factor=2)

# -------------------------
# Edge Test Cases
# -------------------------

def test_edge_zero_dimensions():
    # h or w is zero
    codeflash_output = get_new_h_w(0, 0, scale_factor=8)
    codeflash_output = get_new_h_w(0, 10, scale_factor=8)
    codeflash_output = get_new_h_w(10, 0, scale_factor=8)

def test_edge_one_dimension():
    # h or w is one
    # scale_factor=8, scale_factor**2=64
    # 1//64=0, 1%64=1, +1=1, 1*8=8
    codeflash_output = get_new_h_w(1, 1, scale_factor=8)
    codeflash_output = get_new_h_w(64, 1, scale_factor=8)
    codeflash_output = get_new_h_w(1, 64, scale_factor=8)

def test_edge_smallest_nonzero():
    # h and w just below scale_factor**2
    # scale_factor=8, scale_factor**2=64
    codeflash_output = get_new_h_w(63, 63, scale_factor=8)
    # h and w just above scale_factor**2
    codeflash_output = get_new_h_w(65, 65, scale_factor=8)

def test_edge_large_scale_factor():
    # Large scale_factor, small h/w
    codeflash_output = get_new_h_w(10, 10, scale_factor=100)
    # Large scale_factor, h/w exactly scale_factor**2
    codeflash_output = get_new_h_w(10000, 10000, scale_factor=100)
    # Large scale_factor, h/w just above scale_factor**2
    codeflash_output = get_new_h_w(10001, 10001, scale_factor=100)

def test_edge_negative_inputs():
    # Negative h or w should still be handled, but the logic will produce negative or zero outputs
    # -1//64 = -1, -1%64=63, so +1=0, 0*8=0
    codeflash_output = get_new_h_w(-1, -1, scale_factor=8)
    # Negative h, positive w
    codeflash_output = get_new_h_w(-1, 64, scale_factor=8)
    # Positive h, negative w
    codeflash_output = get_new_h_w(64, -1, scale_factor=8)

def test_edge_scale_factor_zero():
    # scale_factor=0 should raise ZeroDivisionError
    with pytest.raises(ZeroDivisionError):
        get_new_h_w(10, 10, scale_factor=0)

def test_edge_scale_factor_negative():
    # Negative scale_factor, test that the function does not crash and returns correct result
    # scale_factor=-2, scale_factor**2=4
    # 5//4=1, 5%4=1, +1=2, 2*-2=-4
    # 9//4=2, 9%4=1, +1=3, 3*-2=-6
    codeflash_output = get_new_h_w(5, 9, scale_factor=-2)


def test_large_basic():
    # Large h and w, exact multiples
    # scale_factor=8, scale_factor**2=64
    # h=64000, w=64000, 64000//64=1000, 1000*8=8000
    codeflash_output = get_new_h_w(64000, 64000, scale_factor=8)

def test_large_non_multiple():
    # Large h and w, not exact multiples
    # h=64001, w=63999
    # 64001//64=1000, 64001%64=1, +1=1001, 1001*8=8008
    # 63999//64=999, 63999%64=63, +1=1000, 1000*8=8000
    codeflash_output = get_new_h_w(64001, 63999, scale_factor=8)

def test_large_scale_factor():
    # Large scale_factor, large h/w
    # scale_factor=32, scale_factor**2=1024
    # h=102400, w=204800
    # 102400//1024=100, 102400%1024=0, so 100*32=3200
    # 204800//1024=200, 204800%1024=0, so 200*32=6400
    codeflash_output = get_new_h_w(102400, 204800, scale_factor=32)

def test_large_mixed():
    # Large h, small w
    codeflash_output = get_new_h_w(999999, 17, scale_factor=8)
    # Small h, large w
    codeflash_output = get_new_h_w(17, 999999, scale_factor=8)

def test_large_varied_scale_factors():
    # Varying scale_factor with large h/w
    for sf in [2, 4, 8, 16]:
        h = 800 * sf * sf + 13
        w = 600 * sf * sf + 7
        # new_h = (h//sf**2 + (1 if h%sf**2!=0 else 0)) * sf
        expected_h = ((h // (sf**2)) + (1 if h % (sf**2) != 0 else 0)) * sf
        expected_w = ((w // (sf**2)) + (1 if w % (sf**2) != 0 else 0)) * sf
        codeflash_output = get_new_h_w(h, w, scale_factor=sf)

# -------------------------
# Mutation-resistance / regression
# -------------------------

def test_mutation_resistance_off_by_one():
    # If the "+1" logic is omitted, the result will be wrong
    # h=65, w=65, scale_factor=8, scale_factor**2=64
    # 65//64=1, 65%64=1, so +1=2, 2*8=16
    # If +1 is omitted, it would be 1*8=8 (wrong)
    codeflash_output = get_new_h_w(65, 65, scale_factor=8)

def test_mutation_resistance_wrong_power():
    # If scale_factor is not squared, result will be wrong
    # h=256, w=256, scale_factor=8, scale_factor**2=64
    # 256//64=4, 256%64=0, 4*8=32
    codeflash_output = get_new_h_w(256, 256, scale_factor=8)
    # If scale_factor is not squared, it would be 256//8=32, 32*8=256 (wrong)

def test_mutation_resistance_wrong_return():
    # If the function returned (new_h, new_w) instead of (new_h*scale_factor, new_w*scale_factor)
    # The results would be off
    # h=130, w=130, scale_factor=8, scale_factor**2=64
    # 130//64=2, 130%64=2, +1=3, 3*8=24
    codeflash_output = get_new_h_w(130, 130, scale_factor=8)
    # If (3,3) is returned, it's wrong
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
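Since the listing above captures results in `codeflash_output` rather than asserting them, one basic case rendered with an explicit assertion would look like this (using a local sketch of the function's documented behavior rather than the real diffusers import):

```python
def get_new_h_w(h, w, scale_factor=8):
    # Local sketch of the documented behavior: round each dimension up to a
    # multiple of scale_factor**2, then scale down by scale_factor.
    sf2 = scale_factor**2
    return -(-h // sf2) * scale_factor, -(-w // sf2) * scale_factor

def test_basic_non_multiple_asserted():
    # 130 // 64 = 2 with remainder 2, so round up to 3; 3 * 8 = 24.
    assert get_new_h_w(130, 130, scale_factor=8) == (24, 24)
```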

import pytest  # used for our unit tests
from src.diffusers.pipelines.kandinsky.pipeline_kandinsky import get_new_h_w

# unit tests

# -------------------
# Basic Test Cases
# -------------------

def test_exact_divisible():
    # Both h and w are exactly divisible by scale_factor**2
    h, w, scale = 64, 64, 8
    # scale_factor**2 = 64, so new_h = 1, new_w = 1, output = (8, 8)
    codeflash_output = get_new_h_w(h, w, scale)

def test_not_divisible():
    # h and w not exactly divisible by scale_factor**2
    h, w, scale = 65, 66, 8
    # h//64=1, h%64=1, so new_h=2, output_h=16
    # w//64=1, w%64=2, so new_w=2, output_w=16
    codeflash_output = get_new_h_w(h, w, scale)

def test_one_dimension_divisible():
    # Only h is divisible, w is not
    h, w, scale = 128, 130, 8
    # h//64=2, h%64=0, new_h=2, output_h=16
    # w//64=2, w%64=2, new_w=3, output_w=24
    codeflash_output = get_new_h_w(h, w, scale)

def test_both_zero():
    # Both h and w are zero
    h, w, scale = 0, 0, 8
    # 0//64=0, 0%64=0, so new_h=0, new_w=0, output=(0,0)
    codeflash_output = get_new_h_w(h, w, scale)

def test_scale_factor_default():
    # Use default scale_factor=8
    h, w = 100, 200
    # scale_factor**2=64
    # h//64=1, h%64=36, new_h=2, output_h=16
    # w//64=3, w%64=8, new_w=4, output_w=32
    codeflash_output = get_new_h_w(h, w)

# -------------------
# Edge Test Cases
# -------------------

def test_minimum_positive():
    # Minimum positive h and w, scale_factor=1
    h, w, scale = 1, 1, 1
    # scale_factor**2=1
    # h//1=1, h%1=0, new_h=1, output_h=1
    # w//1=1, w%1=0, new_w=1, output_w=1
    codeflash_output = get_new_h_w(h, w, scale)

def test_zero_one_dimension():
    # h=0, w>0
    h, w, scale = 0, 10, 2
    # scale_factor**2=4
    # h//4=0, h%4=0, new_h=0, output_h=0
    # w//4=2, w%4=2, new_w=3, output_w=6
    codeflash_output = get_new_h_w(h, w, scale)

def test_large_scale_factor():
    # scale_factor much larger than h and w
    h, w, scale = 10, 10, 100
    # scale_factor**2=10000
    # h//10000=0, h%10000=10, new_h=1, output_h=100
    # w//10000=0, w%10000=10, new_w=1, output_w=100
    codeflash_output = get_new_h_w(h, w, scale)

def test_negative_dimensions():
    # Negative h and w should still work with integer division
    h, w, scale = -10, -20, 2
    # scale_factor**2=4
    # h//4=-3, h%4=2, so new_h=-2, output_h=-4
    # w//4=-5, w%4=0, so new_w=-5, output_w=-10
    codeflash_output = get_new_h_w(h, w, scale)

def test_scale_factor_one():
    # scale_factor=1, should return (h, w)
    h, w, scale = 123, 456, 1
    # scale_factor**2=1
    # h//1=123, h%1=0, new_h=123, output_h=123
    # w//1=456, w%1=0, new_w=456, output_w=456
    codeflash_output = get_new_h_w(h, w, scale)

def test_large_prime_numbers():
    # Large prime numbers for h, w, scale_factor=7
    h, w, scale = 101, 103, 7
    # scale_factor**2=49
    # h//49=2, h%49=3, new_h=3, output_h=21
    # w//49=2, w%49=5, new_w=3, output_w=21
    codeflash_output = get_new_h_w(h, w, scale)

def test_scale_factor_zero():
    # scale_factor=0 should raise ZeroDivisionError
    with pytest.raises(ZeroDivisionError):
        get_new_h_w(10, 10, 0)


def test_large_inputs_divisible():
    # Large h and w, both divisible by scale_factor**2
    h, w, scale = 1024, 2048, 16
    # scale_factor**2=256
    # h//256=4, h%256=0, new_h=4, output_h=64
    # w//256=8, w%256=0, new_w=8, output_w=128
    codeflash_output = get_new_h_w(h, w, scale)

def test_large_inputs_not_divisible():
    # Large h and w, not divisible by scale_factor**2
    h, w, scale = 999, 888, 10
    # scale_factor**2=100
    # h//100=9, h%100=99, new_h=10, output_h=100
    # w//100=8, w%100=88, new_w=9, output_w=90
    codeflash_output = get_new_h_w(h, w, scale)

def test_maximum_allowed_size():
    # Test with maximum allowed size under 1000 elements
    h, w, scale = 999, 999, 3
    # scale_factor**2=9
    # h//9=111, h%9=0, new_h=111, output_h=333
    # w//9=111, w%9=0, new_w=111, output_w=333
    codeflash_output = get_new_h_w(h, w, scale)

def test_large_scale_factor_and_large_inputs():
    # Large scale_factor and large h, w
    h, w, scale = 900, 800, 30
    # scale_factor**2=900
    # h//900=1, h%900=0, new_h=1, output_h=30
    # w//900=0, w%900=800, new_w=1, output_w=30
    codeflash_output = get_new_h_w(h, w, scale)

def test_large_inputs_one_dimension():
    # Large h, small w
    h, w, scale = 999, 2, 10
    # scale_factor**2=100
    # h//100=9, h%100=99, new_h=10, output_h=100
    # w//100=0, w%100=2, new_w=1, output_w=10
    codeflash_output = get_new_h_w(h, w, scale)

# -------------------
# Determinism Test
# -------------------

def test_determinism():
    # Call the function multiple times with the same input and check for same output
    h, w, scale = 123, 456, 7
    codeflash_output = get_new_h_w(h, w, scale); result1 = codeflash_output
    codeflash_output = get_new_h_w(h, w, scale); result2 = codeflash_output
    assert result1 == result2
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
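The negative-input expectations in the tests above follow from Python's floor-division and modulo semantics, which round toward negative infinity:

```python
# Floor division rounds toward -infinity; the remainder takes the sign
# of the divisor, so -1 % 64 is 63 (nonzero, hence the +1 round-up).
assert -1 // 64 == -1
assert -1 % 64 == 63
assert (-1 // 64) + 1 == 0      # matches get_new_h_w(-1, -1, 8) == (0, 0)
assert -10 // 4 == -3 and -10 % 4 == 2
```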

To edit these changes, run `git checkout codeflash/optimize-get_new_h_w-mbdffli9` and push.

Codeflash

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jun 1, 2025
@codeflash-ai codeflash-ai bot requested a review from aseembits93 June 1, 2025 08:56