⚡️ Speed up function `postprocess` by 210% #567

codeflash-ai · 2025-07-22T04:59:47Z

📄 210% (2.10x) speedup for `postprocess` in `codeflash/process/infer.py`

⏱️ Runtime : 6.64 milliseconds → 2.14 milliseconds (best of 563 runs)
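The headline percentage and the two runtimes can be reconciled with a little arithmetic. The sketch below assumes the speedup is computed as (original - optimized) / optimized; that formula is an assumption that happens to match the reported numbers, not something stated in this PR.

```python
# Runtimes reported above, in milliseconds.
original_ms = 6.64
optimized_ms = 2.14

# Ratio of runtimes: how many times faster the optimized version ran.
ratio = original_ms / optimized_ms

# Assumed definition of "speedup": time saved relative to the optimized runtime.
speedup = (original_ms - optimized_ms) / optimized_ms

print(f"runtime ratio: {ratio:.2f}x")  # 3.10x
print(f"speedup: {speedup:.0%}")       # 210%
```

Under that assumption, the "2.10x" in the title is the relative gain, while the optimized function runs in roughly a third of the original time.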

📝 Explanation and details

Here’s an optimized version of your code with the following improvements.

- **Avoid repeated computation**: `np.exp(logits)` was computed more than once per value in `sigmoid_stable`. The result is now cached where possible.
- **Flatten with `.ravel()`**: Use `.ravel()` instead of `.reshape(-1)` when no copy is needed. Both return a view of a contiguous array, so any gain here is small.
- **Vectorized selection**: Use `np.argpartition` for O(n) partial selection instead of a full sort (`np.argsort`) when only the top K entries are needed; sort only those K afterward to get the correct order.
- **Preallocate output**: Preallocate a fixed-size array when possible.
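The partial-selection idea in the list above can be sketched as follows. This is a minimal illustration and not the code from this PR; `top_k_indices` is a hypothetical helper name, and it works on a single flattened score array rather than a batch.

```python
import numpy as np


def top_k_indices(scores: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k largest entries, highest score first.

    Hypothetical helper for illustration only.
    """
    flat = scores.ravel()
    k = min(k, flat.size)  # never ask for more entries than exist
    if k == 0:
        # Guard: slicing with [-0:] would return the whole array.
        return np.empty(0, dtype=np.intp)
    # argpartition places the k largest entries in the last k slots,
    # in no particular order, without fully sorting the array.
    candidates = np.argpartition(flat, -k)[-k:]
    # Sort only those k candidates, in descending score order.
    order = np.argsort(-flat[candidates])
    return candidates[order]


scores = np.array([0.52498, 0.62245, 0.57444, 0.45017])
print(top_k_indices(scores, 2))  # [1 2]
```

When scores are tied, the order among the tied entries is not guaranteed by `np.argpartition`, so a caller that depends on a specific tie-breaking rule would need an extra step.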

Here’s the improved code.

Notes:

- `sigmoid_stable` no longer calls `np.exp(x)` and `np.exp(-x)` separately for each value. It uses `np.exp(-np.abs(x))` instead, which is slightly faster and more numerically stable because the exponent is never positive and so cannot overflow.
- Uses `np.argpartition(..., k)` to get the top K indices efficiently. Only these K indices are then sorted by value.
- Uses `.ravel()` instead of `.reshape(-1)` for flattening. Both avoid a copy when the array is contiguous.
- Output structure and function signatures are preserved.
- All comments are kept unless they relate to changed code.
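The single-exponential form of the sigmoid mentioned in the notes can be sketched like this. It is an illustrative reconstruction under the assumption that the function takes an array of logits; `sigmoid_sketch` is a hypothetical name and this is not the `sigmoid_stable` from the PR.

```python
import numpy as np


def sigmoid_sketch(x) -> np.ndarray:
    """Numerically stable sigmoid using one exp call per element.

    Hypothetical illustration, not the PR's implementation.
    """
    x = np.asarray(x, dtype=np.float64)
    # z lies in [0, 1]: the exponent is never positive, so exp cannot overflow.
    z = np.exp(-np.abs(x))
    # For x >= 0: 1 / (1 + e^-x) = 1 / (1 + z)
    # For x <  0: e^x / (1 + e^x) = z / (1 + z)
    numerator = np.where(x >= 0, 1.0, z)
    return numerator / (1.0 + z)


print(sigmoid_sketch([0.0, 1000.0, -1000.0]))  # [0.5 1.  0. ]
```

A naive `1 / (1 + np.exp(-x))` would overflow inside `np.exp` for large negative `x`; routing both branches through `np.exp(-np.abs(x))` avoids that.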

This should noticeably speed up use on large arrays or large batch sizes.
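On the flattening point, a quick check shows when flattening is free. The snippet below is only a sanity check with made-up array shapes: for a C-contiguous array both `.ravel()` and `.reshape(-1)` share memory with the input, and a copy is made only when the layout forces one.

```python
import numpy as np

logits = np.zeros((2, 3, 4))  # C-contiguous example array (shape is arbitrary)

# Both calls return views that share memory with the original array.
ravel_is_view = np.shares_memory(logits, logits.ravel())
reshape_is_view = np.shares_memory(logits, logits.reshape(-1))
print(ravel_is_view, reshape_is_view)  # True True

# A transposed array is not C-contiguous, so flattening must copy.
transposed = logits.transpose(2, 1, 0)
copy_needed = not np.shares_memory(transposed, transposed.ravel())
print(copy_needed)  # True
```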

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 36 Passed |
| ⏪ Replay Tests | ✅ 1 Passed |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

import numpy as np
# imports
import pytest  # used for our unit tests
from codeflash.process.infer import postprocess

# unit tests

# -------------------------------
# 1. Basic Test Cases
# -------------------------------

def test_basic_single_batch_single_query_single_class():
    # Shape: (1, 1, 1)
    logits = np.array([[[0.0]]])
    codeflash_output = postprocess(logits, max_detections=1); result = codeflash_output
    
def test_basic_single_batch_multiple_queries_classes():
    # Shape: (1, 2, 2)
    logits = np.array([[[0.1, 0.5], [0.3, -0.2]]])
    # Sigmoid: [ [0.52498, 0.62245], [0.57444, 0.45017] ]
    # Flattened: [0.52498, 0.62245, 0.57444, 0.45017]
    # Sorted: [1,2,0,3]
    codeflash_output = postprocess(logits, max_detections=2); result = codeflash_output

def test_basic_multiple_batches():
    # Shape: (2, 1, 2)
    logits = np.array([
        [[0.2, 0.8]],
        [[-0.5, 0.0]]
    ])
    codeflash_output = postprocess(logits, max_detections=1); result = codeflash_output

def test_basic_max_detections_greater_than_elements():
    # Shape: (1, 2, 2) = 4 elements
    logits = np.array([[[1, 2], [3, 4]]])
    # Request more than available
    codeflash_output = postprocess(logits, max_detections=10); result = codeflash_output

# -------------------------------
# 2. Edge Test Cases
# -------------------------------

def test_edge_zero_logits():
    # All zeros: sigmoid(0) = 0.5
    logits = np.zeros((1, 3, 2))
    codeflash_output = postprocess(logits, max_detections=3); result = codeflash_output

def test_edge_negative_logits():
    # All negative values
    logits = np.array([[[-1, -2], [-3, -4]]])
    # Sigmoid: [0.2689, 0.1192, 0.0474, 0.01799]
    codeflash_output = postprocess(logits, max_detections=2); result = codeflash_output

def test_edge_large_logits():
    # Test with very large positive and negative values
    logits = np.array([[[1000, -1000], [100, -100]]])
    # Sigmoid(1000) ~ 1, sigmoid(-1000) ~ 0, etc.
    codeflash_output = postprocess(logits, max_detections=2); result = codeflash_output

def test_edge_single_element():
    # Shape: (1,1,1)
    logits = np.array([[[5.0]]])
    codeflash_output = postprocess(logits, max_detections=1); result = codeflash_output

def test_edge_empty_batch():
    # batch_size=0, shape: (0, 2, 2)
    logits = np.empty((0,2,2))
    codeflash_output = postprocess(logits, max_detections=2); result = codeflash_output

def test_edge_zero_queries_or_classes():
    # num_queries=0
    logits = np.empty((1,0,2))
    codeflash_output = postprocess(logits, max_detections=2); result = codeflash_output

    # num_classes=0
    logits = np.empty((1,2,0))
    codeflash_output = postprocess(logits, max_detections=2); result = codeflash_output

def test_edge_max_detections_zero():
    # Should return empty indices for each batch
    logits = np.random.randn(2,3,4)
    codeflash_output = postprocess(logits, max_detections=0); result = codeflash_output

def test_edge_tied_logits():
    # All logits are the same, so all sigmoid values are equal
    logits = np.ones((1,2,2))
    codeflash_output = postprocess(logits, max_detections=3); result = codeflash_output

def test_edge_non_integer_logits():
    # Use float values
    logits = np.array([[[0.1, 0.2], [0.3, 0.4]]])
    codeflash_output = postprocess(logits, max_detections=2); result = codeflash_output

# -------------------------------
# 3. Large Scale Test Cases
# -------------------------------

def test_large_batch():
    # Large batch size, small queries/classes
    logits = np.random.randn(100, 2, 3)  # 100 batches
    codeflash_output = postprocess(logits, max_detections=2); result = codeflash_output
    for batch in result:
        pass

def test_large_queries_classes():
    # Single batch, large queries/classes
    logits = np.random.randn(1, 30, 30)  # 900 elements
    codeflash_output = postprocess(logits, max_detections=10); result = codeflash_output

def test_large_max_detections_equals_elements():
    # max_detections == num_elements
    logits = np.random.randn(1, 20, 20)  # 400 elements
    codeflash_output = postprocess(logits, max_detections=400); result = codeflash_output

def test_large_max_detections_exceeds_elements():
    # max_detections > num_elements
    logits = np.random.randn(1, 10, 10)  # 100 elements
    codeflash_output = postprocess(logits, max_detections=150); result = codeflash_output

def test_large_randomized_content():
    # Stress test with random values and multiple batches
    np.random.seed(42)
    logits = np.random.uniform(-10, 10, size=(10, 10, 10))  # 10 batches, 100 elements each
    codeflash_output = postprocess(logits, max_detections=5); result = codeflash_output
    for batch in result:
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import numpy as np
# imports
import pytest  # used for our unit tests
from codeflash.process.infer import postprocess

# unit tests

# 1. Basic Test Cases

def test_postprocess_single_batch_simple():
    # Single batch, 2 queries, 2 classes, simple logits
    logits = np.array([[[0, 1], [2, 3]]])  # shape (1,2,2)
    # sigmoid: [[0.5, ~0.731], [~0.881, ~0.953]]
    # flattened: [0.5, 0.731, 0.881, 0.953]
    # sorted: [0.953, 0.881, 0.731, 0.5] -> indices [3,2,1,0]
    codeflash_output = postprocess(logits, max_detections=2); out = codeflash_output

def test_postprocess_multiple_batches():
    # Two batches, 1 query, 3 classes
    logits = np.array([
        [[0, 1, 2]],
        [[-1, -2, -3]]
    ])
    # batch 0: [0.5, 0.731, 0.881]; batch 1: [0.269, 0.119, 0.047]
    codeflash_output = postprocess(logits, max_detections=1); out = codeflash_output

def test_postprocess_max_detections_greater_than_elements():
    # max_detections > total elements
    logits = np.array([[[1, 2], [3, 4]]])  # shape (1,2,2) -> 4 elements
    codeflash_output = postprocess(logits, max_detections=10); out = codeflash_output

def test_postprocess_max_detections_equals_elements():
    # max_detections == total elements
    logits = np.array([[[1, 2], [3, 4]]])
    codeflash_output = postprocess(logits, max_detections=4); out = codeflash_output

# 2. Edge Test Cases

def test_postprocess_all_zeros():
    # All logits are zero, so every sigmoid value is 0.5; the order among ties depends on the sort used
    logits = np.zeros((1, 2, 3))  # shape (1,2,3) -> 6 elements
    codeflash_output = postprocess(logits, max_detections=3); out = codeflash_output

def test_postprocess_negative_logits():
    # All logits negative, sigmoid < 0.5
    logits = np.array([[[-1, -2], [-3, -4]]])
    # sigmoid: [[0.269, 0.119], [0.047, 0.018]]
    # sorted: [0.269, 0.119, 0.047, 0.018] -> indices [0,1,2,3]
    codeflash_output = postprocess(logits, max_detections=2); out = codeflash_output

def test_postprocess_large_positive_logits():
    # Large positive logits, sigmoid ~1
    logits = np.full((1, 2, 2), 1000)
    codeflash_output = postprocess(logits, max_detections=2); out = codeflash_output

def test_postprocess_large_negative_logits():
    # Large negative logits, sigmoid ~0
    logits = np.full((1, 2, 2), -1000)
    codeflash_output = postprocess(logits, max_detections=2); out = codeflash_output

def test_postprocess_empty_batch():
    # Empty batch, should return empty list
    logits = np.zeros((0, 2, 2))
    codeflash_output = postprocess(logits, max_detections=2); out = codeflash_output

def test_postprocess_single_element():
    # Only one element
    logits = np.array([[[42]]])
    codeflash_output = postprocess(logits, max_detections=1); out = codeflash_output

def test_postprocess_zero_max_detections():
    # max_detections = 0, should return empty indices for each batch
    logits = np.ones((2, 2, 2))
    codeflash_output = postprocess(logits, max_detections=0); out = codeflash_output

def test_postprocess_non_integer_logits():
    # Float logits
    logits = np.array([[[0.1, 0.2], [0.3, 0.4]]])
    codeflash_output = postprocess(logits, max_detections=2); out = codeflash_output

def test_postprocess_tied_values():
    # Multiple identical max values
    logits = np.array([[[1, 1], [0, 0]]])
    # sigmoid: [0.731, 0.731, 0.5, 0.5]
    codeflash_output = postprocess(logits, max_detections=3); out = codeflash_output

# 3. Large Scale Test Cases

def test_postprocess_large_batch():
    # Large batch size, moderate queries/classes
    logits = np.random.randn(100, 3, 3)
    codeflash_output = postprocess(logits, max_detections=5); out = codeflash_output
    for indices in out:
        pass

def test_postprocess_large_queries_classes():
    # Single batch, large queries and classes
    logits = np.random.uniform(-5, 5, size=(1, 30, 30))
    codeflash_output = postprocess(logits, max_detections=10); out = codeflash_output

def test_postprocess_max_detections_equals_total_large():
    # max_detections == total elements (edge of allowed size)
    logits = np.random.randn(1, 31, 31)  # 961 elements
    codeflash_output = postprocess(logits, max_detections=961); out = codeflash_output


def test_postprocess_multiple_batches_large():
    # Multiple batches, each with moderate size
    logits = np.random.randn(10, 10, 10)
    codeflash_output = postprocess(logits, max_detections=7); out = codeflash_output
    for indices in out:
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.


⏪ Replay Tests and Runtime

Test File::Test Function	Original ⏱️	Optimized ⏱️	Speedup

To edit these changes, run `git checkout codeflash/optimize-postprocess-mde2fc44` and push.

Signed-off-by: Saurabh Misra <[email protected]>

docs add note


docs/docs/optimizing-with-codeflash/trace-and-optimize.md

misrasaurabh1 and others added 3 commits July 21, 2025 20:50

doc add

0eadf0e

Signed-off-by: Saurabh Misra <[email protected]>

Merge pull request #565 from codeflash-ai/note-about-replay-tests

102b4b6

docs add note

codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jul 22, 2025

codeflash-ai bot requested a review from misrasaurabh1 July 22, 2025 04:59

misrasaurabh1 changed the base branch from main to optimize-infer July 22, 2025 05:31

misrasaurabh1 changed the base branch from optimize-infer to main July 22, 2025 05:53

misrasaurabh1 closed this Jul 24, 2025

codeflash-ai bot deleted the codeflash/optimize-postprocess-mde2fc44 branch July 24, 2025 20:54

misrasaurabh1 restored the codeflash/optimize-postprocess-mde2fc44 branch July 29, 2025 03:05

misrasaurabh1 reopened this Jul 29, 2025

github-actions bot added the Review effort 2/5 label Jul 29, 2025

codeflash-ai deleted a comment from github-actions bot Jul 29, 2025

misrasaurabh1 changed the base branch from main to optimize-infer July 29, 2025 18:37

Merge branch 'optimize-infer' into codeflash/optimize-postprocess-mde…

2bc10e1

…2fc44

misrasaurabh1 reviewed Jul 29, 2025

docs/docs/optimizing-with-codeflash/trace-and-optimize.md (outdated, resolved)

Update docs/docs/optimizing-with-codeflash/trace-and-optimize.md

5a42033

misrasaurabh1 reviewed Jul 29, 2025

docs/docs/optimizing-with-codeflash/trace-and-optimize.md (outdated, resolved)

Update docs/docs/optimizing-with-codeflash/trace-and-optimize.md

bf078f8

misrasaurabh1 closed this Jul 29, 2025

codeflash-ai bot deleted the codeflash/optimize-postprocess-mde2fc44 branch July 29, 2025 23:17
