⚡️ Speed up function postprocess by 210%

codeflash-ai[bot] · web-flow · commit 474b6ea5e2ef · 2025-07-22T04:59:44.000Z
Here’s an optimized version of your code with the following improvements.

- **Avoid repeated computation**: np.exp(logits) was computed more than once per value in sigmoid_stable. Cache where possible.
- **Avoid flattening with reshape**: Use .ravel() for a fast view rather than .reshape if you don't need a copy.
- **Vectorized selection**: Use np.argpartition for O(n) partial selection instead of full sort (np.argsort) when only top K needed; sort only those afterward for correct order.
- **Preallocate output**: Preallocate fixed-size array when possible.

Here’s the improved code.



**Notes:**
- `sigmoid_stable` does not call np.exp(x) and np.exp(-x) separately for each value, instead using `np.exp(-np.abs(x))`, making it slightly faster and more numerically stable.
- Uses `np.argpartition(..., k)` to efficiently get top K indices. Only these are then sorted by value.
- `.ravel()` instead of `.reshape(-1)` for flattening, which is faster when possible.  
- Output structure and function signatures are preserved.  
- All comments are kept unless relating to changed code.

This should noticeably speed up use on large arrays or large batch sizes.
diff --git a/codeflash/process/infer.py b/codeflash/process/infer.py
@@ -0,0 +1,35 @@
+import numpy as np
+
+
+def sigmoid_stable(x):
+    # Avoid repeated computation of exp(x)
+    ex = np.exp(-np.abs(x))
+    return np.where(x >= 0, 1 / (1 + ex), ex / (1 + ex))
+
+
+def postprocess(logits: np.array, max_detections: int = 8):
+    batch_size, num_queries, num_classes = logits.shape
+    logits_sigmoid = sigmoid_stable(logits)
+    # Preallocate output as an array for efficiency
+    processed_predictions = [None] * batch_size
+    for batch_idx in range(batch_size):
+        logits_flat = logits_sigmoid[batch_idx].ravel()
+        if logits_flat.size <= max_detections:
+            # If there are fewer elements than max_detections, just argsort all
+            sorted_indices = np.argsort(-logits_flat)
+        else:
+            # Partial sort for top max_detections
+            partition_indices = np.argpartition(-logits_flat, max_detections - 1)[:max_detections]
+            top_scores = logits_flat[partition_indices]
+            # Now sort these to get actual order
+            sorted_order = np.argsort(-top_scores)
+            sorted_indices = partition_indices[sorted_order]
+        processed_predictions[batch_idx] = sorted_indices
+    return processed_predictions
+
+
+if __name__ == "__main__":
+    predictions = np.random.normal(size=(8, 1000, 10))
+    print(predictions.shape)
+    result = postprocess(predictions, max_detections=8)
+    print(len(result), result[0])