⚡️ Speed up function outputs_to_objects by 88%
#440
Closed
📄 88% (0.88x) speedup for `outputs_to_objects` in `unstructured_inference/models/tables.py`
⏱️ Runtime: 19.7 milliseconds → 10.5 milliseconds (best of 31 runs)
📝 Explanation and details
The optimized code achieves an 88% speedup through several key optimizations:

1. Eliminated redundant list conversions and element-wise operations
- Original: `list(m.indices.detach().cpu().numpy())[0]` creates an intermediate list.
- Optimized: direct indexing with `m.indices.detach().cpu().numpy()[0]`.
- Original: `[elem.tolist() for elem in rescale_bboxes(...)]` calls `.tolist()` on each bbox individually.
- Optimized: a single `.tolist()` call after all tensor operations: `rescaled.tolist()`.

2. Vectorized padding adjustment
- Original: `[float(elem) - shift_size for elem in bbox]` subtracts in a Python loop.
- Optimized: `rescaled = rescaled - pad` subtracts on the tensor before conversion to a list.

3. Reduced function call overhead
- Original: `objects.append()` performs an attribute lookup on each iteration.
- Optimized: `append = objects.append` caches the method reference, eliminating repeated lookups.

4. GPU tensor optimization
- Passes the `device=out_bbox.device` parameter to the `torch.tensor()` creation to avoid potential device transfer overhead.

Test case performance patterns:
The optimization is particularly effective for table detection models, which typically process many bounding boxes at once.
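The conversion and padding changes in items 1 and 2 can be sketched in isolation. This is a minimal illustration, not the code from the PR: the function names `per_element` and `vectorized` and the sample boxes are hypothetical, and `pad` stands in for the shift applied to each coordinate. Note that the loop version subtracts in Python floats while the vectorized version subtracts in the tensor's dtype, so results can differ by rounding for values that are not exactly representable.

```python
import torch


def per_element(boxes: torch.Tensor, pad: float) -> list:
    """Loop-based pattern: one .tolist() call per box, then a Python-level subtraction."""
    as_lists = [box.tolist() for box in boxes]
    return [[float(v) - pad for v in box] for box in as_lists]


def vectorized(boxes: torch.Tensor, pad: float) -> list:
    """Vectorized pattern: subtract once on the whole tensor, then convert once."""
    return (boxes - pad).tolist()


if __name__ == "__main__":
    # Hypothetical (x_min, y_min, x_max, y_max) boxes, chosen to be exact in float32.
    sample = torch.tensor([[10.0, 20.0, 110.0, 220.0],
                           [30.5, 40.5, 130.5, 240.5]])
    print(per_element(sample, 2.0))
    print(vectorized(sample, 2.0))
```

Both functions return a nested list with one inner list per box; the vectorized form does its arithmetic in a single tensor operation and crosses the tensor-to-Python boundary once, which is where the savings would be expected to grow with the number of boxes.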
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
models/test_tables.py::test_padded_results_has_right_dimensions
🌀 Generated Regression Tests and Runtime
⏪ Replay Tests and Runtime
test_pytest_test_unstructured_inference__replay_test_0.py::test_unstructured_inference_models_tables_outputs_to_objects
To edit these changes, `git checkout codeflash/optimize-outputs_to_objects-metbo2xp` and push.