
Conversation

qued (Contributor) commented Oct 7, 2025

From codeflash: Optimizes the table model pipeline by keeping the data as a tensor for longer, enabling vector operations and reducing conversions.

codeflash-ai bot and others added 2 commits August 27, 2025 01:54
The optimized code achieves an 87% speedup through several key optimizations:

**1. Eliminated redundant list conversions and element-wise operations**
- **Original**: `list(m.indices.detach().cpu().numpy())[0]` creates an intermediate list
- **Optimized**: Direct numpy array access `m.indices.detach().cpu().numpy()[0]`
- **Original**: List comprehension `[elem.tolist() for elem in rescale_bboxes(...)]` calls `.tolist()` on each bbox individually
- **Optimized**: Single `.tolist()` call after all tensor operations: `rescaled.tolist()`

**2. Vectorized padding adjustment**
- **Original**: Per-element subtraction `[float(elem) - shift_size for elem in bbox]` in Python loop
- **Optimized**: Tensor-wide subtraction `rescaled = rescaled - pad` before conversion to list
- This leverages PyTorch's optimized C++ backend instead of Python loops

**3. Reduced function call overhead**
- **Original**: `objects.append()` performs attribute lookup on each iteration
- **Optimized**: `append = objects.append` caches the method reference, eliminating repeated lookups
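
The cached-append pattern referenced here, shown in isolation as a minimal sketch (the `objects`/`detections` names are illustrative, not taken from the PR; in modern CPython the gain is usually marginal):

```python
# Cache the bound method once, outside the loop, so the attribute lookup
# objects.append is not repeated on every iteration.
objects = []
append = objects.append

detections = [{"label": i, "score": 0.9} for i in range(1000)]  # stand-in data
for det in detections:
    append(det)
```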

**4. GPU tensor optimization**  
- Added `device=out_bbox.device` parameter to `torch.tensor()` creation to avoid potential device transfer overhead

**Test case performance patterns:**
- **Small cases (single objects)**: 5-7% improvement from reduced overhead
- **Large cases (500-1000 objects)**: 160-200% improvement due to vectorized operations scaling much better than element-wise Python loops
- **Mixed workloads**: Consistent improvements across all scenarios, with larger gains when more objects need processing

The optimization is particularly effective for table detection models that typically process many bounding boxes simultaneously.

qued (Contributor, Author) commented Oct 7, 2025

@claude critique this PR please.


claude bot commented Oct 8, 2025

Claude finished @qued's task.


PR Review Progress ✅

Reviewing the optimization changes in this PR:

  • Read repository guidelines (CLAUDE.md)
  • Analyze the core optimization in tables.py
  • Review version bump and changelog updates
  • Check for potential issues or improvements
  • Provide comprehensive review feedback

Review Feedback

This PR contains a solid optimization to the outputs_to_objects function in unstructured_inference/models/tables.py. The changes focus on keeping tensor operations on GPU/device longer before converting to CPU, which should reduce unnecessary data transfers and conversions.

✅ Positive Changes

1. Reduced unnecessary conversions (lines 216-217)

```python
# Before: Converting to list early
pred_labels = list(m.indices.detach().cpu().numpy())[0]
pred_scores = list(m.values.detach().cpu().numpy())[0]

# After: Keep as numpy arrays
pred_labels = m.indices.detach().cpu().numpy()[0]
pred_scores = m.values.detach().cpu().numpy()[0]
```

This eliminates redundant list conversions and keeps data in more efficient numpy format.
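
To make the equivalence concrete, here is a small sketch (hypothetical array shape, not the repository's data) showing that both forms yield the same first row while the new one skips the intermediate Python list:

```python
import numpy as np

m_indices = np.arange(12).reshape(1, 12)  # stand-in for a (1, N) prediction array

old = list(m_indices)[0]  # builds a Python list of rows, then takes the first
new = m_indices[0]        # indexes the array directly, no intermediate list

assert np.array_equal(old, new)
```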

2. Tensor vectorization for bounding box processing (lines 222-228)

```python
# Before: List comprehension with individual tensor operations
pred_bboxes = [elem.tolist() for elem in rescale_bboxes(pred_bboxes, scale_size)]

# After: Vectorized tensor operations
rescaled = rescale_bboxes(pred_bboxes, scale_size)
if pad != 0:
    rescaled = rescaled - pad
pred_bboxes = rescaled.tolist()
```

This replaces a Python loop with vectorized tensor operations, which is more efficient.
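
A rough micro-benchmark along these lines (hypothetical box count and pad value, not taken from the PR) is one way to sanity-check that claim; absolute numbers will vary by machine:

```python
import timeit

import torch

boxes = torch.rand(1000, 4) * 500  # stand-in for rescaled bounding boxes
pad = 5.0

def per_element():
    # Roughly the old style: per-value Python-level subtraction
    return [[float(v) - pad for v in box] for box in boxes.tolist()]

def vectorized():
    # The new style: one tensor op, then a single .tolist()
    return (boxes - pad).tolist()

print("per-element:", timeit.timeit(per_element, number=100))
print("vectorized: ", timeit.timeit(vectorized, number=100))
```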

3. Device-aware tensor creation (line 284)

```python
# Before: Creates tensor on default device
b = b * torch.tensor([img_w, img_h, img_w, img_h], dtype=torch.float32)

# After: Creates tensor on same device as input
b = b * torch.tensor([img_w, img_h, img_w, img_h], dtype=torch.float32, device=out_bbox.device)
```

This avoids potential device mismatches and cross-device operations.
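
As a standalone illustration of why the device argument matters (hypothetical shapes and sizes, not the repository's code), building the scale factor on the boxes' device keeps the multiply on a single device with no implicit host-to-device copy:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
out_bbox = torch.rand(100, 4, device=device)  # stand-in for model output boxes
img_w, img_h = 640, 480

# Creating the scale tensor on out_bbox.device avoids a hidden transfer
# (or a device-mismatch error) when the boxes live on the GPU.
scale = torch.tensor([img_w, img_h, img_w, img_h], dtype=torch.float32,
                     device=out_bbox.device)
rescaled = out_bbox * scale
```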

4. Simplified bbox adjustment logic (line 237)
The padding adjustment is now handled in the vectorized rescaling step rather than during individual bbox processing.

🔍 Minor Observations

1. Removed micro-optimization
The removal of `append = objects.append` (line 230) is fine: this micro-optimization provides negligible benefit in modern Python and makes the code less readable.

2. Improved readability
The padding logic is now more explicit and easier to follow with the conditional check `if pad != 0:`.

✅ Quality Assurance

  • Correctness: All mathematical operations remain equivalent
  • Performance: Should see meaningful speedup, especially with GPU tensors
  • Memory: Reduced intermediate allocations and copies
  • Compatibility: Changes are internal implementation details that don't affect API
  • Error handling: Maintains existing error handling patterns

📋 Version Management

The changelog and version updates are appropriately minimal:

  • Version bumped to 1.0.8-dev0
  • Changelog entry correctly attributes the optimization

✅ Overall Assessment

This is a well-executed performance optimization that follows PyTorch best practices. The changes are focused, maintain correctness, and should provide measurable performance improvements, especially when processing on GPU. The optimization strategy of keeping data in tensor format longer is sound and aligns with the stated goal of "enabling vector operations and reducing conversions."

Recommendation: ✅ Approve


@qued qued requested a review from ajjimeno October 8, 2025 14:05
@qued qued enabled auto-merge (squash) October 8, 2025 14:06
@qued qued merged commit 0c9a78b into main Oct 8, 2025
13 checks passed
@qued qued deleted the codeflash/optimize-outputs_to_objects-metbo2xp branch October 8, 2025 18:02