Improve timeout handling: skip timed-out algorithms and cap big benchmarks at 15000 #718

ChrisRackauckas-Claude · 2025-08-11T17:19:32Z

Summary

This PR improves the efficiency of the autotuning process by:

Automatically skipping, at all subsequent (larger) matrix sizes, any algorithm that has already timed out
Capping the :big benchmark category at 15000×15000 matrices instead of 20000×20000

Changes

1. Smart Algorithm Exclusion

When an algorithm times out on a matrix size, it's automatically excluded from all larger sizes
Tracked per element type (Float64, ComplexF64, etc.)
Prevents wasting time repeatedly testing slow algorithms
Records skipped tests with message "Skipped: timed out on smaller matrix"
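The exclusion rule above can be sketched roughly as follows. This is a minimal illustration rather than the PR's actual code: the function `sweep`, the `run_benchmark` callback, the result field names, and the "Timed out" message are all hypothetical, and the sketch assumes the callback signals a timeout by returning `NaN`. Only the skip message is taken from the description above.

```julia
# Sketch only: `sweep`, `run_benchmark`, and the result fields are hypothetical
# names used for illustration, not LinearSolve.jl internals.
function sweep(run_benchmark, algs, eltypes, sizes; maxtime = 100.0)
    timed_out = Dict{DataType, Set{String}}()  # exclusions tracked per element type
    results = NamedTuple[]
    for T in eltypes
        excluded = get!(timed_out, T, Set{String}())
        for n in sort(collect(sizes))          # visit sizes in ascending order
            for alg in algs
                if alg in excluded
                    # Already timed out on a smaller size: record and move on.
                    push!(results, (elty = T, n = n, alg = alg, value = NaN,
                        error = "Skipped: timed out on smaller matrix"))
                    continue
                end
                # Assumption: the callback returns NaN when the run times out.
                value = run_benchmark(alg, T, n, maxtime)
                if isnan(value)
                    push!(excluded, alg)
                    push!(results, (elty = T, n = n, alg = alg, value = NaN,
                        error = "Timed out"))
                else
                    push!(results, (elty = T, n = n, alg = alg, value = value,
                        error = ""))
                end
            end
        end
    end
    return results
end
```

Because the exclusion set is keyed by element type, a timeout for `Float64` does not by itself exclude the same algorithm for `ComplexF64`.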

2. Reduced Maximum Matrix Size

Changed :big category from vcat(1000:2000:10000, 10000:5000:20000) to vcat(1000:2000:10000, 10000:5000:15000)
20000×20000 matrices (3.2GB for Float64) often cause issues even on powerful computers
15000×15000 (1.8GB for Float64) is more reasonable while still testing large-scale performance
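As a quick check of the numbers above, the snippet below expands both size ranges and computes the storage for a single dense matrix. The variable names and the `dense_bytes` helper are illustrative only, and the figures are in decimal gigabytes (1 GB = 10^9 bytes).

```julia
# Size lists before and after the change (names are illustrative).
old_big = vcat(1000:2000:10000, 10000:5000:20000)
new_big = vcat(1000:2000:10000, 10000:5000:15000)

# Bytes needed to store one dense n×n matrix with element type T.
dense_bytes(T, n) = sizeof(T) * n^2

for n in (15000, 20000)
    println(n, "×", n, " Float64: ", dense_bytes(Float64, n) / 1e9, " GB")
end
```

Note that `1000:2000:10000` stops at 9000, so 10000 appears only once in each list. The figures cover one matrix only, so a benchmark that also holds copies or factorizations of the matrix needs more memory than this.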

3. Improved Reporting

Reports number of algorithms that timed out
Reports number of algorithms skipped due to previous timeouts
Helps users understand what happened during benchmarking

Benefits

Faster benchmarking: No time wasted on algorithms that are already too slow
More stable: Avoids memory issues with 20000×20000 matrices
Better UX: Clear reporting of what was skipped and why

Example Output

[ Info: 2 tests timed out (exceeded 100.0s limit)
[ Info: 2 algorithms skipped for larger matrices after timing out
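One way such a summary could be produced is sketched below. It assumes each benchmark result carries an error string, that timeouts are recorded with a message starting with "Timed out" (an assumption), and that skips use the "Skipped: ..." message quoted earlier. The function name `report_timeouts` is hypothetical.

```julia
# Sketch only: counts timeouts and skips from per-test error strings.
function report_timeouts(errors::AbstractVector{<:AbstractString}; maxtime = 100.0)
    n_timed_out = count(e -> startswith(e, "Timed out"), errors)  # assumed message
    n_skipped = count(e -> startswith(e, "Skipped"), errors)
    n_timed_out > 0 && @info "$n_timed_out tests timed out (exceeded $(maxtime)s limit)"
    n_skipped > 0 && @info "$n_skipped algorithms skipped for larger matrices after timing out"
    return (timed_out = n_timed_out, skipped = n_skipped)
end
```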

Testing

The improved logic ensures that:

Algorithms time out once and are then skipped
Progress bar still updates correctly for skipped tests
Results are properly recorded as NaN with appropriate error messages

This builds on #716 and #717 to make the autotuning system more robust and efficient.

- Replace manual polling loop with Julia's built-in timedwait() function
- Avoid using Base.throwto() which can cause hangs with sleeping tasks
- Let timed-out tasks continue in background rather than trying to kill them
- Use Channels for clean communication between tasks
- Close channels properly to prevent resource leaks

This fixes the issue where autotune would hang indefinitely when trying to interrupt tasks that exceeded the timeout limit.
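The pattern this commit message describes might look roughly like the sketch below. It is not the PR's implementation: `run_with_timeout` and its return convention (`nothing` on timeout) are hypothetical. It relies only on `timedwait`, `Channel`, and `Threads.@spawn` from Base.

```julia
# Sketch only: run `f` in a background task and wait at most `maxtime` seconds.
function run_with_timeout(f, maxtime::Real)
    result = Channel{Any}(1)  # buffered, so the worker's put! does not block
    Threads.@spawn begin
        outcome = try
            (ok = true, value = f())
        catch err
            (ok = false, value = err)
        end
        try
            put!(result, outcome)
        catch
            # The caller gave up and closed the channel: discard the result.
        end
    end
    # Poll until a result is ready or the time limit passes.
    if timedwait(() -> isready(result), maxtime) === :ok
        outcome = take!(result)
        close(result)
        outcome.ok || throw(outcome.value)
        return outcome.value
    end
    close(result)   # release the channel; the worker is left to finish on its own
    return nothing  # hypothetical convention for "timed out"
end
```

For example, `run_with_timeout(() -> sum(1:10), 5.0)` returns the sum, while a call whose function sleeps past the limit returns `nothing`. One caveat: the timeout is only noticed if the waiting task gets to run, so a worker that never yields and shares the only thread can still delay the return.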

…marks
- Algorithms that time out are automatically excluded from larger matrix sizes
- This prevents wasting time on algorithms that are already too slow
- Cap :big benchmark sizes at 15000 instead of 20000 for stability
- 20000x20000 matrices often cause issues even on powerful computers
- Add tracking of timed-out algorithms per element type
- Report number of skipped algorithms in summary output
- Update documentation to reflect these improvements

This makes the autotuning process much more efficient by not repeatedly testing algorithms that have already proven to be too slow.

ChrisRackauckas added 2 commits August 11, 2025 12:02

ChrisRackauckas merged commit 4b2d3d8 into SciML:main Aug 11, 2025
114 of 118 checks passed

2 participants

Uh oh!

Improve timeout handling: skip timed-out algorithms and cap big benchmarks at 15000 #718

Improve timeout handling: skip timed-out algorithms and cap big benchmarks at 15000 #718

Uh oh!

Conversation

ChrisRackauckas-Claude commented Aug 11, 2025

Summary

Changes

1. Smart Algorithm Exclusion

2. Reduced Maximum Matrix Size

3. Improved Reporting

Benefits

Example Output

Testing

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants