ChrisRackauckas-Claude

Summary

This PR improves the efficiency of the autotuning process by:

  1. Automatically skipping, at all subsequent (larger) matrix sizes, any algorithm that has already timed out
  2. Capping the :big benchmark category at 15000×15000 matrices instead of 20000×20000

Changes

1. Smart Algorithm Exclusion

  • When an algorithm times out on a matrix size, it's automatically excluded from all larger sizes
  • Tracked per element type (Float64, ComplexF64, etc.)
  • Prevents wasting time repeatedly testing slow algorithms
  • Records skipped tests with message "Skipped: timed out on smaller matrix"
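The exclusion logic above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: `run_sweep`, `fake_runtime`, the result fields, and the "Timed out" message are all hypothetical names chosen for the example. Only the "Skipped: timed out on smaller matrix" message comes from this PR. The sketch assumes sizes are visited in increasing order and that a timeout is detected by comparing a measured time against a limit.

```julia
# Hypothetical sketch: per-element-type tracking of timed-out algorithms.
const SKIP_MSG = "Skipped: timed out on smaller matrix"

function run_sweep(runtime, algs, sizes, eltypes; time_limit=100.0)
    # One set of excluded algorithms per element type (Float64, ComplexF64, ...).
    timed_out = Dict{DataType,Set{Symbol}}()
    results = NamedTuple[]
    for T in eltypes
        excluded = get!(timed_out, T, Set{Symbol}())
        # Visit sizes smallest first so a timeout excludes every larger size.
        for n in sort(collect(sizes)), alg in algs
            if alg in excluded
                push!(results, (eltype=T, n=n, alg=alg, time=NaN, error=SKIP_MSG))
                continue
            end
            t = runtime(alg, T, n)
            if t > time_limit
                push!(excluded, alg)
                # "Timed out" is a placeholder message for this sketch.
                push!(results, (eltype=T, n=n, alg=alg, time=NaN, error="Timed out"))
            else
                push!(results, (eltype=T, n=n, alg=alg, time=t, error=""))
            end
        end
    end
    return results, timed_out
end

# Stand-in cost model instead of a real benchmark (an assumption for the demo).
fake_runtime(alg, T, n) = alg === :slow ? n^3 / 1e9 : n^3 / 1e11

results, timed_out = run_sweep(fake_runtime, [:fast, :slow],
                               [1000, 5000, 9000], [Float64, ComplexF64])
```

With this stand-in cost model, `:slow` exceeds the limit at size 5000 and is then recorded as skipped at size 9000, independently for each element type.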

2. Reduced Maximum Matrix Size

  • Changed :big category from vcat(1000:2000:10000, 10000:5000:20000) to vcat(1000:2000:10000, 10000:5000:15000)
  • 20000×20000 matrices (3.2GB for Float64) often cause memory issues even on powerful computers
  • 15000×15000 (1.8GB for Float64) is more reasonable while still testing large-scale performance
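The size lists and memory figures above can be checked with a few lines of Julia. The helper `dense_gb` is a hypothetical name for this example, and it counts only the storage of one dense matrix (decimal gigabytes), not any workspace an algorithm may allocate. Note that the range `1000:2000:10000` stops at 9000, so 10000 appears only once in each list.

```julia
# Size lists for the :big category before and after this change.
old_big = vcat(1000:2000:10000, 10000:5000:20000)
new_big = vcat(1000:2000:10000, 10000:5000:15000)

# Storage for one dense n×n matrix of element type T, in GB (1e9 bytes).
# Hypothetical helper for illustration only.
dense_gb(n, T) = n^2 * sizeof(T) / 1e9

println(old_big)                      # largest entry is 20000
println(new_big)                      # largest entry is 15000
println(dense_gb(20000, Float64))     # about 3.2
println(dense_gb(15000, Float64))     # about 1.8
println(dense_gb(15000, ComplexF64))  # about 3.6, twice the Float64 figure
```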

3. Improved Reporting

  • Reports number of algorithms that timed out
  • Reports number of algorithms skipped due to previous timeouts
  • Helps users understand what happened during benchmarking

Benefits

  • Faster benchmarking: No time wasted on algorithms that are already too slow
  • More stable: Avoids memory issues with 20000×20000 matrices
  • Better UX: Clear reporting of what was skipped and why

Example Output

[ Info: 2 tests timed out (exceeded 100.0s limit)
[ Info: 2 algorithms skipped for larger matrices after timing out
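As a rough illustration of how summary lines like these could be derived from recorded results, here is a small sketch. The record layout (`alg`, `status`) and the function `summary_lines` are assumptions made for the example and are not taken from the package. It counts timed-out tests individually and counts skipped algorithms once each, however many sizes they were skipped at.

```julia
# Hypothetical sketch: build the two summary messages from result records.
function summary_lines(results; time_limit)
    n_timed_out = count(r -> r.status === :timed_out, results)
    # Each skipped algorithm is counted once, regardless of how many sizes it missed.
    skipped = unique(r.alg for r in results if r.status === :skipped)
    lines = String[]
    n_timed_out > 0 &&
        push!(lines, "$n_timed_out tests timed out (exceeded $(time_limit)s limit)")
    isempty(skipped) ||
        push!(lines, "$(length(skipped)) algorithms skipped for larger matrices after timing out")
    return lines
end

# Made-up records for demonstration only.
demo = [
    (alg=:a, status=:timed_out),
    (alg=:b, status=:timed_out),
    (alg=:a, status=:skipped),
    (alg=:a, status=:skipped),
    (alg=:b, status=:skipped),
    (alg=:c, status=:ok),
]

for line in summary_lines(demo; time_limit=100.0)
    @info line
end
```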

Testing

The improved logic ensures that:

  • Algorithms time out once and are then skipped
  • Progress bar still updates correctly for skipped tests
  • Results are properly recorded as NaN with appropriate error messages

This builds on #716 and #717 to make the autotuning system more robust and efficient.

- Replace manual polling loop with Julia's built-in timedwait() function
- Avoid using Base.throwto() which can cause hangs with sleeping tasks
- Let timed-out tasks continue in background rather than trying to kill them
- Use Channels for clean communication between tasks
- Close channels properly to prevent resource leaks

This fixes the issue where autotune would hang indefinitely when trying
to interrupt tasks that exceeded the timeout limit.
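
The approach described in this commit message can be sketched as below. This is an illustrative outline under stated assumptions, not the code from the commit: `run_with_timeout` and its return fields are hypothetical names, and the sketch uses `@async`, so the work function must yield (as `sleep` does) for the polling in `timedwait` to make progress. A compute-bound call would need to run on another thread. The timed-out task is not interrupted; it is simply left to finish in the background.

```julia
# Hypothetical sketch: wait on a Channel with timedwait instead of killing the task.
function run_with_timeout(f, limit)
    ch = Channel{Any}(1)
    @async begin
        # Capture either the result or the exception thrown by f.
        result = try
            f()
        catch err
            err
        end
        try
            put!(ch, result)
        catch
            # The channel was closed after a timeout; drop the late result.
        end
    end
    # timedwait polls the callback and returns :ok or :timed_out.
    status = timedwait(() -> isready(ch), limit; pollint=0.01)
    if status === :ok
        value = take!(ch)
        close(ch)
        return (status=:ok, value=value)
    else
        # No Base.throwto here: the task keeps running, we just stop listening.
        close(ch)
        return (status=:timed_out, value=NaN)
    end
end

quick = run_with_timeout(() -> (sleep(0.05); 42), 2.0)
slow  = run_with_timeout(() -> (sleep(1.0); 1), 0.1)
```

Here `quick` completes within its limit, while `slow` is reported as timed out with a `NaN` value, matching the way this PR records timed-out results.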
…marks

- Algorithms that time out are automatically excluded from larger matrix sizes
- This prevents wasting time on algorithms that are already too slow
- Cap :big benchmark sizes at 15000 instead of 20000 for stability
- 20000x20000 matrices often cause issues even on powerful computers
- Add tracking of timed-out algorithms per element type
- Report number of skipped algorithms in summary output
- Update documentation to reflect these improvements

This makes the autotuning process much more efficient by not repeatedly
testing algorithms that have already proven to be too slow.
@ChrisRackauckas ChrisRackauckas merged commit 4b2d3d8 into SciML:main Aug 11, 2025
114 of 118 checks passed