ChrisRackauckas-Claude
Contributor

Summary

  • Adds maxtime parameter to control maximum time per algorithm test in autotuning
  • Prevents hanging on slow algorithms or large matrices
  • Records timed-out runs as NaN for better visibility

Changes

1. Added maxtime parameter to benchmark_algorithms()

  • Default value: 100 seconds
  • Implements timeout during accuracy check using async tasks
  • Records timed-out runs as NaN in the results
  • Warns users when algorithms time out
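
As a rough illustration of the timeout behavior described above, here is a minimal sketch built on `@async` and `timedwait` from Base. The helper name `run_with_timeout` is hypothetical and is not taken from the package; the actual implementation in `benchmark_algorithms()` may be structured differently.

```julia
# Hypothetical helper; a sketch of the idea, not the package's implementation.
function run_with_timeout(f, maxtime::Real)
    task = @async f()                       # run the work in a background task
    # Poll until the task is done or maxtime seconds have elapsed.
    status = timedwait(() -> istaskdone(task), Float64(maxtime); pollint = 0.01)
    if status === :ok
        return fetch(task)                  # finished in time: return its value
    else
        @warn "Algorithm exceeded maxtime; recording NaN" maxtime
        return NaN                          # timed out: record NaN and move on
    end
end

fast = run_with_timeout(() -> 1.5, 1.0)                 # completes in time
slow = run_with_timeout(() -> (sleep(1.0); 1.5), 0.1)   # exceeds the limit
```

Note that this sketch only stops waiting; the background task keeps running until it finishes on its own. It also relies on the work yielding to the scheduler (as `sleep` does), since a task that never yields would prevent the polling from running on a single thread.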

2. Updated autotune_setup() function

  • Added maxtime as keyword argument with 100s default
  • Passes parameter through to benchmark_algorithms()
  • Shows maxtime setting in info messages

3. Updated documentation

  • Added parameter description to docstrings
  • Added new section in autotune.md explaining the parameter
  • Provided usage examples for different timeout scenarios

Key Features

  • Time limit for accuracy checks: Each algorithm's accuracy check now has a maximum time limit
  • Smart timeout handling: Detects when too little time remains after warmup to run the benchmark
  • NaN recording: Timed-out runs are recorded as NaN rather than failing entirely
  • Continues on timeout: Benchmark continues with next algorithm when one times out
  • User feedback: Warning messages indicate when algorithms time out
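
The "record NaN and continue" behavior can be sketched as a simple loop. Everything here is an assumption made for illustration: the algorithm names, the `benchmark_sketch` function, and the rule that a benchmark is skipped when the remaining budget is smaller than the warmup time are not taken from the PR.

```julia
# Hypothetical stand-ins for solver algorithms: name => zero-argument function.
algorithms = [
    "fast_alg" => () -> sleep(0.01),
    "slow_alg" => () -> sleep(0.6),
]

function benchmark_sketch(algorithms, maxtime::Float64)
    results = Dict{String, Float64}()
    for (name, solve_once) in algorithms
        warmup = @elapsed solve_once()      # warmup run, timed
        remaining = maxtime - warmup
        if remaining < warmup
            # Assumed rule: not enough budget left for one more run.
            @warn "Skipping benchmark; time budget exhausted" name warmup maxtime
            results[name] = NaN             # record the timeout as NaN
            continue                        # move on to the next algorithm
        end
        results[name] = @elapsed solve_once()   # one timed run as the "benchmark"
    end
    return results
end

results = benchmark_sketch(algorithms, 0.5)
```

With a 0.5 second budget, `slow_alg` is recorded as `NaN` while `fast_alg` still gets a timing, so one slow algorithm does not abort the whole run.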

Usage Examples

using LinearSolveAutotune

# Default: 100 seconds maximum per algorithm test
results = autotune_setup()

# Quick timeout for fast exploration
results = autotune_setup(maxtime = 10.0)

# Extended timeout for slow algorithms or large matrices
results = autotune_setup(
    maxtime = 300.0,  # 5 minutes per test
    sizes = [:large, :big]
)

Test Plan

  • Verify maxtime parameter is accepted by autotune_setup()
  • Verify default value of 100.0 seconds works
  • Verify timed-out runs are recorded as NaN
  • Verify warning messages appear for timeouts
  • Verify benchmarking continues after timeout
  • Test with actual slow algorithms on CI

This change addresses the issue where autotuning could hang indefinitely on certain algorithm/matrix combinations, making the autotuning process more robust and user-friendly.

- Added maxtime parameter with 100s default to autotune_setup() and benchmark_algorithms()
- Implements timeout handling during accuracy checks and benchmarking
- Records timed-out runs as NaN in results
- Updated docstrings and documentation to explain the new parameter
- Prevents hanging on slow algorithms or large matrices
- Use Channel-based communication between warmup and timer tasks
- Properly interrupt timed-out tasks with Base.throwto()
- Clean up timer task when warmup completes successfully
- Handle exceptions from warmup task properly
- Prevents resource leaks from hanging tasks
@ChrisRackauckas-Claude
Contributor Author

Updated the implementation to properly kill tasks that exceed the timeout:

  • Now using Channel-based communication between the warmup task and timer task
  • Properly interrupts timed-out tasks using Base.throwto() to send an InterruptException
  • Cleans up the timer task when warmup completes successfully
  • Handles exceptions from the warmup task appropriately
  • This prevents resource leaks from tasks that hang indefinitely

The improved implementation ensures that long-running computations are actually stopped when they time out, rather than continuing to run in the background.
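
The Channel pattern described in this comment might look roughly like the sketch below. The helper `race_with_timer` and its message format are hypothetical. To stay within documented Base APIs, this sketch only reports which side finished first and cancels the timer; it does not interrupt the losing task, which the PR handles with `Base.throwto()`.

```julia
# Hypothetical helper illustrating the Channel pattern; names are not from the PR.
function race_with_timer(work, maxtime::Float64)
    outcome = Channel{Any}(2)               # buffered so neither side blocks on put!

    # Timer side: report a timeout if the timer fires first.
    timer = Timer(maxtime) do _
        put!(outcome, (:timed_out, nothing))
    end

    # Work side: report the value, or the exception if the work throws.
    @async try
        put!(outcome, (:ok, work()))
    catch err
        put!(outcome, (:error, err))
    end

    first_message = take!(outcome)          # whichever side reports first wins
    close(timer)                            # cancel the timer if it has not fired
    return first_message
end

finished = race_with_timer(() -> 42, 1.0)                    # work wins
expired  = race_with_timer(() -> (sleep(1.0); 42), 0.1)      # timer wins
failed   = race_with_timer(() -> error("boom"), 1.0)         # exception captured
```

The buffered channel means a late `put!` from the losing side never blocks, so neither task is left waiting on a reader that has already returned.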

- Filter out NaN values when computing mean, max, and std statistics
- Exclude NaN values from plots to avoid visualization errors
- Report number of timed-out tests in summary output
- Ensure categorize_results excludes NaN values when selecting best algorithms
- All aggregation functions now properly handle NaN values that indicate timeouts

This ensures the autotuning system works correctly even when some tests time out,
which is expected behavior for large matrix sizes or slow algorithms.
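
A minimal sketch of NaN-aware aggregation, assuming the results are a plain vector of floats in which `NaN` marks a timed-out test. The variable names and data are illustrative only.

```julia
using Statistics

# Hypothetical measurements; NaN marks a timed-out test.
measurements = [12.5, NaN, 14.0, 13.5, NaN]

valid = filter(!isnan, measurements)        # drop timed-out entries
n_timed_out = count(isnan, measurements)    # how many tests timed out

# Guard against the case where every test timed out.
stats = isempty(valid) ? (mean = NaN, max = NaN, std = NaN) :
        (mean = mean(valid), max = maximum(valid), std = std(valid))

@info "Summary" stats n_timed_out
```

Filtering first matters because a single `NaN` would otherwise propagate through `mean` and `std` and turn the whole summary into `NaN`.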
@ChrisRackauckas-Claude
Contributor Author

Added robust NaN handling throughout the analysis pipeline:

Changes to handle NaN values from timeouts:

  1. Statistical aggregations - All mean, max, and std calls now filter out NaN values:

    • In LinearSolveAutotune.jl display and summary functions
    • In categorize_results() for selecting best algorithms
    • In telemetry.jl for generating reports
  2. Plotting - Filter out NaN values before plotting to prevent visualization errors

  3. User feedback - Added reporting of timed-out tests:

    • Shows count of timed-out tests in @info messages
    • Displays timeout count in AutotuneResults summary output

This ensures the autotuning system works correctly even when some tests time out, which is expected behavior for large matrix sizes or slow algorithms. The NaN values serve as clear indicators of timeouts while not breaking downstream analysis.
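
To illustrate how best-algorithm selection can ignore timeouts, here is a small sketch. The `best_algorithm` function, the score layout, and the "higher mean is better" rule are assumptions for the example; `categorize_results()` in the package may use different criteria.

```julia
using Statistics

# Hypothetical per-algorithm scores (higher is better); NaN marks a timed-out test.
scores = Dict(
    "alg_a" => [10.0, 12.0, NaN],
    "alg_b" => [11.0, 15.0, 13.0],
    "alg_c" => [NaN, NaN, NaN],
)

function best_algorithm(scores)
    best_name, best_mean = nothing, -Inf
    for (name, vals) in scores
        valid = filter(!isnan, vals)        # ignore timed-out tests
        isempty(valid) && continue          # skip algorithms that always timed out
        m = mean(valid)
        if m > best_mean
            best_name, best_mean = name, m
        end
    end
    return best_name
end

best = best_algorithm(scores)
timed_out = sum(count(isnan, v) for v in values(scores))
@info "Selected best algorithm" best timed_out
```

An algorithm whose tests all timed out is never selected, and the total timeout count can still be reported alongside the winner.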

@ChrisRackauckas ChrisRackauckas merged commit a0f36af into SciML:main Aug 11, 2025
102 of 118 checks passed