Add maxtime parameter to LinearSolveAutotune for timeout handling #716

ChrisRackauckas-Claude · 2025-08-11T15:27:21Z

Summary

Adds maxtime parameter to control maximum time per algorithm test in autotuning
Prevents hanging on slow algorithms or large matrices
Records timed-out runs as NaN for better visibility

Changes

1. Added `maxtime` parameter to `benchmark_algorithms()`

Default value: 100 seconds
Implements timeout during accuracy check using async tasks
Records timed-out runs as NaN in the results
Warns users when algorithms time out
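
As a rough illustration of the behaviour listed above (not the code in this PR), a time limit around a single check could be sketched with an async task and `timedwait` from Base. The helper name `run_with_timeout` is hypothetical; only the `maxtime` keyword and the NaN convention come from this description.

```julia
# Hypothetical helper, not part of LinearSolveAutotune.
# Runs `f` in an async task and waits at most `maxtime` seconds for it.
function run_with_timeout(f; maxtime::Real = 100.0)
    task = @async f()
    # Poll until the task finishes or the time limit is reached.
    status = timedwait(() -> istaskdone(task), Float64(maxtime); pollint = 0.01)
    if status === :ok
        return fetch(task)        # rethrows if `f` itself failed
    end
    @warn "Timed out after $(maxtime) seconds; recording NaN"
    return NaN                    # timed-out runs are recorded as NaN
end

elapsed = run_with_timeout(() -> (sleep(0.01); 0.5); maxtime = 1.0)
```

One caveat with this kind of sketch: the polling only makes progress when the running task yields, so a long non-yielding computation on the same thread would not be detected until it returns.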

2. Updated `autotune_setup()` function

Added maxtime as keyword argument with 100s default
Passes parameter through to benchmark_algorithms()
Shows maxtime setting in info messages

3. Updated documentation

Added parameter description to docstrings
Added new section in autotune.md explaining the parameter
Provided usage examples for different timeout scenarios

Key Features

Time limit for accuracy checks: Each algorithm's accuracy check now has a maximum time limit
Smart timeout handling: Detects insufficient time for benchmarking after warmup
NaN recording: Timed-out runs are recorded as NaN rather than failing entirely
Continues on timeout: Benchmark continues with next algorithm when one times out
User feedback: Warning messages indicate when algorithms time out
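
The "detect after warmup, record NaN, continue" flow described above might look roughly like the following. This is a sketch under assumptions: `benchmark_all`, its `samples` keyword, and the rule for deciding that too little time remains are all illustrative and are not taken from the PR.

```julia
# Hypothetical sketch; names and the budgeting rule are assumptions.
# `algs` is any iterable of `name => function` pairs.
function benchmark_all(algs; maxtime::Real = 100.0, samples::Int = 3)
    results = Dict{String,Float64}()
    for (name, solve!) in algs
        t0 = time()
        solve!()                          # warmup run
        warmup = time() - t0
        if warmup >= maxtime
            # No time left for real measurements: record NaN and move on.
            @warn "$name used up maxtime during warmup; recording NaN" warmup maxtime
            results[name] = NaN
            continue
        end
        best = Inf
        for _ in 1:samples
            time() - t0 >= maxtime && break   # stop once the budget is spent
            best = min(best, @elapsed solve!())
        end
        results[name] = isfinite(best) ? best : NaN
    end
    return results
end
```

Note that this variant only notices an overrun after the warmup returns; it does not preempt a call that never finishes.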

Usage Examples

```julia
using LinearSolveAutotune

# Default: 100 seconds maximum per algorithm test
results = autotune_setup()

# Quick timeout for fast exploration
results = autotune_setup(maxtime = 10.0)

# Extended timeout for slow algorithms or large matrices
results = autotune_setup(
    maxtime = 300.0,  # 5 minutes per test
    sizes = [:large, :big]
)
```

Test Plan

Verify maxtime parameter is accepted by autotune_setup()
Verify default value of 100.0 seconds works
Verify timed-out runs are recorded as NaN
Verify warning messages appear for timeouts
Verify benchmarking continues after timeout
Test with actual slow algorithms on CI

This change addresses the issue where autotuning could hang indefinitely on certain algorithm/matrix combinations, so that a single slow test no longer blocks the rest of the autotuning run.

- Added maxtime parameter with 100s default to autotune_setup() and benchmark_algorithms()
- Implements timeout handling during accuracy checks and benchmarking
- Records timed out runs as NaN in results
- Updated docstrings and documentation to explain the new parameter
- Prevents hanging on slow algorithms or large matrices

- Use Channel-based communication between warmup and timer tasks
- Properly interrupt timed-out tasks with Base.throwto()
- Clean up timer task when warmup completes successfully
- Handle exceptions from warmup task properly
- Prevents resource leaks from hanging tasks

ChrisRackauckas-Claude · 2025-08-11T15:30:40Z

Updated the implementation to properly kill tasks that exceed the timeout:

Now using Channel-based communication between the warmup task and timer task
Properly interrupts timed-out tasks using Base.throwto() to send an InterruptException
Cleans up the timer task when warmup completes successfully
Handles exceptions from the warmup task appropriately
This prevents resource leaks from tasks that hang indefinitely

The improved implementation ensures that long-running computations are actually stopped when they time out, rather than continuing to run in the background.
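
To make the Channel-based coordination concrete, here is a minimal sketch of a worker task and a timer that both report into one channel, with whichever arrives first deciding the outcome. The name `race_with_timer` is hypothetical. This sketch deliberately leaves out the forced interruption via `Base.throwto()` described above, so a timed-out worker here keeps running in the background; it only illustrates the communication and timer cleanup.

```julia
# Hypothetical sketch; not the PR's implementation and without task interruption.
function race_with_timer(f; maxtime::Real = 100.0)
    ch = Channel{Any}(2)                  # buffered so neither side blocks on put!
    @async begin
        try
            put!(ch, (:done, f()))
        catch err
            put!(ch, (:error, err))       # forward failures to the caller
        end
    end
    timer = Timer(maxtime) do _
        put!(ch, (:timeout, nothing))
    end
    status, value = take!(ch)             # first message wins
    close(timer)                          # clean up the timer if the worker won
    status === :error && throw(value)
    return status === :done ? value : NaN
end
```

Using a single buffered channel keeps the caller's logic to one `take!`, and closing the timer avoids leaving a pending callback behind when the work completes in time.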

lib/LinearSolveAutotune/src/benchmarking.jl

- Filter out NaN values when computing mean, max, and std statistics
- Exclude NaN values from plots to avoid visualization errors
- Report number of timed-out tests in summary output
- Ensure categorize_results excludes NaN values when selecting best algorithms
- All aggregation functions now properly handle NaN values that indicate timeouts

This ensures the autotuning system works correctly even when some tests time out, which is expected behavior for large matrix sizes or slow algorithms.
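
The reason NaN has to be excluded before picking a winner is that NaN propagates through reductions in Julia: `minimum([0.2, NaN])` returns NaN. A minimal sketch of NaN-aware selection follows; the function `best_algorithm` and the algorithm labels are hypothetical and do not reflect how `categorize_results` is actually written.

```julia
# Hypothetical sketch of choosing the fastest algorithm while ignoring timeouts.
function best_algorithm(times::AbstractDict{String,<:Real})
    best_name, best_time = nothing, Inf
    for (name, t) in times
        isnan(t) && continue              # timed-out run: not a candidate
        if t < best_time
            best_name, best_time = name, t
        end
    end
    return best_name                      # `nothing` if every run timed out
end

timings = Dict("AlgA" => 0.2, "AlgB" => NaN, "AlgC" => 0.5)
winner = best_algorithm(timings)
```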

ChrisRackauckas-Claude · 2025-08-11T15:36:56Z

Added robust NaN handling throughout the analysis pipeline:

Changes to handle NaN values from timeouts:

Statistical aggregations - All mean, max, and std calls now filter out NaN values:
- In LinearSolveAutotune.jl display and summary functions
- In categorize_results() for selecting best algorithms
- In telemetry.jl for generating reports
Plotting - Filter out NaN values before plotting to prevent visualization errors
User feedback - Added reporting of timed-out tests:
- Shows count of timed-out tests in @info messages
- Displays timeout count in AutotuneResults summary output

This ensures the autotuning system works correctly even when some tests time out, which is expected behavior for large matrix sizes or slow algorithms. The NaN values serve as clear indicators of timeouts while not breaking downstream analysis.
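
A small sketch of the aggregation pattern described in this comment, assuming timings are stored as a vector in which NaN marks a timeout. The function `summarize_timings` and its field names are illustrative only.

```julia
using Statistics

# Hypothetical sketch: compute summary statistics over valid runs only
# and report how many runs timed out.
function summarize_timings(times::AbstractVector{<:Real})
    valid = filter(!isnan, times)
    timeouts = length(times) - length(valid)
    if isempty(valid)
        # Nothing to aggregate; `maximum` would throw on an empty vector.
        return (mean_time = NaN, max_time = NaN, std_time = NaN, timeouts = timeouts)
    end
    return (mean_time = mean(valid), max_time = maximum(valid),
            std_time = std(valid), timeouts = timeouts)
end

summary = summarize_timings([1.0, NaN, 3.0])
```

Counting the dropped entries alongside the statistics keeps the timeouts visible in the summary instead of silently discarding them.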

ChrisRackauckas added 2 commits August 11, 2025 11:17

ChrisRackauckas reviewed Aug 11, 2025


ChrisRackauckas added 2 commits August 11, 2025 11:33

Update lib/LinearSolveAutotune/src/benchmarking.jl

ef0db19

ChrisRackauckas merged commit a0f36af into SciML:main Aug 11, 2025
102 of 118 checks passed

This was referenced Aug 11, 2025

Fix hanging issue in autotune timeout mechanism #717

Merged

Improve timeout handling: skip timed-out algorithms and cap big benchmarks at 15000 #718

Merged
