
Conversation

ChrisRackauckas-Claude
Contributor

Reverts the recent timeout handling to use a simpler maxtime-based blocking mechanism

- Remove async task creation and timedwait mechanism
- Simply run solve() and measure elapsed time
- If algorithm exceeds maxtime, block it for larger matrices
- Store max allowed size per algorithm per eltype
- Skip blocked algorithms for larger sizes with informative message
- Keep NaN handling for timed-out/blocked runs
- Blocked list automatically resets when switching eltypes

This approach is more stable than trying to kill running tasks, which
can cause Julia to hang. Instead, we let slow algorithms complete but
prevent them from running on larger matrices.
@ChrisRackauckas-Claude
Contributor Author

Detailed Explanation

This PR addresses the issues with the current timeout implementation by reverting to a simpler approach that doesn't try to interrupt running tasks.

What Changed

Before (current implementation):

  • Creates async tasks with channels for communication
  • Uses timedwait() to detect when tasks exceed maxtime
  • Attempts to abandon timed-out tasks (but can't actually kill them)
  • Tasks continue running in background, potentially causing issues
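A rough sketch of the kind of async/timedwait pattern being removed (the names `solve_with_timeout` and `solvefn` are illustrative, not the actual benchmark code). The key flaw is visible in the last branch: Julia tasks cannot be killed, so after `timedwait` gives up, the task keeps running in the background.

```julia
# Sketch of the removed pattern: spawn the solve in a task, poll for
# completion with timedwait, and abandon the task on timeout.
function solve_with_timeout(solvefn; maxtime = 1.0)
    done = Channel{Float64}(1)
    @async begin
        t = @elapsed solvefn()
        put!(done, t)
    end
    # timedwait polls the predicate until it returns true or maxtime elapses.
    if timedwait(() -> isready(done), maxtime) === :ok
        return take!(done)
    else
        return NaN   # task is abandoned but still running in the background!
    end
end
```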

After (this PR):

  • Simply runs solve() and measures elapsed time
  • If time > maxtime, blocks algorithm for larger matrices
  • Maintains a dictionary: blocked_algorithms[eltype][algorithm_name] = max_allowed_size
  • Skips blocked algorithms with clear error messages
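The bullets above can be sketched as a single synchronous benchmark step. This is a minimal illustration, not the actual benchmark loop: the names `run_and_record!` and `blocked_algorithms` mirror the description, but the real code also handles warm-up, GFLOPs reporting, etc.

```julia
# Minimal sketch of maxtime-based blocking: run to completion, measure,
# and block the algorithm for this size and larger if it was too slow.
const blocked_algorithms = Dict{DataType,Dict{String,Int}}()

function run_and_record!(alg_name::String, solvefn, n::Int, T::DataType;
                         maxtime = 1.0)
    blocked = get!(blocked_algorithms, T, Dict{String,Int}())
    if haskey(blocked, alg_name) && n >= blocked[alg_name]
        @info "⏭️  Skipping $alg_name for $(n)×$(n) $T matrix (exceeded maxtime at size $(blocked[alg_name]))"
        return NaN
    end
    elapsed = @elapsed solvefn(n)   # let the solve run to completion
    if elapsed > maxtime
        blocked[alg_name] = n       # block this size and larger from now on
        return NaN
    end
    return elapsed
end
```

Because the solve always runs to completion, there is no task to interrupt and nothing left running in the background.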

Key Implementation Details

  1. Blocking Logic: When an algorithm exceeds maxtime at size N, it's blocked for all sizes > N
  2. Per-eltype Tracking: Each element type has its own blocked list, naturally resetting when switching types
  3. NaN Handling: Preserved - blocked/skipped runs return NaN
  4. Progress Reporting: Updated to show "Exceeded maxtime" vs "Skipped" for clarity
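The per-eltype reset in point 2 falls out of the data structure itself. A small sketch (variable names are illustrative): keying the outer dictionary by element type means the first lookup for a new eltype creates a fresh, empty blocked list.

```julia
# Per-eltype tracking: each element type gets its own blocked list,
# so switching eltypes naturally starts from a clean slate.
blocked = Dict{DataType,Dict{String,Int}}()
get!(blocked, Float64, Dict{String,Int}())["GenericLUFactorization"] = 500
f32 = get!(blocked, Float32, Dict{String,Int}())   # first Float32 lookup
```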

Why This Approach is Better

  • Stability: No risk of Julia hanging from task interruption attempts
  • Simplicity: Removed ~40 lines of async/channel code
  • Predictability: Algorithms either run to completion or are skipped entirely
  • Efficiency: Prevents repeated testing of slow algorithms on larger sizes

The user experience remains the same: slow algorithms are still prevented from wasting time on larger matrices, but the implementation is much more robust.

@ChrisRackauckas-Claude
Contributor Author

Fixed the blocking logic issue!

The problem was that the comparison was using > instead of >= when checking if an algorithm should be skipped. This meant that when an algorithm exceeded maxtime at size N, it would still run again at size N in future runs.

The fix:

  • Changed line 150 from if n > blocked_at_size to if n >= blocked_at_size
  • Updated variable name from max_allowed_size to blocked_at_size for clarity
  • Fixed comments to accurately describe the behavior
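The off-by-one in isolation (a reduced illustration, not the surrounding benchmark code): with `>`, an algorithm blocked at size N was not skipped when size N came up again; `>=` closes that gap.

```julia
blocked_at_size = 9000
should_skip_old(n) = n > blocked_at_size    # buggy comparison
should_skip_new(n) = n >= blocked_at_size   # fixed comparison
```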

Testing:
Created unit tests that verify the blocking logic works correctly for:

  • Sizes smaller than the blocked size (should NOT skip) ✓
  • Size equal to the blocked size (should skip - this was the bug) ✓
  • Sizes larger than the blocked size (should skip) ✓
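The three cases can be sketched as follows, assuming a predicate `should_skip(n, blocked_at_size)` equivalent to the fixed `n >= blocked_at_size` check (the predicate name is illustrative):

```julia
using Test

should_skip(n, blocked_at_size) = n >= blocked_at_size

@testset "maxtime blocking logic" begin
    @test !should_skip(4999, 5000)  # smaller than blocked size: runs
    @test  should_skip(5000, 5000)  # equal to blocked size: skipped (the bug)
    @test  should_skip(5001, 5000)  # larger than blocked size: skipped
end
```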

The fix ensures that once an algorithm exceeds maxtime at a given matrix size, it will be properly skipped for that size and all larger sizes in subsequent benchmarking runs.

@ChrisRackauckas-Claude
Contributor Author

Added informative skip message as requested!

Now when an algorithm is skipped due to exceeding maxtime on a previous size, it will display:

[ Info: ⏭️  Skipping GenericLUFactorization for 9000×9000 Float64 matrix (exceeded maxtime at size 9000)

This makes it much clearer during benchmarking when and why algorithms are being skipped.

@ChrisRackauckas ChrisRackauckas merged commit 3fd10c0 into SciML:main Aug 12, 2025
114 of 118 checks passed