ChrisRackauckas-Claude

Problem

The autotune feature could hang indefinitely when it tried to time out slow algorithms. The timeout implementation attempted to forcefully interrupt tasks with Base.throwto(), which does not work reliably for tasks in certain states (such as sleeping or blocked on I/O).

Solution

Replace the problematic timeout mechanism with a more robust approach:

  • Use Julia's built-in timedwait() function instead of manual polling
  • Avoid using Base.throwto() which can cause hangs
  • Let timed-out tasks continue running in the background rather than trying to forcefully kill them
  • Use Channels for clean communication between tasks
  • Properly close channels to prevent resource leaks
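The approach listed above can be sketched roughly as follows. This is a minimal illustration, not the code in benchmarking.jl: the helper name `run_with_timeout`, the `Channel{Any}(1)` buffer, and the 10 ms poll interval are all assumptions made for the example.

```julia
# Hypothetical helper illustrating the timedwait() + Channel pattern.
# It is NOT the actual benchmark_algorithms() implementation.
function run_with_timeout(f, timeout_seconds::Real)
    # Capacity-1 buffer: the worker can deposit its outcome without blocking.
    result_channel = Channel{Any}(1)

    @async begin
        # Capture either the return value or the thrown exception.
        # (A sketch-level simplification: the two are not distinguished.)
        outcome = try
            f()
        catch err
            err
        end
        # If the caller already gave up, the channel is closed and put! throws.
        # Swallow that so the abandoned task finishes quietly.
        try
            put!(result_channel, outcome)
        catch
        end
    end

    # timedwait returns :ok once the callback is true, or :timed_out otherwise.
    status = timedwait(() -> isready(result_channel), Float64(timeout_seconds); pollint = 0.01)
    outcome = status === :ok ? take!(result_channel) : nothing

    # No throwto(): a timed-out task keeps running in the background.
    close(result_channel)
    return status, outcome
end

status, outcome = run_with_timeout(() -> (sleep(2.0); :done), 0.1)
```

Note that `@async` tasks are scheduled cooperatively, so a sketch like this can only time out work that yields (for example through `sleep` or I/O); whether the real benchmark tasks run on a separate thread is not stated in this PR.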

Changes

  • Modified benchmark_algorithms() in benchmarking.jl to use timedwait()
  • Removed the problematic task interruption logic
  • Added proper channel cleanup

Testing

Tested the new implementation with various timeout scenarios:

  • Tasks that complete before timeout ✓
  • Tasks that exceed timeout ✓
  • Edge cases with exact timing ✓

The autotune now properly handles timeouts without hanging, recording timed-out tests as NaN and continuing with the next algorithm.
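The record-NaN-and-continue behavior described above might look something like the sketch below. The candidate names, the `sleep` calls standing in for solver runs, and the function `time_candidates` are hypothetical and chosen only for illustration.

```julia
# Hypothetical stand-ins for algorithm runs; names and durations are illustrative.
candidates = [
    "fast_alg"  => () -> sleep(0.01),
    "slow_alg"  => () -> sleep(5.0),
    "other_alg" => () -> sleep(0.02),
]

function time_candidates(candidates, timeout_seconds::Real)
    timings = Dict{String,Float64}()
    for (name, work) in candidates
        done = Channel{Float64}(1)
        @async begin
            elapsed = @elapsed work()
            # The channel may already be closed after a timeout; ignore that.
            try
                put!(done, elapsed)
            catch
            end
        end
        status = timedwait(() -> isready(done), Float64(timeout_seconds); pollint = 0.01)
        # A timed-out candidate is recorded as NaN and the loop moves on.
        # In this sketch a candidate that throws also ends up as NaN,
        # because its task never delivers a result.
        timings[name] = status === :ok ? take!(done) : NaN
        close(done)
    end
    return timings
end

timings = time_candidates(candidates, 0.5)
```

With a 0.5 second limit, `timings["slow_alg"]` is NaN while the other two entries hold measured times, and the loop never blocks on the slow candidate for longer than the limit.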

Fixes the hanging issue reported after merging #716.

- Replace manual polling loop with Julia's built-in timedwait() function
- Avoid using Base.throwto() which can cause hangs with sleeping tasks
- Let timed-out tasks continue in background rather than trying to kill them
- Use Channels for clean communication between tasks
- Close channels properly to prevent resource leaks

This fixes the issue where autotune would hang indefinitely when trying
to interrupt tasks that exceeded the timeout limit.
ChrisRackauckas merged commit 47e7223 into SciML:main on Aug 11, 2025
113 of 118 checks passed