Conversation

@Simn (Member) commented Feb 9, 2026

Here's my initial take on how to support this. I've opted for a semaphore guarded by an atomic integer instead of a condition variable because I realized we would need a timed wait in order to have a basic "wait until the next event is due" functionality. This comes with the additional benefit of avoiding the signal-before-wait problem that condition variables have. Another benefit is that the scheduling threads don't have to fight over a shared resource; they only do a single CAS in an attempt to update the atomic integer.
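
For illustration, here is a minimal sketch of that kind of primitive in Haxe, using sys.thread.Semaphore and haxe.atomic.AtomicInt; the names and details are made up for this example and are not the actual hxcoro implementation:

    import haxe.atomic.AtomicInt;
    import sys.thread.Semaphore;

    // Sketch of a semaphore guarded by an atomic integer: scheduling threads
    // only attempt a single CAS, and at most one of them releases the
    // semaphore, so there is no shared lock to fight over and no lost signal.
    class LoopWakeUp {
        final semaphore = new Semaphore(0);
        final signalled = new AtomicInt(0);

        public function new() {}

        // Called by scheduling threads after they have queued work.
        public function wakeUp():Void {
            // Only the 0 -> 1 transition releases, so the semaphore never
            // accumulates more than one pending wake-up.
            if (signalled.compareExchange(0, 1) == 0) {
                semaphore.release();
            }
        }

        // Called by the loop thread: blocks until woken up or until the
        // timeout elapses ("wait until the next event is due").
        public function wait(timeoutSeconds:Float):Void {
            semaphore.tryAcquire(timeoutSeconds);
            // Re-arm so the next wakeUp() releases again. Any wake-up that
            // races with this store belongs to work the loop is about to
            // pick up on its next iteration anyway.
            signalled.store(0);
        }
    }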

The loop function in ThreadAwareScheduler looks a little wonky now, but I think it does what I want it to do. I haven't actually checked whether this does anything for the CPU usage on our threaded targets; for now I just want to get it to not hang everywhere.
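
Roughly, the loop thread would then use such a primitive along these lines (again just a sketch; runDueEvents and secondsUntilNextEvent are hypothetical names, not the actual ThreadAwareScheduler API):

    // Run whatever is due, then sleep until either the next event is due
    // or another thread calls wakeUp() because it scheduled something new.
    while (task.isActive()) {
        scheduler.runDueEvents();
        final timeout = scheduler.secondsUntilNextEvent();
        loopWakeUp.wait(timeout);
    }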

And just as I open this, HL decided to hang on testAcquireConcurrency. That almost seems unrelated to what I'm doing here, but who knows...

Edit: JVM hangs too, so basically this doesn't actually work yet.

@Simn (Member, Author) commented Feb 9, 2026

Something that's a little annoying about task.onCompletion is that it doesn't interact very well with threads, because it is executed after the state has been set to Completed or Cancelled, which is also what isActive checks, and isActive is our loop condition. So in this scenario:

		task.onCompletion((_, _) -> loop.wakeUp());
		while (task.isActive()) {
			loop.loop();
		}

We can hit the onCompletion callback in one thread while the loop-thread completes its loop and falls through, which then might trigger the loop shutdown and whatnot while we're still busy handling our callback.

This actually explains some problems I've been seeing. I've added a mutex to C++'s LuvScheduler to avoid issues with using handles from the callback after the loop-thread has closed them. But I wonder if there's a cleaner solution that avoids this situation entirely.
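
As a rough sketch of the kind of guard meant here (AsyncHandle, send and the field names are stand-ins, not the actual LuvScheduler code):

    import sys.thread.Mutex;

    // Stand-in for the actual luv async handle binding.
    typedef AsyncHandle = {
        function send():Void;
        function close():Void;
    }

    // Both the completion callback (arbitrary thread) and the loop thread's
    // shutdown path go through the same mutex, so the handle is never used
    // after it has been closed.
    class GuardedAsync {
        final mutex = new Mutex();
        final handle:AsyncHandle;
        var closed = false;

        public function new(handle:AsyncHandle) {
            this.handle = handle;
        }

        // Called from onCompletion callbacks.
        public function wakeUp():Void {
            mutex.acquire();
            if (!closed) handle.send();
            mutex.release();
        }

        // Called from the loop thread during shutdown.
        public function close():Void {
            mutex.acquire();
            closed = true;
            handle.close();
            mutex.release();
        }
    }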

@Simn (Member, Author) commented Feb 9, 2026

And somehow, despite the mutex, there's still the closed async handle problem on C++:

Assertion failed: !(handle->flags & UV_HANDLE_CLOSING), file D:/a/hxcoro/hxcoro/.haxelib/hxcpp_luv_io/git/src/cpp/luv/../../../libuv-1.48.0/src/win/async.c, line 76

@Simn (Member, Author) commented Feb 9, 2026

Something must be wrong with (how I'm using) LuvDispatcher because all this works much better with ThreadPoolDispatcher.

@Simn (Member, Author) commented Feb 10, 2026

This works better now. The main problem was that LuvScheduler closed its workQueue before closing the scheduler, and the latter could still try to dispatch something, which doesn't go very well if the dispatcher is already closed.

I've been making a wrong assumption about when there can and cannot be events in the scheduler: just because a task is complete doesn't mean that the scheduler queue is necessarily empty. In particular, this can occur with cancellations, where a scheduler handle remains in the scheduler's heap after closing it.
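
As a concrete illustration (hypothetical names, and not necessarily the actual fix), a loop shaped like the earlier snippet exits purely on task state, so anything still sitting in the scheduler has to be dealt with afterwards:

    // Illustrative only: the loop condition is the task state alone...
    while (task.isActive()) {
        loop.loop();
    }
    // ...so at this point the scheduler heap may still contain handles,
    // e.g. from cancelled delays, and the shutdown path must not assume
    // that the queue is empty.
    scheduler.discardPendingEvents(); // hypothetical cleanup step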

The new test everyone hates is Issue124. JVM just got stuck and C++ ran into the threadpool.c:359 assertion once again, so something is definitely still wrong. It's a little concerning that both of these are affected because they use completely different scheduling/dispatching, which could hint at a more central problem.

Of course I can run the JVM tests locally 500 times in a row without any problems. The joy of multi-threaded programming...

@Simn marked this pull request as ready for review February 10, 2026 08:30
@skial mentioned this pull request Feb 10, 2026
@Simn (Member, Author) commented Feb 10, 2026

This should be good now. C++ still randomly dies, but it does that on master too, so that's not related. I don't love this IShutDown interface, so it would be good to find a better solution for that, but it's not very important right now.

Testing with a simple delay(15000), I no longer see any CPU activity. @Aidan63 Please check if this addresses your original situation from #105!

@Aidan63 (Contributor) commented Feb 10, 2026

[screenshot: CPU usage graph]

Yep, flat CPU usage now while waiting for stdin.

@Simn merged commit b21dddd into master Feb 10, 2026
68 of 69 checks passed
@Simn deleted the blocking-loop branch February 10, 2026 18:03
@Simn mentioned this pull request Feb 10, 2026