Rework internal navigation to prevent deadlocking #776

karlseguin · 2025-06-11T10:06:45Z

The mix of sync and async HTTP requests requires care to avoid deadlocks. Previously, it was possible for async requests to use up all available HTTP state objects during a navigation flow (either directly, or via an internal redirect (e.g. click, submit, ...)). This would block the navigation, which, because everything is single-threaded, would block the I/O loop, resulting in a deadlock.

The correct solution seems to be to remove all synchronous I/O. And I tried to do that, but I ran into a wall with module-loading, which is initiated from V8. V8 says "give me the source for this module", and I don't see a great way to tell it: wait a bit.

So I went back to trying to make this work with the hybrid model, despite last week's failures to get it to work. I changed two things:

1 - The http client will only directly initiate an async request if there's
at least 2 free state objects available (1 for the request, and leaving 1
free for any synchronous requests)

2 - Delayed navigation retries until there's at least 1 free http state object
available.

Commits from last week did help with this. First, we're now guaranteed to have a single sync-request at a time (previously, we could have had 2). Secondly, the async connection is now async end-to-end (previously, it could have blocked on an empty state pool).

We could probably make this a bit more obvious by reserving 1 state object for synchronous requests. But, since the long-term solution is probably having no synchronous requests, I'm happy with anything that lets me move past this issue.
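The two rules above can be modelled with a toy pool. This is only a sketch in Python, not the project's code: `StatePool`, `try_acquire_async`, `try_acquire_sync` and the capacity of 3 are hypothetical names and values chosen for illustration, and what the caller does with a refused request (queue it, retry later) is assumed rather than taken from the change itself.

```python
from dataclasses import dataclass


@dataclass
class StatePool:
    """Toy model of a fixed-size pool of HTTP state objects (hypothetical)."""

    capacity: int
    in_use: int = 0

    @property
    def free(self) -> int:
        return self.capacity - self.in_use

    def try_acquire_async(self) -> bool:
        # Rule 1: start an async request only if at least 2 objects are free,
        # one for this request and one left over for a synchronous request.
        if self.free < 2:
            return False  # assumed: the caller defers the request
        self.in_use += 1
        return True

    def try_acquire_sync(self) -> bool:
        # Rule 2: a delayed navigation needs at least 1 free object. If none
        # is free, the caller retries later instead of blocking the loop.
        if self.free < 1:
            return False
        self.in_use += 1
        return True

    def release(self) -> None:
        assert self.in_use > 0, "release without a matching acquire"
        self.in_use -= 1


pool = StatePool(capacity=3)
# Four async requests arrive; only two may start, so one object stays free.
started = [pool.try_acquire_async() for _ in range(4)]
free_after_async = pool.free
# The navigation can always take the remaining object.
navigation_ok = pool.try_acquire_sync()
```

Because async requests can never take the last object, a navigation only has to wait for another synchronous request, and there is at most one of those at a time.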


In #767 I tried to call loop.run from within a loop.run (spoiler, it didn't work), in order to make sure aborted connections were properly cleaned up before starting a new navigation. That resulted in having loop.run no longer wait for timeouts for fear of having to wait on a long timeout. That ended up breaking page.wait (used in the fetch command). This commit brings back the original behavior where loop.run() waits for all completions, which is now safe to do since the nested loop.run() call has been removed.
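As a rough illustration of "loop.run() waits for all completions", here is a minimal Python sketch. The `Loop` class, its `timeout` method and the virtual clock are all assumptions made for the example; the real loop is driven by actual I/O completions, not by a heap of callbacks.

```python
import heapq


class Loop:
    """Toy event loop with a virtual clock (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.now = 0
        self._seq = 0  # tie-breaker so callbacks are never compared
        self._timers: list = []  # heap of (due, seq, callback)

    def timeout(self, delay: int, callback) -> None:
        heapq.heappush(self._timers, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self) -> None:
        # Keep going until nothing is pending, including long timeouts and
        # anything scheduled by a callback while the loop is running.
        while self._timers:
            due, _, callback = heapq.heappop(self._timers)
            self.now = max(self.now, due)
            callback()


events = []
loop = Loop()
loop.timeout(50, lambda: events.append("late"))
loop.timeout(10, lambda: events.append("early"))
loop.run()
```

A caller such as a page-level wait can then rely on `run()` returning only once every scheduled completion has fired, which is the behavior the commit restores.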

Fix loop run (Page.wait)

karlseguin force-pushed the fix_internal_navigation_deadlocks branch from 91b1177 to ad92ae2 Compare June 11, 2025 10:09

karlseguin force-pushed the fix_internal_navigation_deadlocks branch from ad92ae2 to 97c769e Compare June 12, 2025 04:35

karlseguin mentioned this pull request Jun 12, 2025

Terminate execution on internal navigation #778

Merged

karlseguin and others added 2 commits June 12, 2025 23:01

Merge pull request #780 from lightpanda-io/fix_loop_run_wait

582894c

Fix loop run (Page.wait)

karlseguin merged commit e3afa29 into main Jun 13, 2025
11 checks passed

karlseguin deleted the fix_internal_navigation_deadlocks branch June 13, 2025 01:38

github-actions bot locked and limited conversation to collaborators Jun 13, 2025
