
Fix direct physics states observing outdated data when using separate physics thread #116897

Open
Sauroniux wants to merge 1 commit into godotengine:master from Sauroniux:fix_113270

Conversation

@Sauroniux

Fixes #113270

I tried the reported issue on 4.4, and there I was getting raycast hits ~90% of the frames. On the latest versions, raycast misses 100% of the frames (except for the first frame).

I then tried bisecting it and arrived at pull request 109591.

After a bit more digging, the bug unfolds like this:

  • Physics iteration begins
  • Physics server is synced and transforms are read back into nodes via flush_queries
  • During readback, the Area3D gets a notification that its global transform has changed, and it sets the new transform to the server.
    • The transform changes because the Area3D is parented to a RigidBody3D
  • One of the following then happens:
    1. Without threading, the new transform immediately propagates and updates the physics engine's query tree
    2. With threading, on latest versions, the command to update is queued up, but nothing happens as the server is paused
    3. With threading, on older versions, the command to update is queued up and starts executing at some point
  • The physics_process callback is invoked and a raycast is performed
    1. Without threading, the raycast succeeds as the query tree is up to date
    2. With threading, on latest versions, the raycast misses as the query tree is out of date
    3. With threading, on older versions, the raycast sometimes misses and sometimes hits, based on how fast the physics thread is

The proposed change flushes any pending commands when retrieving direct state access. This is the same behavior as calling any other server function that returns a value. With the change, spatial queries and other direct state functions are able to operate on latest data.

It's still possible to perform a spatial query on an outdated tree by:

  1. Getting a direct space state
  2. Setting some transform to the server
  3. Performing the spatial query

But this change at least covers the bug that occurs without using any low-level access.

Contributor

@mihe mihe left a comment

This would also resolve #109213, which is essentially a duplicate of #113270 (or the other way around, I guess). I did not realize this problem extended to things like child node transform propagation as well, which is much worse than just shape changes not being respected.

I mentioned a similar fix in #109213, but I must admit I'm hesitant to approve this, even if your change is arguably the correct and pragmatic thing to do here. The performance impact of this change can potentially be quite drastic, depending on the project.

Given how prevalent physics queries are in most games, and the fact that things like Viewport::_process_picking and NOTIFICATION_INTERNAL_PHYSICS_PROCESS run right after _physics_process and physics_frame, you would pretty much be guaranteed to flush every single mutating PhysicsServer*D operation on the main thread now[1], as opposed to the physics thread.

We would however still be running the actual simulation step on the physics thread at least, which I would imagine is the more costly part of the physics for most projects, so there would still be some benefit to running threaded I guess.

Footnotes

  1. Except for things like transform changes of kinematic bodies I guess, since those happen at the very end of the physics processing.

@Sauroniux
Author

Sauroniux commented Mar 1, 2026

Good catch, I added the missing flushes to the PR.

Yeah, I agree that this is a potentially big performance hit to ensure correctness. I suppose an alternative to this change could be to expose a manual flush (that is not a random getter)? That way, a user could call it to flush manually performed modifications to the physics engine. For example, in Unity, after you move a transform, you have to call Physics.SyncTransforms if you want to immediately perform queries against up-to-date physics state.

A counterargument to that is that in issue #113270 the user isn't performing any manual changes themselves, the engine does. So it's probably not obvious that anything needs flushing.

@mihe
Contributor

mihe commented Mar 1, 2026

Good catch, I added the missing flushes to the PR.

I just realized there was actually one more: PhysicsServer2DWrapMT::body_collide_shape. It looks like it's missing the Thread::is_main_thread() check as well, which might be good to add while you're at it.

For example, in Unity, after you move a transform, you have to call Physics.SyncTransforms if you want to immediately perform queries against up-to-date physics state.

Interesting. Godot does have Node3D.force_update_transform(), which serves a similar purpose, but on a per-object basis, where you also have to call it if you changed the transform and still expect to query against its new position. That doesn't have anything to do with threading though, but rather with transform propagation being deferred in general.

A counterargument to that is that in issue #113270 the user isn't performing any manual changes themselves, the engine does. So it's probably not obvious that anything needs flushing.

Yeah, agreed, I don't think exposing some sort of manual sync/flush is the play here, unfortunately.

If we really wanted to over-complicate things we could maybe have specific FUNC* macros in PhysicsServer*DWrapMT that designated whether a method would invalidate the queryable state of the physics world, and if any such commands are present in the queue when we do spatial queries we would pop the queue until no such commands are left.

I doubt it would make much of a difference though, since the likelihood of there not being a single transform change at the end of physics processing (when Godot's own spatial queries happen) is probably very small. It also makes performance quite volatile and hard(er) to reason about.

@Sauroniux
Author

Fixed PhysicsServer2DWrapMT::body_collide_shape as well. Sorry, I should have spotted it previously.

If we really wanted to over-complicate things we could maybe have specific FUNC* macros in PhysicsServer*DWrapMT that designated whether a method would invalidate the queryable state of the physics world, and if any such commands are present in the queue when we do spatial queries we would pop the queue until no such commands are left.

It's nice that the over-complication in this case can live in PhysicsServer*DWrapMT itself, without affecting the server implementation. To be fully optimal we'd also need to peek at the arguments of some of the commands; e.g. body_set_state can be anywhere from irrelevant to highly relevant to queries, depending on which state is being set.

One thing that comes to mind to regain, and potentially improve, query performance would be some form of asynchronous batch queries.
I don't know about Jolt, but PhysX, for example, allows executing queries on multiple threads at the same time, so the batches could go wider than just the physics thread itself.

LocalVector<RayParameters> parameters;
LocalVector<RayResult> results;
RID command_handle = PhysicsServer3D::get_singleton()->raycast_command(space, parameters, results);

// Some other work happens...

PhysicsServer3D::get_singleton()->ensure_finished(command_handle);
// Process the results...

But then, the PhysicsServer3D implementation would need to know that something like area_set_transform needs to conclude any running queries.

@mihe
Contributor

mihe commented Mar 2, 2026

It's nice that the over-complication in this case can live in the PhysicsServer*DWrapMT itself, without affecting the server implementation.

It would however necessitate changing command_queue_mt.h in some way, which would be a bummer, and potentially controversial. And again, I'm not convinced that the average gains from such a change would be worth the effort.

One thing that comes to mind to regain and potentially improve the query performance, would be some form of asynchronous batch queries.

The problem isn't so much that the queries are slower, but rather that they force a flush of the entire command queue. Even if we managed to perfectly batch every single query done across the entire frame into a single batch, you're still stuck waiting for that batch at the end of the physics processing, along with the entire command queue, on the main thread.


Anyway, I'll stew on this for a few more days, but I think we just need to accept this compromise for now.

In fact, I'm almost tempted to say we should just be honest with ourselves and disable/bypass the command queue entirely between PhysicsServer*D::sync and PhysicsServer*D::end_sync. For a good majority (?) of projects that would result in the exact same stuff happening on the main thread, but in its rightful place, instead of being flushed at some random physics query and potentially misleading people's profiling.

That would largely supersede this change though.

@Sauroniux
Author

Even if we managed to perfectly batch every single query done across the entire frame into a single batch, you're still stuck waiting for that batch at the end of the physics processing, along with the entire command queue, on the main thread.

I think it could still bring a tangible speedup, but that's a big feature on its own, not really related to fixing this bug.

I'm almost tempted to say we should just be honest with ourselves and disable/bypass the command queue entirely between PhysicsServer*D::sync and PhysicsServer*D::end_sync

So that would basically mean changing the ASYNC_COND_PUSH definition to do the same thing for "setters" that we do for all the "getters"? That way while the sync flag is raised, on the main thread we'd just access the server directly?

That would largely supersede this change though.

That's fine, the goal is to find the best possible solution.

@mihe
Contributor

mihe commented Mar 3, 2026

So that would basically mean changing the ASYNC_COND_PUSH definition to do the same thing for "setters" that we do for all the "getters"? That way while the sync flag is raised, on the main thread we'd just access the server directly?

Yes, exactly.
