Skip to content

Conversation

@agargaro
Copy link
Contributor

@agargaro agargaro commented Aug 27, 2024

By not using the shapecast function, there is a slight improvement of about 25%.

I have added a page for benchmarks to be removed later called 'testToRemove.html' (I know, I have a big imagination).

I will also add the function using the sorted list shortly (#695).

Edit: I finished 3 first versions of closestPointToPoint.

@agargaro agargaro changed the title Improve ClosestPointToPoint Improve ClosestPointToPoint Aug 27, 2024
Copy link
Owner

@gkjohnson gkjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work! Looking at the PQP_Distance function it looks like they use a similar approach to the hybrid method in that it starts a new sorted queue once the last one becomes full. I assume this has the benefit of limiting memory usage and preventing insertion time from becoming too long. I wonder if we'd see benefits from that solution here?

See the DistanceQueueRecurse function and this condition where it recurses and creates a new queue to traverse. Here is documentation from header file describing "qsize" use. Granted the function is a bit different because it's performing geometry-to-geometry so the queue is storing two nodes and a distance but fundamentally it's very similar.

if ( leftDistance < closestDistanceSq && leftDistance < maxThresholdSq ) {

if ( _closestPointToPoint( leftIndex ) ) return true;
if ( rightDistance < closestDistanceSq ) return _closestPointToPoint( rightIndex );
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to check distance to maxThresholdSq here, as well?

@@ -0,0 +1,20 @@
export function closestDistanceSquaredPointToBox( nodeIndex32, array, point ) {
Copy link
Owner

@gkjohnson gkjohnson Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering how much of the 25% improvement we would see if we used this function instead of three.js' built in version in the original "shapecast" function (see #656) - similar to the box / ray test where avoiding reading the bounds to a Box3 object gave a significant speed improvement.

Comment on lines 72 to 80
if ( count >= sortedListMaxCount ) {

if ( _closestPointToPoint( nodeIndex32 ) ) return;

continue;

}

count ++;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the logic and rationale of the hybrid function a bit? It looks like it will choose the "best" node from the first 16 nodes (by default) and then fall back to the basic approach of traversing to the leaves for each of those nodes in order. The count never goes down, right?

@agargaro
Copy link
Contributor Author

agargaro commented Sep 1, 2024

Regarding the use of sorted queue, I would like to do more tests to understand when it is necessary.
Using dynamic bvh I noticed that it seems useful when there are a lot of nodes.

I would also like to modify the current shapecast' function by removing the arrayToBox' conversion to understand if the performance improvement is due to that.

Edit: sorry I read later that you also recommended doing to this. I'll try these days :)

@agargaro agargaro marked this pull request as draft September 1, 2024 18:10
@agargaro
Copy link
Contributor Author

agargaro commented Sep 1, 2024

This is with the actual 'shapecast'

image

and this is with edited 'shapecast' (removing arrayToBox)

image

@agargaro
Copy link
Contributor Author

agargaro commented Sep 1, 2024

Unfortunately, modifying the shapecast function is quite a delicate process now, because box3 is also used by the shapecast caller, so this would be a breaking change.

I believe it is correct to expose a box3 to the API because it makes it much easier for the user to use.

Could we make an alternative method for internal use, which instead of passing the box3, passes the nodeIndex32 and the float32array?

Or could we stop using shapecast for internal methods (as I am doing in this PR)?

Or leave it as is.

The overhead of shapecast is mainly callbacks and superfluous data that is passed as a parameter but not used by the caller (like isLeaf here).

image

@gkjohnson
Copy link
Owner

Thanks for checking!

and this is with edited 'shapecast' (removing arrayToBox)
...
Could we make an alternative method for internal use, which instead of passing the box3, passes the nodeIndex32 and the float32array?

I was mostly curious as to how much of the improvement was coming from this conversion to a Box3. It's worth documenting in #656, I think, but given that it's a breaking change lets not make it for now.

If we have a method that would be frequently used and benefit from the performance boost (like we're doing here) then I think it makes sense. Shapecast is just easier to implement initially.

I'm curious about #699 (comment) as well as how the PQP_Distance approach compares to this hybrid approach:

Looking at the PQP_Distance function it looks like they use a similar approach to the hybrid method...

@agargaro
Copy link
Contributor Author

agargaro commented Oct 19, 2024

@gkjohnson Sorry for the delay...

Brief recap:

The current algorithm recursively checks the child node with a shorter distance first and then the second.

What can be improved?
If the solution is in the second node to be checked, all nodes in the first child must still be checked.

Solution:
Use a sorted list to always check the nodes with a smaller distance first (non-recursive).


Using a sortedList, I noticed that on average 30% fewer nodes are checked (using the PR geometry), but the algorithm is slower.
This is because using a sorted list introduces overhead due to the addition of nodes (especially those with a worse score, to be placed at the top of the array, causing all other nodes to be shifted).

What can be improved?
We can handle the insertion of the worst nodes (those with a greater distance) differently.

Solution 1 (Hybrid Approach):
Use the sorted list only to check the first N nodes, after which continue with the current recursive algorithm.
This makes it less likely to recursively check all unnecessary nodes closer to the BVH root and therefore more expensive.

Solution 2 (Hybrid Approach 2):
Create a sorted list using all nodes of depth N, then continue with the current recursive algorithm.

This approach might be better than the previous one, because all nodes in the list would have precisely the same depth in the tree.

Solution 3 (Two separated lists):
Set a maximum length for the sorted list. Add all nodes with a low priority to be checked (those with a longer distance) in a separate unsorted list (faster)


This logic could also be applied for other algorithms that traverse via score

What do you think? :)

@gkjohnson
Copy link
Owner

Thanks for the summary! Generally this seems like a good improvement. Have you tried "solution 3", as well? Was Solution 2 still best? 3 sounds closest to what the PQP_Distance function implemented.

This logic could also be applied for other algorithms that traverse via score

I know it will still add some overhead but I'm wondering if this would be best added for the shapecast function so the faster algorithm is used for other queries, as well. This should just sort based on the boundsTraverseOrder "score" function. Then maybe in the future we can get rid of the implicit "arrayToBox" function for a further boost.

What do you think?

@agargaro
Copy link
Contributor Author

Thanks for the summary! Generally this seems like a good improvement. Have you tried "solution 3", as well? Was Solution 2 still best? 3 sounds closest to what the PQP_Distance function implemented.

Actually I only tried the first method. I just committed the second one.
Between the first and second methods the improvement is almost not noticeable but at least is cleaner.
I should try also with different models.

I know it will still add some overhead but I'm wondering if this would be best added for the shapecast function so the faster algorithm is used for other queries, as well. This should just sort based on the boundsTraverseOrder "score" function. Then maybe in the future we can get rid of the implicit "arrayToBox" function for a further boost.

What do you think?

Sure, as you wish 😄 But I will finish trying the third method first.

@agargaro
Copy link
Contributor Author

agargaro commented Jul 7, 2025

Hi! Here I am again after a long break 😄

Small recap:

I replaced the sortedList with a minHeap, which is generally the most appropriate data structure for this use case.

The minHeap has a fixed maximum capacity. Once this limit is reached, there are two possible strategies—both inspired by the PQP library:

  • Fall back to the classic recursive traversal algorithm.
  • Recursively use a new minHeap instance (smaller, so add/pop operations are cheaper), just like PQP does.

Both approaches yield comparable results, with performance depending on the model used and the spatial distribution of points.

Interestingly, the minHeap-based algorithm shows noticeable speed improvements only when the target point lies inside the bounding box of the object. Outside of it, the traditional recursive approach—with early pruning based on the minimum possible distance—is actually more efficient.

This leads to a possible optimization: we could dynamically select the most suitable algorithm depending on whether the query point lies within the bounding box.

In my dynamic BVH implementation, the minHeap significantly speeds up the process of finding the best location to insert a new node. This is because, in practice, most new instances fall within the root bounding box of their parent node.

Over the next few days, I plan to continue refining the shapecast function to see if similar improvements can be made in other parts of the traversal logic.

@agargaro
Copy link
Contributor Author

agargaro commented Jul 9, 2025

I did several tests on the shapecast method.
The overhead caused by the various callbacks and getting some extra parameters is about 5% (I created a test shapecast2 class).
The arrayToBox conversion on the other hand makes it 30-35% slower, as we thought.

Can we consider creating the shapecast methods without conversion and shapecastByScore (where we use minHeap), just for internal use?

@gkjohnson
Copy link
Owner

Thanks for taking a look at this and getting some more concrete numbers.

The overhead caused by the various callbacks and getting some extra parameters is about 5% (I created a test shapecast2 class).
The arrayToBox conversion on the other hand makes it 30-35% slower, as we thought.

I think 5% is an okay cost to pay for the deduplicated code and flexibility of the callback system. 30-35% is a nice savings, though. It looks like you're suggesting changing to the following arguments:

nodeScoreFunc( nodeIndex, fullFloat32Array );

Lets implement that and get the PR cleaned up and I can add a flag to make sure the shapecast function is backwards compatible for release.

After that I'm wondering if we do something like the following will we lose any significant performance gains? This would let us pass the 6-float buffer into the function instead of the whole bvh array so usage would be more clear for end users:

const bb = new Float32Array( fullFloat32Array.buffer, BOUNDING_DATA_INDEX( c1 ) * 4, 6 );
const score = nodeScoreFunc( nodeIndex, bb );

// or

const _scratch = new Float32Array( 6 );
// ...
const bbStart = BOUNDING_DATA_INDEX( c1 );
for ( let i = 0; i < 6; i ++ ) _scratch[ i ] = fullFloat32Array[ i + bbStart ];
const score = nodeScoreFunc( nodeIndex, bb );

Once the PR is ready I can help test the performance differences, as well. Thanks again for your work on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants