Conversation
Further changes to the clusterizer require analysis for the "stock" raster cluster configuration as well.
We need a specific point to anchor the overall traversal to, to ensure the traversal stays local and expands in all directions equally. Currently we start at an origin in mesh-local space and do not control the flow; it doesn't matter as much *where* we start, but the origin may be embedded inside the mesh, which makes the order less predictable.
To maintain global meshlet flow, we need to start new meshlets in an order that prioritizes smaller distances to the corner. However, to maintain local flow we still have to minimize the live_triangle based scoring we use right now. To avoid high algorithmic complexity we maintain a list of triangles of a limited size that we use to evaluate these criteria; for now, just add the triangle that's closest to the corner to the list.
When the next triangle does not fit into the meshlet, we analyze the neighbors of each vertex and append the best triangles to the seed list. This process is not precise: we do not filter out duplicate neighbors between different vertices, and we do not sort the replaced triangles perfectly, just taking the first available slot instead.
Whenever the best triangle doesn't fit into the current meshlet, we now prune the seed list and select the best seed triangle according to the live+distance metric; this replaces the logic that used the meshlet neighbors. This helps ensure the global flow of meshlets remains simultaneously optimal from the liveness perspective and clustered spatially, which reduces the chance of split meshlets down the line. The integration is currently partial, as the algorithm structure doesn't lend itself to a single natural place to add this logic; notably, if a meshlet is split, we don't use it for seeding and don't correctly select the starting triangle for the next meshlet, which will be addressed in the future.
For now we aren't filtering the seeds from the meshlet precisely, and we may get duplicates: we select one neighbor triangle per vertex, which means we will likely see the same triangle as a neighbor of another vertex in the same meshlet. Using non-strict comparison for replacement mitigates this issue somewhat, as it allows replacing the triangle with itself instead of forcing the triangle to occupy another slot. This improves the flow by giving more options to choose from during selection later.
Instead of using the seeds when the adjacency selection doesn't yield a result, we use the seeds when the meshlet is going to be split; this covers some cases where adjacency runs out of triangles, and also covers split_factor based splits, slightly improving the flow for the flex variant as well.
The liveness based scoring is now part of the seed selection, so we no longer need special cases in getNeighborTriangle.
When the seed list is at capacity, instead of discarding the seeds from the new meshlet we now replace the last few seeds. This seems to slightly improve the flow for larger meshes, effectively "compacting" the list a little at every iteration.
For future me: I've experimented with more precise replacement criteria, including 506f03c and a fully precise 4-element top-n; unfortunately, they all improve the results only ever so slightly in aggregate, and make results worse on a couple of meshes where it would be nice not to regress. This might be due to some properties of the metric that I don't fully understand, or due to the metric just being suboptimal; it's possible that a better global metric will be able to take better advantage of this, but for now it's probably best to leave this as is, even though the replacement is somewhat ad-hoc in certain cases.
Up until now, the clusterizer made all decisions about how to continue and restart meshlets based on local information - either the current meshlet (for continuation) or the previous meshlet (for restart). On large meshes, this often resulted in a meandering traversal that left gaps in the mesh. These gaps would need to be filled later; because the gaps had uneven sizes, this could result in disconnected clusters.
This change introduces global flow: the starting triangle for meshlets is now selected to prioritize a particular global traversal of the mesh. This is based on sorting by distance to a specific anchor point (arbitrarily chosen to be the negative corner of the bounding box), as well as by the sum of live counts which was the restart metric before this change. Both are important: distance sorting results in forcing meshlets to cover gaps in the mesh earlier which reduces the chance of disconnects, while live sorting results in a much cleaner meshlet fill locally.
Because of the live sorting, we can't easily use the KD tree (also, since we don't remove nodes from a KD tree, querying the same point would become progressively slower as more and more triangles need to be skipped). This technically turns the sort above into an O(N) operation; if done before every meshlet, the entire process becomes O(N^2) and unusably slow for large meshes.
Instead, we maintain a set of triangle seeds of a limited size, and add a few seeds after finishing every meshlet, minimizing the metric above. Some corners are cut for performance, such as just selecting a single neighbor triangle per vertex and using a simpler replacement logic. The set is re-scored when starting every meshlet; this needs to be done for live triangles as we don't maintain that metric per triangle in an incremental fashion; however, since the set is of a limited size, the entire process stays linear and the performance degradation for meshlet generation is minimal (<1%).
This results in a significant improvement in cluster disconnections in various meshlet configurations (note: while 1.0 is the theoretical optimum, a few of these meshes have many small disjoint features, which makes 1.0 impossible to ever reach):
Reducing cluster splits also results in a small reduction in boundary size (which slightly improves vertex sharing and reduces locked edges in clustered simplification) and occasionally a small reduction in overall meshlet count. Testing the rasterization performance on geometry-dense scenes with various cluster culling optimizations yields a small runtime speedup (3-5% depending on the mesh and GPU; AMD/NV were tested). Raytracing performance seems to be affected to a smaller degree, because the test harness uses a more aggressive "flex" setup than the chart above, but the overall number of meshlets is also slightly reduced because fewer of them need to be split.
This optimization interplays well with the previous optimization (#794); both local and global criteria are crucial to get this right, and future improvements might be possible in both. "prior" is the behavior as of the previous meshoptimizer release (v0.21):
This contribution is sponsored by Valve.