This shader can be used to decode meshlets encoded via meshopt_encodeMeshlet. The implementation is not optimized yet and represents a line-by-line port of the scalar C++ decoding; it will be improved separately. Each meshlet is decoded serially in a separate thread. Vertex data is easy to decode using wave intrinsics; this will be done separately too.
The load mask is easy to construct dynamically based on the length. This is not faster, but it aligns better with the GPU decoding method, and it's the only remaining function-local static array, so we might as well remove it.
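The dynamic mask construction can be sketched as follows; the helper name and exact layout are illustrative, not the code from this change:

```cpp
#include <assert.h>
#include <stdint.h>

// Build a mask that keeps the first `length` bytes of a 32-bit word,
// replacing a function-local static lookup table along the lines of
//   static const uint32_t kMask[5] = {0, 0xff, 0xffff, 0xffffff, ~0u};
// (helper name and layout are illustrative)
static uint32_t loadMask(unsigned length)
{
	assert(length <= 4);

	// (1 << length*8) - 1, with the length == 4 case handled separately
	// to avoid an undefined 32-bit shift
	return length == 4 ? ~0u : (1u << (length * 8)) - 1;
}
```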
This change replaces the CPU-friendly equivalent with a branch (plus a simplified version of our branchless CPU code, which could generate branches depending on the support for predicated loads...) with a fully branchless decoder. While restarts usually happen only in the first triangle of a meshlet, so the restart branch is coherent, occasionally meshlets have restarts in the middle of the sequence, which is rarely aligned between different meshlets. A branchless sequence never seems to be a regression and usually results in ~15% better throughput on NV GPUs.
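As an illustration of the general pattern (not the actual codec logic), a branch on a restart condition can be turned into arithmetic selection, which avoids divergence when the condition differs between threads of the same wave:

```cpp
#include <assert.h>
#include <stdint.h>

// Branchy version: pick a fresh value on restart, otherwise derive from the
// previous one (the update rule here is a stand-in for the real decoder logic).
static uint32_t nextBranchy(uint32_t prev, uint32_t fresh, bool restart)
{
	if (restart)
		return fresh;
	return prev + 1;
}

// Branchless version: compute both candidates and blend with a mask;
// compilers typically lower this to a select/csel with no control flow.
static uint32_t nextBranchless(uint32_t prev, uint32_t fresh, bool restart)
{
	uint32_t mask = restart ? ~0u : 0u;
	return (fresh & mask) | ((prev + 1) & ~mask);
}
```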
Similarly to the CPU SIMD decoding, we can decode triangles in pairs; we still use the same branchless scalar logic, but this allows us to read the code byte just once, which helps reduce inefficient cache traffic and improves performance further by up to 10%.
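The pairwise structure can be illustrated as follows: when each byte holds two 4-bit codes, one load serves two triangles. The nibble layout below is illustrative, not the actual encoded format:

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Process codes two at a time: a single byte fetch yields the codes for a
// pair of triangles (low nibble first, then high nibble; layout illustrative).
static void decodeCodePairs(const uint8_t* codes, size_t triangle_count, uint8_t* out)
{
	for (size_t i = 0; i < triangle_count; i += 2)
	{
		uint8_t byte = codes[i / 2]; // read once per pair

		out[i] = byte & 0xf;
		if (i + 1 < triangle_count)
			out[i + 1] = byte >> 4;
	}
}
```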
Reformatted meshletdec.slang using slangd and added minor clarifications to the code and the documentation.
Link the Slang example shader for better discovery.
`demo/meshletdec.slang` can be used to decode meshlets encoded via `meshopt_encodeMeshlet`. Each meshlet is decoded serially in a separate thread; this implementation approach works well for stream decode (when a large set of meshlets needs to be decoded). It's not a good fit for mesh shaders, unless multiple sub-meshlets are used and decoded on multiple threads. Note that the vertex decoding in particular can be performed using cooperative wave decoding; triangle decoding might be implementable in a similar fashion but it would require further research. In either case, this implementation doesn't pursue that, and is designed for applications that already have streaming decode that runs on the GPU, in case `meshopt_decodeMeshlets` is inconvenient to use.

The example implementation decodes each vertex/triangle into uint32; alternative engine-specific packing code should be easy to incorporate, modulo potential efficiency concerns about byte/unaligned writes. In that case it might also be worthwhile to decode meshlet data into shared memory and then write repacked data into global memory using optimally aligned transactions. All of this is left as an exercise to the reader :)
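For instance, an engine that stores each meshlet triangle as three 8-bit local indices packed into one uint32 could swap in packing code along these lines (a purely hypothetical layout, not the format the shader emits):

```cpp
#include <assert.h>
#include <stdint.h>

// Hypothetical engine-specific packing: three 8-bit local vertex indices
// per uint32, low byte first; the top byte is left zero.
static uint32_t packTriangle(uint32_t a, uint32_t b, uint32_t c)
{
	assert(a < 256 && b < 256 && c < 256);
	return a | (b << 8) | (c << 16);
}
```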
On NVIDIA GeForce RTX 5070, this implementation decodes 64/96 meshlets at 17-23B triangles/sec (equivalent to 120-150 GB/s of output data); that is roughly equivalent to ~16 Zen4 cores, except that multi-core CPU decoding quickly hits the memory bandwidth limit, well below 100+ GB/s on typical systems. This should run fine in async compute too if it's co-scheduled with ALU-intensive code, as the decoding is light on ALU and is mostly bound by L2 cache access.
This contribution is sponsored by Valve.