Motivation

fVDB operations currently exist only within the PyTorch ecosystem. Users who want to deploy fVDB-based models (neural radiance fields, 3D reconstruction, sparse convolution networks, etc.) in ONNX Runtime -- for cross-language inference (C++, C#, Java, JS), hardware-specific EPs, or standardized model interchange -- have no path to do so.
The core challenge is that ONNX graphs transport only tensors between nodes, while fVDB operations pass two non-tensor types: GridBatch (which wraps NanoVDB OnIndexGrid data in a contiguous byte buffer) and JaggedTensor (variable-length batched data). Grid topology is dynamic -- constructed at inference time from input data.
Design Overview
The approach is split into two phases: (1) a custom ONNX operator domain with tensor-based representations of fVDB types, and (2) an optional ONNX Runtime Execution Provider plugin for optimized execution.
Phase 1: Custom ONNX Operator Domain (fvdb)
1.1 Tensor Representations
GridBatch as a tensor bundle ("GridBatch Bundle")
A GridBatch is decomposed into a group of tensors that always travel together through the ONNX graph. This exploits the fact that NanoVDB grids are stored as a single contiguous byte buffer (see GridData::mGridSize in NanoVDB.h and TorchDeviceBuffer in src/fvdb/detail/TorchDeviceBuffer.h):
| Tensor | Dtype | Shape | Source in GridBatchImpl |
|---|---|---|---|
| grid_blob | uint8 | [N] (dynamic) | mGridHdl->data() -- the raw NanoVDB buffer containing all grids packed sequentially |
| grid_byte_offsets | int64 | [B] | Per-grid mCumBytes from GridMetadata (src/fvdb/detail/GridBatchImpl.h ~L30-57) |
| voxel_sizes | float64 | [B, 3] | Per-grid mVoxelSize from GridMetadata |
| origins | float64 | [B, 3] | Per-grid voxel origins from GridMetadata |
| leaf_batch_indices | int32 | [L] (dynamic) | mLeafBatchIndices (GridBatchImpl.h ~L93) |
| batch_offsets | int64 | [B+1] | mBatchOffsets (GridBatchImpl.h ~L94) |
| list_indices | int32 | [M] (dynamic) | mListIndices (GridBatchImpl.h ~L95) |
The grid_blob contains the serialized NanoVDB tree structure. Individual grids within the batch are accessed via pointer arithmetic: reinterpret_cast<const nanovdb::OnIndexGrid*>(blob_ptr + grid_byte_offsets[bi]) (see GridBatchImpl::Accessor::grid() at GridBatchImpl.h ~L151-156).
Alignment requirement: NanoVDB requires 32-byte alignment (NanoVDB.h ~L67-78). CUDA allocators satisfy this (typically 256B+). CPU allocators may not -- the custom op implementations must validate or enforce alignment.
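The addressing scheme and the alignment check can be sketched in a few lines (a Python analogue with made-up grid sizes; the real code does this with raw pointers in C++):

```python
def grid_bytes(grid_blob: bytes, grid_byte_offsets: list, bi: int) -> bytes:
    """Python analogue of blob_ptr + grid_byte_offsets[bi]: slice out grid bi
    from the packed buffer (bounded by the next offset, or the blob end)."""
    start = grid_byte_offsets[bi]
    end = (grid_byte_offsets[bi + 1] if bi + 1 < len(grid_byte_offsets)
           else len(grid_blob))
    return grid_blob[start:end]

def is_nanovdb_aligned(address: int) -> bool:
    """The 32-byte alignment check the custom ops must perform on CPU buffers."""
    return address % 32 == 0

# Two grids packed sequentially (sizes are made up for illustration).
blob = bytes(range(64)) + bytes(32)
assert grid_bytes(blob, [0, 64], 0) == bytes(range(64))
assert grid_bytes(blob, [0, 64], 1) == bytes(32)
assert is_nanovdb_aligned(64) and not is_nanovdb_aligned(40)
```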
JaggedTensor as a tensor bundle ("JaggedTensor Bundle")
A JaggedTensor (src/fvdb/JaggedTensor.h ~L163-192) is decomposed into its constituent tensors:
| Tensor | Dtype | Shape | Source |
|---|---|---|---|
| jdata | varies | [N, *esizes] | mData -- packed values |
| joffsets | int64 | [T+1] | mOffsets -- CSR boundaries |
| jidx | int32 | [N] | mBatchIdx -- per-element batch index |
| jlidx | int32 | [T, ldim] | mListIdx -- list-of-lists indexing |
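To make the CSR encoding concrete, here is a stdlib-only sketch of decomposing a flat batch of variable-length tensors into the jdata / joffsets / jidx fields (jlidx, which only matters for nested list-of-lists batches, is omitted):

```python
def decompose_jagged(lists: list) -> tuple:
    """Flatten a batch of variable-length element lists into the
    (jdata, joffsets, jidx) bundle fields in CSR layout."""
    jdata, joffsets, jidx = [], [0], []
    for b, lst in enumerate(lists):
        jdata.extend(lst)                       # packed values
        joffsets.append(joffsets[-1] + len(lst))  # CSR boundaries
        jidx.extend([b] * len(lst))             # per-element batch index
    return jdata, joffsets, jidx

# Batch of 3 "tensors" with lengths 2, 0, 3.
jdata, joffsets, jidx = decompose_jagged([[1.0, 2.0], [], [3.0, 4.0, 5.0]])
assert joffsets == [0, 2, 2, 5]
assert jidx == [0, 0, 2, 2, 2]
```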
1.2 Custom Operator Schemas
Define ONNX custom ops in the fvdb domain. Each op corresponds to an existing C++ function in src/fvdb/detail/ops/. The ops split into three categories:
Grid Construction Ops (produce a GridBatch Bundle from input tensors):
| ONNX Custom Op | Underlying implementation | Notes |
|---|---|---|
| fvdb.GridFromPoints | BuildGridFromPoints.h | |
| fvdb.GridFromIJK | BuildGridFromIjk.h | |
| fvdb.GridFromMesh | BuildGridFromMesh.h | |
| fvdb.GridFromDense | BuildDenseGrid.h | |
| fvdb.GridFromNanoVDB | New -- lightweight init | For users who provide a pre-built NanoVDB blob as input; derives metadata + index tensors from the blob |
Grid-Consuming Ops (take a GridBatch Bundle + data tensors, produce tensor/JaggedTensor Bundle outputs):
Priority ops for initial implementation (covers the most common inference patterns):
| ONNX Custom Op | Underlying implementation |
|---|---|
| fvdb.SampleTrilinear | SampleGridTrilinear.h |
| fvdb.SampleBezier | SampleGridBezier.h |
| fvdb.SplatTrilinear | SplatIntoGridTrilinear.h |
| fvdb.SplatBezier | SplatIntoGridBezier.h |
| fvdb.PointsInGrid | PointsInGrid.h |
| fvdb.IJKToIndex | IjkToIndex.h |
| fvdb.CoordsInGrid | CoordsInGrid.h |
| fvdb.DownsampleAvgPool | DownsampleGridAvgPool.h |
| fvdb.DownsampleMaxPool | DownsampleGridMaxPool.h |
Grid-to-Grid Ops (produce a new GridBatch Bundle):
| ONNX Custom Op | Underlying implementation |
|---|---|
| fvdb.CoarsenedGrid | BuildCoarseGridFromFine.h |
| fvdb.DilatedGrid | BuildDilatedGrid.h |
| fvdb.ConvGrid | BuildGridForConv.h |
| fvdb.ConvTransposeGrid | BuildGridForConvTranspose.h |
| fvdb.PrunedGrid | BuildPrunedGrid.h |
| fvdb.MergedGrid | BuildMergedGrids.h |
The full op list (~55 ops under src/fvdb/detail/ops/) need not all be implemented at once. The initial set above covers the most common inference workloads.
1.3 Custom Op Library Implementation
Build a shared library (libfvdb_onnx_ops.so / fvdb_onnx_ops.dll) that:
- Exports RegisterCustomOps per the ONNX Runtime custom op library convention
- Registers the fvdb domain using the Ort::CustomOpDomain + Ort::Custom::CreateLiteCustomOp API
- Obtains an Ort::Custom::CudaContext for GPU ops

Each custom op is a thin adapter layer that reconstructs fVDB types from the tensor bundle, calls the existing C++ implementation (no kernel rewrite needed), and decomposes the results.
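The adapter flow can be sketched as follows (a pure-Python toy; every function here is a hypothetical stand-in, since the real step 2 dispatches to the C++ kernels under src/fvdb/detail/ops/):

```python
def reconstruct_grid_batch(grid_blob: bytes, grid_byte_offsets: list) -> dict:
    """Step 1: reassemble a GridBatch-like view from its bundle tensors,
    validating the invariants the ops rely on (here: offsets in range)."""
    assert all(0 <= o <= len(grid_blob) for o in grid_byte_offsets)
    return {"blob": grid_blob, "offsets": grid_byte_offsets}

def grid_count_op(grid_blob: bytes, grid_byte_offsets: list) -> list:
    """A toy grid-consuming custom op: bundle tensors in, plain tensor out."""
    grid = reconstruct_grid_batch(grid_blob, grid_byte_offsets)  # step 1
    batch_size = len(grid["offsets"])                            # step 2 (stand-in)
    return [batch_size]                                          # step 3: decompose

assert grid_count_op(bytes(96), [0, 64]) == [2]
```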
1.4 ONNX Model Export
Provide a Python utility to export fVDB-based PyTorch models to ONNX. Since fVDB ops are currently bound via pybind11 (src/python/Bindings.cpp, PYBIND11_MODULE at L96) rather than torch.library, torch.onnx.export() won't trace them automatically. Options:
- Custom ONNX exporter with symbolic functions: register torch.onnx symbolic handlers for each fVDB op that emit the corresponding fvdb.* custom ONNX nodes
- Manual graph construction: provide an fvdb.export_onnx(model, sample_inputs, path) utility that traces the model and builds the ONNX graph programmatically using onnx.helper
1.5 User-Provided Grids
Users who construct grids externally and pass them in for inference use the fvdb.GridFromNanoVDB op. They provide:
- The raw NanoVDB bytes as a uint8 input tensor
- Voxel sizes as a float64[B, 3] input tensor
- Origins as a float64[B, 3] input tensor
The fvdb.GridFromNanoVDB op derives the remaining metadata (leaf batch indices, batch offsets, list indices) from the NanoVDB buffer at runtime. This mirrors the existing GridBatch constructor that accepts a GridHandle (GridBatch.h ~L30-32).
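The batch-offsets part of that derivation reduces to a prefix sum (a stdlib-only sketch; treating batch_offsets as cumulative per-grid voxel counts is an assumption here, and the real op obtains the counts by parsing the NanoVDB headers):

```python
from itertools import accumulate

def derive_batch_offsets(voxels_per_grid: list) -> list:
    """Prefix-sum per-grid voxel counts into a [B+1] offsets tensor,
    the same CSR convention the JaggedTensor bundle uses."""
    return [0] + list(accumulate(voxels_per_grid))

assert derive_batch_offsets([100, 0, 50]) == [0, 100, 100, 150]
```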
Phase 2: fVDB Execution Provider Plugin

Build an ONNX Runtime EP plugin (libfvdb_ep.so) using the EP ABI kernel-based plugin API introduced in ORT v1.24.

2.1 Architecture

The EP claims subgraphs of fvdb.* custom ops via OrtEp::GetCapability + EpGraphSupportInfo_LookUpKernel. Inside the EP:
- GridBatch objects are maintained as native C++ state, not reconstructed from tensor bundles per-op
- A kernel registry (OrtKernelRegistry + KernelRegistry_AddKernel) contains entries for each fVDB op
- Grid construction ops build GridBatch objects and store them in EP-managed state
- Grid-consuming ops read the GridBatch directly, eliminating the per-op blob interpretation overhead

This is architecturally similar to how the TensorRT EP captures subgraphs and executes them with an opaque internal engine. The relevant ORT APIs: OrtEp::GetCapability, OrtEp::GetKernelRegistry, OrtKernelImpl::Compute, OrtEpApi::KernelRegistry_AddKernel, OrtEpApi::GetEnvConfigEntries.

2.2 Memory Management

The EP controls its own memory allocation. Since fVDB already uses TorchDeviceBuffer (src/fvdb/detail/TorchDeviceBuffer.h), which is a raw uint8_t* + size + device, the EP can either:
- Continue using TorchDeviceBuffer (requires linking libtorch)
- Implement a lightweight buffer type that uses ORT's allocator APIs instead, decoupling from PyTorch at the inference layer
2.3 Benefits Over Phase 1 Alone
- Eliminates per-op GridBatch reconstruction from the tensor bundle
- Enables cross-op optimizations (e.g., fusing grid construction + immediate sample)
- Grid state never materializes as ONNX tensor edges, reducing memory management overhead
- Gives full control over CUDA stream and memory management within the fVDB subgraph
Implementation Plan
Phase 1 milestones:
1. Define the GridBatch Bundle and JaggedTensor Bundle tensor representations
2. Implement fvdb.GridFromPoints and fvdb.GridFromNanoVDB construction ops
3. Implement fvdb.SampleTrilinear and fvdb.PointsInGrid as initial consuming ops
4. Build and test the custom op shared library with ONNX Runtime (CPU + CUDA)
5. Implement the Python export utility
6. Expand op coverage based on user demand
Phase 2 milestones:
1. Scaffold the EP plugin with the ORT EP ABI
2. Implement GetCapability to claim fvdb.* subgraphs
3. Implement native GridBatch state management within the EP
4. Register kernel implementations for all Phase 1 ops
5. Benchmark against Phase 1 (per-op reconstruction) to quantify improvement
Key References
- OrtKernelImpl, OrtKernelRegistry, KernelRegistry_AddKernel, EpGraphSupportInfo_LookUpKernel
- OrtEp::GetCapability, Graph IR APIs, plugin EP infrastructure
- OrtApi::CreateEnvWithOptions, OrtEpApi::GetEnvConfigEntries, EP Plugin API overview