MojoSplat is an experimental port of Gaussian Splatting kernels to Mojo, exploring Mojo's performance potential and multi-vendor GPU support.
This project implements the three core kernels of 3D Gaussian Splatting:
- Projection: Transform 3D Gaussians to 2D image space
- Binning: Sort and assign Gaussians to screen tiles
- Rasterization: Render Gaussians to pixels with alpha blending
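To make the binning step concrete, here is a toy CPU sketch (a hypothetical helper, not the actual Mojo kernel): each projected Gaussian is assigned to every 16×16 screen tile its bounding box overlaps, and the (tile, gaussian) pairs are sorted so rasterization can later walk one tile at a time.

```python
TILE = 16  # tile edge in pixels, a common choice in 3DGS implementations

def bin_gaussians(means2d, radii, W, H):
    """Toy CPU sketch of binning: map each projected Gaussian to the
    screen tiles its bounding box overlaps. Illustrative only."""
    tiles_x = (W + TILE - 1) // TILE
    tiles_y = (H + TILE - 1) // TILE
    pairs = []  # (tile_id, gaussian_id)
    for gid, ((x, y), r) in enumerate(zip(means2d, radii)):
        x0 = max(int((x - r) // TILE), 0)
        x1 = min(int((x + r) // TILE), tiles_x - 1)
        y0 = max(int((y - r) // TILE), 0)
        y1 = min(int((y + r) // TILE), tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                pairs.append((ty * tiles_x + tx, gid))
    # Group by tile; the real GPU kernels sort (tile, depth) keys instead.
    pairs.sort()
    return pairs
```

The real kernels produce the same grouping on the GPU with a radix sort over packed (tile, depth) keys, which also gives the per-tile depth order that rasterization needs.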
You can call the render function or any of the individual kernels directly from Python (via PyTorch tensors). The Mojo kernels are compiled on the fly.
| Kernel | PyTorch | GSplat | Mojo |
|---|---|---|---|
| Projection | ✅ | ✅ | ✅ |
| Binning | ✅ | ✅ | ✅ |
| Rasterization | ❌* | ✅ | ✅ |
*PyTorch rasterization falls back to the GSplat implementation.
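The rasterization step in the table composites depth-sorted Gaussians front to back with alpha blending. A minimal per-pixel sketch of the blending rule (plain Python, illustrative only, not the Mojo kernel):

```python
def blend_front_to_back(colors, alphas, stop_T=1e-4):
    """Toy sketch of per-pixel alpha blending over Gaussians already
    sorted near-to-far. Returns the blended RGB and the remaining
    transmittance T."""
    out = [0.0, 0.0, 0.0]
    T = 1.0  # remaining transmittance
    for c, a in zip(colors, alphas):
        w = a * T
        out = [o + w * ci for o, ci in zip(out, c)]
        T *= 1.0 - a
        if T < stop_T:  # early termination once the pixel is opaque
            break
    return out, T
```

The early-termination check is one reason tile-based rasterizers are fast: once a pixel saturates, the remaining Gaussians in that tile are skipped.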
**Warning**
- This is NOT production ready.
- Performance is inferior to the GSplat CUDA version, but the gap is closing as the Mojo language and library mature. Performance improvements are welcome!
- Mojo is evolving very fast, faster than I can work on this (this is very much a side project), so this project will likely not always be up to date with the latest Mojo; each update requires a significant amount of work.
For development or standalone usage, this project uses uv for dependency management:
```sh
# Clone the repository
git clone https://github.com/bertaveira/mojosplat.git
cd mojosplat

# Install dependencies and activate environment
uv sync
```

Alternatively, install directly from git with pip:

```sh
pip install git+https://github.com/bertaveira/mojosplat.git
```

Or add to your `pyproject.toml`:
```toml
dependencies = [
    "mojosplat @ git+https://github.com/bertaveira/mojosplat.git",
    # ... your other dependencies
]
```

Add to your `requirements.txt`:

```
git+https://github.com/bertaveira/mojosplat.git
```
Or add to a conda `environment.yml`:

```yaml
dependencies:
  - pip
  - pip:
      - git+https://github.com/bertaveira/mojosplat.git
```

All inputs must be CUDA tensors (float32). `scales` are in log-space; `quats` are (w, x, y, z); `opacities` shape is (N,).
```python
import torch
from mojosplat.render import render_gaussians
from mojosplat.utils import Camera

# 3D Gaussian data (e.g. from your scene or .splat file)
N = 1000
device = "cuda"
means3d = torch.randn(N, 3, device=device, dtype=torch.float32)
scales = torch.randn(N, 3, device=device, dtype=torch.float32)  # log-space
quats = torch.randn(N, 4, device=device, dtype=torch.float32)  # (w, x, y, z)
quats = quats / quats.norm(dim=1, keepdim=True)
opacities = torch.randn(N, device=device, dtype=torch.float32)  # (N,) not (N, 1)
features = torch.randn(N, 3, device=device, dtype=torch.float32)  # RGB

# Camera: R (3,3) world-to-camera, T (3,) world-to-camera, H, W, fx, fy, cx, cy
R = torch.eye(3, device=device, dtype=torch.float32)
T = torch.tensor([0.0, 0.0, 5.0], device=device, dtype=torch.float32)
camera = Camera(R=R, T=T, H=720, W=1280, fx=1152.0, fy=1152.0, cx=640.0, cy=360.0)

# Render (backend: "mojo", "gsplat", or "torch")
image = render_gaussians(means3d, scales, quats, opacities, features, camera, backend="mojo")
# image shape: (H, W, C)
```
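Shape mismatches (especially passing `opacities` as `(N, 1)` instead of `(N,)`) are easy to hit. Here is a small hypothetical helper, not part of the MojoSplat API, that checks the shape conventions stated above before calling `render_gaussians`; it works on anything with a `.shape`, e.g. torch or numpy arrays:

```python
import numpy as np


def check_gaussian_inputs(means3d, scales, quats, opacities, features):
    """Verify the documented shape conventions (hypothetical helper).
    Returns a list of problem descriptions; empty means all shapes match."""
    n = means3d.shape[0]
    expected = {
        "means3d": (n, 3),
        "scales": (n, 3),    # log-space scales
        "quats": (n, 4),     # (w, x, y, z), unit norm
        "opacities": (n,),   # (N,), not (N, 1)
        "features": (n, 3),  # RGB
    }
    given = {"means3d": means3d, "scales": scales, "quats": quats,
             "opacities": opacities, "features": features}
    return [f"{k}: expected shape {expected[k]}, got {tuple(given[k].shape)}"
            for k in expected if tuple(given[k].shape) != expected[k]]
```

Checking dtype (`float32`) and device (`cuda`) the same way is a natural extension for torch tensors.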
Run the tests:

```sh
# Run all tests
uv run pytest

# Run specific kernel tests
uv run pytest tests/test_projection_mojo.py
uv run pytest tests/test_binning.py
uv run pytest tests/test_rasterization.py
uv run pytest tests/test_render.py

# Run with verbose output
uv run pytest -v
```

Benchmark with a real .splat scene (e.g. bicycle):
```sh
# First download a .splat file (antimatter15 binary format):
curl -L -o examples/bicycle.splat https://huggingface.co/cakewalk/splat-data/resolve/main/bicycle.splat

# Then run the benchmark (defaults to examples/bicycle.splat if present)
uv run python examples/benchmark_render.py examples/bicycle.splat
```

You can view a .splat scene in the browser with the interactive viewer (drag to orbit, scroll to zoom). Use the same bicycle.splat file as above:
```sh
# Ensure you have the scene file (see Benchmarking above for the download URL)
uv run python examples/viewer.py examples/bicycle.splat
```

Then open the URL printed in the terminal in your browser. The first render triggers JIT compilation (~30–60 s); subsequent renders are fast.
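The benchmark and viewer consume the antimatter15 `.splat` format. Below is a minimal numpy reader sketch, assuming the commonly used 32-byte-per-record layout: position and scale as float32 triples, then RGBA color and the rotation quaternion as uint8 quadruples, with quaternion components encoded as `c * 128 + 128`. Byte-level details (quaternion component order, whether scales are stored linearly) can vary between exporters, so treat this as illustrative rather than authoritative:

```python
import numpy as np


def read_splat(path):
    """Sketch of an antimatter15 .splat reader (assumed 32-byte records)."""
    dtype = np.dtype([
        ("position", np.float32, 3),
        ("scale", np.float32, 3),   # assumed stored linearly
        ("color", np.uint8, 4),     # RGBA
        ("rot", np.uint8, 4),       # quaternion, byte-quantized
    ])
    data = np.fromfile(path, dtype=dtype)
    # Decode and re-normalize the byte-quantized quaternion.
    quats = (data["rot"].astype(np.float32) - 128.0) / 128.0
    quats /= np.linalg.norm(quats, axis=1, keepdims=True)
    return {
        "means3d": data["position"],
        "scales": np.log(data["scale"]),  # the API above expects log-space
        "quats": quats,
        "opacities": data["color"][:, 3].astype(np.float32) / 255.0,
        "colors": data["color"][:, :3].astype(np.float32) / 255.0,
    }
```

Note the returned opacity is the stored alpha in [0, 1]; whether `render_gaussians` expects raw alphas or logits is not specified here, so check the backend you use.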
Benchmark: `uv run python examples/benchmark_render.py examples/bicycle.splat` (1,000 runs for the full pipeline, 200 runs per kernel).
| Backend | Full pipeline | Projection | Binning | Rasterization |
|---|---|---|---|---|
| gsplat | 2.41 ms (414.9 FPS) | 0.43 ms | 0.46 ms | 1.56 ms |
| mojo | 3.21 ms (311.2 FPS) | 0.91 ms | 0.87 ms | 1.55 ms |
There is still a performance gap between the Mojo and GSplat kernels, and extra work is needed to close it. In some cases the Mojo standard library does not yet expose certain instructions used in the GSplat kernels, so performance should improve as the language and library mature.
Contributions are very welcome! This is an experimental project exploring the intersection of Mojo and high-performance graphics.
Areas where help is needed:
- PyTorch Rasterization: Native PyTorch rasterization kernel
- Performance Optimization: Analyze the current implementation and improve the existing Mojo kernels. For example, compare the generated PTX against GSplat's to understand how to match or surpass its performance, and measure the overhead of the Python-to-Mojo bridge.
- Backward pass: Implement the Mojo kernels for the backward pass, which would allow MojoSplat to be used to train the Gaussian representation.
- Testing: More comprehensive test coverage
- Unscented Projection: Implement the unscented projection from 3DGUT as an alternative to EWA
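For context on the EWA projection mentioned above: each Gaussian's 3D covariance is built from its log-space scale and unit quaternion as Σ = R S Sᵀ Rᵀ, and that covariance is then pushed through a linearization of the camera. A small numpy sketch of the covariance construction (illustrative, not the Mojo kernel):

```python
import numpy as np


def quat_to_rot(q):
    """Rotation matrix from a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def covariance3d(log_scale, quat):
    """Sigma = R S S^T R^T with S = diag(exp(log_scale))."""
    R = quat_to_rot(quat)
    M = R @ np.diag(np.exp(np.asarray(log_scale)))
    return M @ M.T
```

The unscented projection from 3DGUT replaces the linearized push-through of this covariance with sigma-point sampling, which handles distortion-heavy camera models better.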
To contribute:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
Acknowledgements:
- GSplat for the reference implementation
- 3D Gaussian Splatting for the original method
- Modular for the Mojo language