# CuLab, G2CPU, and the Graiphic Accelerator
Welcome to the Graiphic Benchmarking Whitepaper Repository, where we share the methods, results, and LabVIEW sources used to compare the main GPU acceleration toolkits for LabVIEW.
This repository accompanies the official whitepaper:
👉 Benchmarking the Future: Comparing LabVIEW GPU Toolkits CuLab, G2CPU, and the Graiphic Accelerator (v1.1)
This benchmark measures and compares the performance, integration, and determinism of several LabVIEW GPU toolkits — all tested in the same LabVIEW environment.
- Graiphic Accelerator Toolkit
- CuLab GPU Toolkit 4.1.2.80 (Ngene)
- G2CPU GPU and CPU HPC Toolkit 1.6.0.15 (Natan Biesmans)
- Native LabVIEW CPU execution
The objective is to provide a real-world comparison and to clarify the trade-offs between speed, scalability, and ease of integration.
| Component | Specification |
|---|---|
| OS | Windows 11 |
| CPU | Intel® Core™ i9-10850K @ 3.60 GHz |
| GPU | NVIDIA GeForce RTX 3060 |
| LabVIEW | 2025 Q3 |
| CUDA | 12.8 |
| TensorRT | 10.13.3.9 |
| DirectML | 1.15.4.0 |
| Date | November 6, 2025 |
This setup represents a balanced workstation configuration for reproducible LabVIEW GPU benchmarks.
- **GEMM Processing**: matrix multiplication followed by arithmetic post-processing.
- **Arithmetic Operations**: iterative Add / Neg / Mul / Div loops for element-wise operations.
- **Complex Number Computation**: handling of real + imaginary tensors using ONNX custom nodes.
- **Signal Processing Application**: FFT + arithmetic operations on real NI-like signal data (~32 k samples).

➤ This test was designed to reflect realistic, small-scale sensor workloads, not synthetic stress tests.
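For readers outside LabVIEW, the GEMM and signal-processing workloads above can be sketched as plain NumPy reference computations. Matrix size, signal content, and sample count here are illustrative assumptions, not the whitepaper's exact parameters; only the ~32 k sample count comes from the text.

```python
import numpy as np

# Reference sketch of two of the benchmark workloads (NumPy on CPU).
# Sizes and signal content are illustrative assumptions.

def gemm_workload(n=256, seed=0):
    """GEMM followed by simple arithmetic post-processing."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    c = a @ b                      # matrix multiplication (GEMM)
    return (c * 2.0 - 1.0) / 3.0   # element-wise post-processing

def signal_workload(num_samples=32_768, seed=0):
    """FFT plus arithmetic on a real-valued signal (~32 k samples)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, num_samples, endpoint=False)
    signal = np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(num_samples)
    spectrum = np.fft.rfft(signal)         # FFT of the real signal
    return np.abs(spectrum) / num_samples  # normalised magnitude

if __name__ == "__main__":
    print(gemm_workload().shape)    # (256, 256)
    print(signal_workload().shape)  # (16385,)
```

Each toolkit under test runs the same computation; only the execution backend (CPU, CuLab, G2CPU, or the ONNX-based Graiphic Accelerator) changes.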
- Graiphic Accelerator (TensorRT) achieves the highest performance, up to:
  - 5× faster than CuLab
  - 40× faster than G2CPU
- Compiled-graph execution (ONNX Runtime) drastically reduces overhead compared to per-node DLL execution.
- Complex-number support works using custom ONNX nodes, an area for future standardization.
- For small data blocks, CPU execution remains competitive; GPU benefits increase with workload size.
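The overhead finding can be illustrated with a small, purely conceptual sketch. This is not ONNX Runtime or a LabVIEW DLL call; it simply contrasts dispatching one tiny operation per call with an algebraically equivalent fused expression, and all names and sizes are illustrative assumptions.

```python
import time
import numpy as np

# Conceptual analogy: every small per-node operation pays a fixed
# dispatch cost, while a "compiled" fused expression amortises it.

x = np.random.default_rng(0).standard_normal(1000)

def per_node(x, iters=1000):
    """One tiny op per call, like per-node dispatch."""
    for _ in range(iters):
        x = x + 1.0
        x = -x
        x = x * 2.0
        x = x / 2.0
    return x

def fused(x, iters=1000):
    """Algebraically equivalent: (x+1) negated; *2.0 and /2.0 cancel exactly."""
    for _ in range(iters):
        x = -(x + 1.0)
    return x

t0 = time.perf_counter(); a = per_node(x); t1 = time.perf_counter()
b = fused(x);             t2 = time.perf_counter()
print(f"per-node: {t1 - t0:.4f}s, fused: {t2 - t1:.4f}s, "
      f"same result: {np.allclose(a, b)}")
```

A compiled graph takes this idea much further (kernel fusion, memory planning, batched GPU launches), which is why the gap grows with graph size.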
All LabVIEW VIs used to generate the benchmark results are available in the `/Source` directory.
| Benchmark | Folder | Description |
|---|---|---|
| GEMM | Source/GEMM | Matrix-multiplication tests |
| Arithmetic | Source/Not Complex | Scalar & vector operations |
| Complex | Source/Complex | Custom complex-number computation |
| Signal Processing | Source/Signal Processing Without Indicator And Warmup | FFT-based signal test |
Additional required file: `TEMP.BIN` (2 GB, test data)
👉 http://download2.graiphic.io/_Bench/TEMP.BIN
This benchmark was built for transparency, reproducibility, and collaboration.
Community contributions are encouraged:
- Independent replication
- Comparative pull requests
- New test proposals
- Methodology discussions
Discussion board:
https://github.com/Graiphic/whitepapers/issues
Repository:
https://github.com/Graiphic/whitepapers
Graiphic develops the first ecosystem unifying AI + Logic + Hardware + Energy inside a single ONNX graph.
| Version | Date | Author | Description |
|---|---|---|---|
| 1.0 | 2025-10-15 | Youssef Menjour | First release |
| 1.1 | 2025-11-07 | Youssef Menjour | Added DirectML EP |
Following this benchmark, we launched
👉 LabVIEW Open Benchmark Suite (LOBS)
LOBS provides:
- Open-source vendor-neutral tests
- Reproducible pipelines
- Transparent comparison criteria
This whitepaper is Reference 0 of the suite.
