
Unexpectedly slower performance on RTX 4060 vs README's GTX 1050 Ti numbers #82

@Zyl225


Hi, thanks for open-sourcing this great CUDA voxelizer.

I tried to reproduce the performance numbers from the README and noticed that my timings on a newer GPU are significantly slower than those reported for a GTX 1050 Ti.

Environment

  • GPU: NVIDIA GeForce RTX 4060
  • CUDA: 12.1
  • OS: Windows 10
  • Driver version: [e.g. 555.xx]
  • Build: [prebuilt binary / built from source with CMake, Release configuration]

What I did

I voxelized a 256³ grid as in the README.
On my setup, the measured time is about 4.7 ms at resolution 256, while the README reports about 0.6 ms on a GTX 1050 Ti at the same resolution (excluding file I/O). That makes my run roughly 8x slower.

Because my card should be significantly faster than a 1050 Ti, I'm wondering whether I am misunderstanding the benchmark conditions or missing an important build or runtime setting.

Questions

  1. Are the numbers in the README pure kernel time (excluding file I/O and host↔device transfers), or something else?
  2. Do you have recommended CMake / nvcc flags or CUDAARCHS settings for newer GPUs like RTX 40-series (e.g. sm_89)?
  3. Are there any known issues or performance pitfalls when running this code with CUDA 12.x or on Ada / RTX 40-series GPUs?
  4. Is there anything in the sample configuration (e.g. solid vs non-solid mode, specific test mesh) that I should be careful to match exactly?

If it helps, I can share more detailed logs, profiler output, or a minimal repro of how I am timing the kernel.

Thanks in advance for any hints!
