@ptheywood
Member

@ptheywood ptheywood commented Aug 7, 2025

Update the supported CUDA versions from CUDA 11.2-12.x, to 12.0-13.x.

Python wheels are provided for CUDA 12.0+ and 13.0+ on Linux, and for CUDA 12.4+ and 13.0+ on Windows.

  • Add CUDA 13 support
    • CCCL 3.x compatibility,
    • Add CUDA 13 to ci CUDA installation scripts, including new subpackages which are required.
    • Build and run the C++ and Python test suites under linux with CUDA 13.0
    • Build and run the C++ and Python test suites under Windows with CUDA 13.0
    • Add CUDA 13 to 'regular' CI workflows
    • Add CUDA 13 to 'thorough' CI workflows
  • Remove CUDA 11 support
    • Update minimum CUDA in CMake (hard removal, or add an option to allow unsupported compilers?)
    • Require CCCL >= 3 (implicit removal of CUDA 11 support)
    • Remove CUDA 11 from 'regular' CI workflows
    • Remove CUDA 11 from 'thorough' CI workflows
  • Require / switch to C++20
    • Update CMake minimum to 3.25.2 (Jan 2023)
    • Update CMake std properties
    • Update readme
    • Update CI host compilers
    • Open issue(s) about taking advantage of c++20 features (std::format etc)
    • Address any new compiler warnings / errors
    • Pyflamegpu on windows now requires CUDA >= 12.4 due to compilation errors in c++20 fixed in 12.4
  • Update readme to clarify that the latest 2 major CUDA versions are supported.
  • Similar changes to the above for FLAMEGPU/FLAMEGPU2-visualiser (Update supported CUDA to 12 + 13 FLAMEGPU2-visualiser#143)
    • Test local vis build(s) before merging
  • Update docs repo, merging at the same time

Depends on #1150

Closes #1292

@ptheywood
Member Author

Several MSVC issues to resolve:

  • The windows cuda installation script does not (correctly) error if invalid subpackages are requested
  • CUDA 13 requires additional subpackages, some or all of:
    "crt";
    "nvptxcompiler";
    "nvvm";
    "nsight_vse";
    
  • CUDA 13 on windows seems to hit several compiler errors in internal cuda headers (curand_poisson.h) due to invalid pragmas.
    • We might be able to suppress (some of) the bad pragma issues with a suppression, but should probably report this (I must assume this has already been encountered tbh)
      2025-08-07T16:42:55.2327474Z      1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\curand_poisson.h(642): error #20199-D: unrecognized #pragma in device code [C:\a\FLAMEGPU2\FLAMEGPU2\build\FLAMEGPU\flamegpu.vcxproj]
      2025-08-07T16:42:55.2469229Z                __pragma(warning(push)) __pragma(warning(disable:4996)) __pragma(nv_diagnostic push) __pragma(nv_diag_suppress 1444)
      2025-08-07T16:42:55.2766327Z                         ^
      2025-08-07T16:42:55.2803114Z          
      2025-08-07T16:42:55.3120202Z          Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
      

I'll pick this up once I return from annual leave

@ptheywood ptheywood force-pushed the cuda-13 branch 2 times, most recently from 4a8f7ae to 7cb7825 Compare August 18, 2025 14:15
@ptheywood
Member Author

The curand warnings can be suppressed via CMake, and I've narrowed down the subpackages which are required.

CI updates still need expanding to other workflows (I've only updated the minimal set for now, to avoid significant CI builds), and the C++ and Python test suites still need running manually with CUDA 13 on linux and windows, but I will likely hold off until I've resolved #1150.

Readme etc will also need updating still. I'll update the list in the original post.

@ptheywood
Member Author

Some new C++20 warnings to address (linux)

/home/ptheywood/code/flamegpu/FLAMEGPU2/src/flamegpu/model/ModelData.cpp: In member function ‘bool flamegpu::ModelData::operator==(const flamegpu::ModelData&) const’:
/home/ptheywood/code/flamegpu/FLAMEGPU2/src/flamegpu/model/ModelData.cpp:105:37: warning: C++20 says that these are ambiguous, even though the second is reversed:
  105 |         && *dependencyGraph == *rhs.dependencyGraph) {
      |                                     ^~~~~~~~~~~~~~~
In file included from /home/ptheywood/code/flamegpu/FLAMEGPU2/src/flamegpu/model/ModelData.cpp:17:
/home/ptheywood/code/flamegpu/FLAMEGPU2/include/flamegpu/model/DependencyGraph.h:40:10: note: candidate 1: ‘bool flamegpu::DependencyGraph::operator==(const flamegpu::DependencyGraph&)’
   40 |     bool operator==(const DependencyGraph& rhs);
      |          ^~~~~~~~
/home/ptheywood/code/flamegpu/FLAMEGPU2/include/flamegpu/model/DependencyGraph.h:40:10: note: candidate 2: ‘bool flamegpu::DependencyGraph::operator==(const flamegpu::DependencyGraph&)’ (reversed)
/home/ptheywood/code/flamegpu/FLAMEGPU2/include/flamegpu/model/DependencyGraph.h:40:10: note: try making the operator a ‘const’ member function

And under MSVC there are others, plus an error in jitify 2 due to its use of std::result_of, which was removed in C++20, so I'll have to fix that too. Unsure why this error doesn't trigger under linux even in c++20 mode.
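
For reference, a minimal sketch of the two kinds of fix mentioned above, assuming nothing about the actual FLAMEGPU2 or jitify sources: marking operator== as a const member function (as the GCC note suggests), and replacing std::result_of with std::invoke_result. The names example::Graph, example::call_result_t and example::twice are hypothetical and exist only for illustration.

```cpp
#include <type_traits>

namespace example {

// Hypothetical stand-in for a class such as DependencyGraph.
struct Graph {
    int nodeCount = 0;

    // The trailing const allows the operator to be called on const objects,
    // and means the normal candidate is preferred over the reversed
    // candidate that C++20 adds for operator==.
    constexpr bool operator==(const Graph& rhs) const {
        return nodeCount == rhs.nodeCount;
    }
};

// C++17 replacement for std::result_of_t<F(Args...)>, which C++20 removed.
template <typename F, typename... Args>
using call_result_t = std::invoke_result_t<F, Args...>;

// Hypothetical callable used to exercise call_result_t.
constexpr int twice(int x) { return 2 * x; }

}  // namespace example
```

With the const qualifier in place, comparing two const Graph objects compiles without the ambiguity warning, and call_result_t<decltype(&example::twice), int> resolves to int.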

@ptheywood
Member Author

Windows VS2022 + CUDA 13.0 c++ tests passed on my 3060ti. Should probably re-run in the future when this PR is tidied up and merge-able

[==========] 1133 tests from 88 test suites ran. (166386 ms total)        

@ptheywood
Member Author

Windows CUDA 13 C++20 Pyflamegpu (swig >= 4.1.0) pytests all pass

676 passed, 12 skipped in 422.62s (0:07:02)

@ptheywood
Member Author

ptheywood commented Sep 16, 2025

C++20 is mostly working, just an outstanding issue with pyflamegpu on windows with CUDA 12.0 in c++20 mode.

I need to jump into windows and install CUDA 12.0 to debug this really.

2025-09-04T13:15:32.4774920Z ##[error]     3>C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.44.35207\include\xutility(1838): error : variable type "std::_List_const_iterator<std::_List_val<std::_List_simple_types<flamegpu::StepLogFrame>>>" in constexpr function is not a literal type [D:\a\FLAMEGPU2\FLAMEGPU2\build\swig\python\pyflamegpu_swig.vcxproj]
2025-09-04T13:15:32.4782627Z                    detected during:
2025-09-04T13:15:32.4784128Z                      instantiation of class "std::reverse_iterator<_BidIt> [with _BidIt=std::_List_const_iterator<std::_List_val<std::_List_simple_types<flamegpu::StepLogFrame>>>]" 
2025-09-04T13:15:32.4784744Z          (857): here
2025-09-04T13:15:32.4785587Z                      instantiation of "decltype(auto) std::ranges::_Iter_move::_Cpo::operator()(_Ty &&) const [with _Ty=std::reverse_iterator<std::_List_const_iterator<std::_List_val<std::_List_simple_types<flamegpu::StepLogFrame>>>> &]" 
2025-09-04T13:15:32.4786296Z          (1313): here
2025-09-04T13:15:32.4787099Z                      instantiation of "const __nv_bool std::_Is_ranges_random_iter_v [with _Iter=std::reverse_iterator<std::_List_const_iterator<std::_List_val<std::_List_simple_types<flamegpu::StepLogFrame>>>>]" 
2025-09-04T13:15:32.4787728Z          (1633): here
2025-09-04T13:15:32.4788577Z                      instantiation of "void std::advance(_InIt &, _Diff) [with _InIt=std::reverse_iterator<std::_List_const_iterator<std::_List_val<std::_List_simple_types<flamegpu::StepLogFrame>>>>, _Diff=unsigned long long]" 
2025-09-04T13:15:32.4789477Z          D:\a\FLAMEGPU2\FLAMEGPU2\build\swig\python\pyflamegpu\flamegpuPYTHON_wrap.cxx(4801): here
2025-09-04T13:15:32.4790567Z                      instantiation of "Sequence *swig::getslice(const Sequence *, Difference, Difference, Py_ssize_t) [with Sequence=std::list<flamegpu::StepLogFrame, std::allocator<flamegpu::StepLogFrame>>, Difference=ptrdiff_t]" 
2025-09-04T13:15:32.4791530Z          D:\a\FLAMEGPU2\FLAMEGPU2\build\swig\python\pyflamegpu\flamegpuPYTHON_wrap.cxx(8307): here
2025-09-04T13:15:32.4791880Z          
2025-09-04T13:15:32.4930147Z ##[error]     3>C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.44.35207\include\xutility(1840): error : no instance of function template "std::ranges::_Iter_move::_Cpo::operator()" matches the argument list [D:\a\FLAMEGPU2\FLAMEGPU2\build\swig\python\pyflamegpu_swig.vcxproj]
2025-09-04T13:15:32.4932630Z                      argument types are: (<error-type>)
2025-09-04T13:15:32.4933307Z                      object type is: const std::ranges::_Iter_move::_Cpo
2025-09-04T13:15:32.4933868Z                    detected during:
2025-09-04T13:15:32.4935177Z                      instantiation of class "std::reverse_iterator<_BidIt> [with _BidIt=std::_List_const_iterator<std::_List_val<std::_List_simple_types<flamegpu::StepLogFrame>>>]" 
2025-09-04T13:15:32.4936167Z          (857): here
2025-09-04T13:15:32.4937276Z                      instantiation of "decltype(auto) std::ranges::_Iter_move::_Cpo::operator()(_Ty &&) const [with _Ty=std::reverse_iterator<std::_List_const_iterator<std::_List_val<std::_List_simple_types<flamegpu::StepLogFrame>>>> &]" 
2025-09-04T13:15:32.4937974Z          (1313): here
2025-09-04T13:15:32.4938729Z                      instantiation of "const __nv_bool std::_Is_ranges_random_iter_v [with _Iter=std::reverse_iterator<std::_List_const_iterator<std::_List_val<std::_List_simple_types<flamegpu::StepLogFrame>>>>]" 
2025-09-04T13:15:32.4939598Z          (1633): here
2025-09-04T13:15:32.4940415Z                      instantiation of "void std::advance(_InIt &, _Diff) [with _InIt=std::reverse_iterator<std::_List_const_iterator<std::_List_val<std::_List_simple_types<flamegpu::StepLogFrame>>>>, _Diff=unsigned long long]" 
2025-09-04T13:15:32.4941325Z          D:\a\FLAMEGPU2\FLAMEGPU2\build\swig\python\pyflamegpu\flamegpuPYTHON_wrap.cxx(4801): here
2025-09-04T13:15:32.4942381Z                      instantiation of "Sequence *swig::getslice(const Sequence *, Difference, Difference, Py_ssize_t) [with Sequence=std::list<flamegpu::StepLogFrame, std::allocator<flamegpu::StepLogFrame>>, Difference=ptrdiff_t]" 
2025-09-04T13:15:32.4943324Z          D:\a\FLAMEGPU2\FLAMEGPU2\build\swig\python\pyflamegpu\flamegpuPYTHON_wrap.cxx(8307): here
2025-09-04T13:15:32.4943676Z          

Reproduced locally with CUDA 12.0 on windows.

Unsurprisingly (and unfortunately) bumping swig to 4.2.1 or 4.3.0 does not resolve this issue (seeing as it's fine in newer CUDA releases).

The issue occurs within msvc's reverse_iterator in xutility, on the return statement of iter_move.

#if _HAS_CXX20
    _NODISCARD friend constexpr iter_rvalue_reference_t<_BidIt> iter_move(const reverse_iterator& _It)
        noexcept(is_nothrow_copy_constructible_v<_BidIt> && noexcept(_RANGES iter_move(--_STD declval<_BidIt&>()))) {
        auto _Tmp = _It.current;
        --_Tmp;
        return _RANGES iter_move(_Tmp);
    }

When using Swig 4.3.0, this is hit during the std::distance call within SwigPyIterator::distance, called through a chain of methods/objects ultimately from wrapping RunLogMap (although this error would likely occur for other similar invocations, this is just the first one).

template<typename OutIterator>
  class SwigPyIterator_T :  public SwigPyIterator

...
    ptrdiff_t distance(const SwigPyIterator &iter) const
    {
      const self_type *iters = dynamic_cast<const self_type *>(&iter);
      if (iters) {
	return std::distance(current, iters->get_current());
      } else {
	throw std::invalid_argument("bad iterator type");
      }
    }    
...
SWIGINTERN PyObject *_wrap_RunLogMap_rbegin(PyObject *self, PyObject *args) {

...
resultobj = SWIG_NewPointerObj(swig::make_output_iterator(static_cast< const std::map< unsigned int,flamegpu::RunLog >::reverse_iterator & >(result)),
    swig::SwigPyIterator::descriptor(),SWIG_POINTER_OWN);
  return resultobj;
...
}

As a newer CUDA compiler does not encounter this issue, it's most likely a compiler / stdlib mismatch that swig is just exposing to us. We could:

  • Not switch to c++20 😢
  • Bump the minimum CUDA version on windows to 12.x for pyflamegpu support (also not ideal, but compiler bugs are compiler bugs).
    • CUDA 13.0 on CI is OK
    • CUDA 12.9 on CI is OK
    • CUDA 12.4 locally is OK
    • CUDA 12.3 locally is unhappy.
  • Try some (probably horrible) workarounds (i.e. build the swig wrapped version in c++17 on windows with older CUDA versions, which for now at least should probably work as we don't return any c++20 objects and the same compiler should be getting used for ABI compatibility.)

Manually forcing -std=c++17 for pyflamegpu on windows with CUDA 12.0 when flamegpu was built with -std=c++20 does currently build, link and produce a functional pyflamegpu.

However this is pretty dirty: it would mean we can't use any c++20 features in our include/ header files (without guarding them out so that swig doesn't see them), and apparently it's non-trivial to get CMake to set the flag correctly for that target with how c++20 is being set elsewhere...
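
To illustrate what guarding C++20 features out of a public header could look like, here is a hedged sketch. It is not taken from FLAMEGPU2: the macro EXAMPLE_CPLUSPLUS and the function example::isPowerOfTwo are hypothetical. The only external facts relied on are that MSVC reports the language level via _MSVC_LANG (unless /Zc:__cplusplus is passed) and that std::has_single_bit is provided by <bit> in C++20.

```cpp
#include <cstdint>

// MSVC keeps __cplusplus at 199711L unless /Zc:__cplusplus is passed,
// so prefer _MSVC_LANG where it is defined.
#if defined(_MSVC_LANG)
#define EXAMPLE_CPLUSPLUS _MSVC_LANG
#else
#define EXAMPLE_CPLUSPLUS __cplusplus
#endif

#if EXAMPLE_CPLUSPLUS >= 202002L
#include <bit>  // std::has_single_bit (C++20)
#endif

namespace example {

// Same result in either language mode; only the implementation differs.
constexpr bool isPowerOfTwo(std::uint32_t value) {
#if EXAMPLE_CPLUSPLUS >= 202002L
    return std::has_single_bit(value);
#else
    // C++17 fallback, used when a target is forced to an older standard.
    return value != 0 && (value & (value - 1)) == 0;
#endif
}

}  // namespace example
```

Every C++20 use in a shared header would need a fallback like this, which is the maintenance cost described above.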

@ptheywood
Member Author

Because of the MSVC + CUDA < 12.4 + -std=c++20 std::ranges errors described above (and the resulting CI failures), we have 2 real options:

  • Don't upgrade to c++20 (:disappointed: but not the end of the world I suppose)
  • Upgrade to c++20, but bump our minimum pyflamegpu on Windows support to CUDA >= 12.4
    • C++ with CUDA 12.0 would still be supported, just not pyflamegpu, which would mean our windows wheels would also require 12.4. Windows users should be keeping their drivers more up to date than (HPC) linux users anyway
    • 12.4 is also officially supported by recent MSVC, so this is probably the correct thing to do anyway even though it changes our wheel support once again, and complicates it w.r.t linux.

I'm becoming more and more inclined to just bump our windows pyflamegpu builds to 12.4 + 13.0 (or even 13.0 only) as so much time has been spent recently fighting Windows CI that maintaining support for older versions is becoming more and more of a time sink, and will probably just break at some random point in the future for no good reason.

@ptheywood ptheywood force-pushed the cuda-13 branch 2 times, most recently from 9e03b6c to b17512b Compare September 16, 2025 14:08
@ptheywood ptheywood mentioned this pull request Sep 16, 2025
5 tasks
@ptheywood
Member Author

For now, I've bumped windows wheel support to CUDA >= 12.4, and updated CI accordingly including an extra windows job checking non-python CUDA 12.0 support on windows.
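
The resulting minimum-version policy can be summarised in a small sketch. This is illustrative only and not part of the build system: the names in namespace example are hypothetical, and the version encoding mirrors the CUDART_VERSION convention (major * 1000 + minor * 10, e.g. 12040 for CUDA 12.4).

```cpp
namespace example {

// Encode a CUDA version the way CUDART_VERSION does (12.4 -> 12040).
constexpr int cudaVersion(int major, int minor) {
    return major * 1000 + minor * 10;
}

enum class Platform { Linux, Windows };

// Minimum CUDA version as described in this PR:
// 12.0 everywhere, except pyflamegpu on Windows which needs 12.4.
constexpr int minimumCuda(Platform platform, bool python) {
    return (platform == Platform::Windows && python) ? cudaVersion(12, 4)
                                                     : cudaVersion(12, 0);
}

// True if the encoded version meets the minimum for this configuration.
constexpr bool meetsMinimum(int version, Platform platform, bool python) {
    return version >= minimumCuda(platform, python);
}

}  // namespace example
```

In a real translation unit the version argument would typically be CUDART_VERSION from the CUDA runtime headers rather than a hand-written value.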

@ptheywood
Member Author

I believe this is now ready, subject to a tweak and rebase after the visualisation PR is merged, and it is still blocked by #1150.

Unfortunately this does complicate our python wheel / support story which I had been trying to avoid.

I'm open to us only having a single CUDA version supported for python binary wheels on Windows, which would slightly simplify things? (colab still requires CUDA 12 for linux: 12.4 drivers, but on tesla hardware with 12.5 installed currently)

I've manually tested the c++ and python test suites on both linux and windows, and checked visualisation still runs on linux (C++ and python).

Base automatically changed from jitify2 to master October 1, 2025 11:57
…not quiet.

Caches the most recently found version to emit a warning in case the minimum version has been increased.
…ssions on Windows

CMake uses the nvcc version, which is the same for 13.0 and 13.0 update 1, so the suppression must be applied for all 13.0 releases
@ptheywood ptheywood merged commit e902356 into master Oct 1, 2025
57 checks passed
@ptheywood ptheywood deleted the cuda-13 branch October 1, 2025 14:01


Development

Successfully merging this pull request may close these issues.

Replace CUDA 11 support with CUDA 13

2 participants