Let torch determine correct cuda architecture

RomeoV · RomeoV · commit 407f53e25b29 · 2021-08-18T00:27:46.000+02:00
See `pytorch/torch/utils/cpp_extension.py:CUDAExtension`:
>   By default the extension will be compiled to run on all archs of the cards visible during the
>   building process of the extension, plus PTX. If down the road a new card is installed the
>   extension may need to be recompiled. If a visible card has a compute capability (CC) that's
>   newer than the newest version for which your nvcc can build fully-compiled binaries, Pytorch
>   will make nvcc fall back to building kernels with the newest version of PTX your nvcc does
>   support (see below for details on PTX).

>   You can override the default behavior using `TORCH_CUDA_ARCH_LIST` to explicitly specify which
>   CCs you want the extension to support:

>   TORCH_CUDA_ARCH_LIST="6.1 8.6" python build_my_extension.py
>   TORCH_CUDA_ARCH_LIST="5.2 6.0 6.1 7.0 7.5 8.0 8.6+PTX" python build_my_extension.py

>   The +PTX option causes extension kernel binaries to include PTX instructions for the specified
>   CC. PTX is an intermediate representation that allows kernels to runtime-compile for any CC >=
>   the specified CC (for example, 8.6+PTX generates PTX that can runtime-compile for any GPU with
>   CC >= 8.6). This improves your binary's forward compatibility. However, relying on older PTX to
>   provide forward compat by runtime-compiling for newer CCs can modestly reduce performance on
>   those newer CCs. If you know exact CC(s) of the GPUs you want to target, you're always better
>   off specifying them individually. For example, if you want your extension to run on 8.0 and 8.6,
>   "8.0+PTX" would work functionally because it includes PTX that can runtime-compile for 8.6, but
>   "8.0 8.6" would be better.

>   Note that while it's possible to include all supported archs, the more archs get included the
>   slower the building process will be, as it will build a separate kernel image for each arch.
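
To make the quoted behavior concrete, here is a minimal sketch of how a `TORCH_CUDA_ARCH_LIST`-style string could map onto nvcc `-gencode` flags, and why a hardcoded `-arch=sm_35` is no longer needed. The helper name `arch_list_to_gencode_flags` is hypothetical and this is a simplified illustration, not PyTorch's actual implementation; it assumes space-separated entries of the form `major.minor` with an optional `+PTX` suffix, as in the examples above.

```python
from typing import List


def arch_list_to_gencode_flags(arch_list: str) -> List[str]:
    """Translate an arch list such as "8.0 8.6+PTX" into nvcc -gencode flags.

    Hypothetical helper for illustration only; PyTorch performs its own
    translation internally when building a CUDAExtension.
    """
    flags = []
    for entry in arch_list.split():
        want_ptx = entry.endswith("+PTX")
        cc = entry[: -len("+PTX")] if want_ptx else entry
        major, minor = cc.split(".")
        num = f"{major}{minor}"
        # Fully compiled binary (SASS) for exactly this compute capability.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if want_ptx:
            # Also embed PTX, which the driver can JIT-compile for newer CCs.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


for flag in arch_list_to_gencode_flags("8.0 8.6+PTX"):
    print(flag)
# -gencode=arch=compute_80,code=sm_80
# -gencode=arch=compute_86,code=sm_86
# -gencode=arch=compute_86,code=compute_86
```

Each entry yields one kernel image, which is why longer arch lists slow the build, and only entries marked `+PTX` carry the forward-compatible intermediate code.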
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -9,7 +9,6 @@ if(WITH_CUDA)
   enable_language(CUDA)
   add_definitions(-D__CUDA_NO_HALF_OPERATORS__)
   add_definitions(-DWITH_CUDA)
-  set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -arch=sm_35 --expt-relaxed-constexpr")
 endif()
 
 find_package(Python3 COMPONENTS Development)
diff --git a/setup.py b/setup.py
@@ -62,7 +62,7 @@ def get_extensions():
             define_macros += [('WITH_CUDA', None)]
             nvcc_flags = os.getenv('NVCC_FLAGS', '')
             nvcc_flags = [] if nvcc_flags == '' else nvcc_flags.split(' ')
-            nvcc_flags += ['-arch=sm_35', '--expt-relaxed-constexpr', '-O2']
+            nvcc_flags += ['--expt-relaxed-constexpr', '-O2']
             extra_compile_args['nvcc'] = nvcc_flags
 
             if sys.platform == 'win32':