[SYCL][DOC] Update CUDA docs with Windows support details (#4796)

AidanBeltonS · web-flow · commit 1cf024ac32a3 · 2021-10-22T20:25:08.000+03:00
This patch updates the compiler and runtime docs and getting started docs with details of CUDA's support for Windows.

It explains the motivation for and usage of the remangled `libspirv-nvptx64--nvidiacl.bc` variants.
It links to the NVIDIA guide for installing CUDA on Windows.
It clearly states that building and running the CUDA backend on Windows has no dependency on a Linux system.
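
The primitive type difference behind the remangled variants can be illustrated with a minimal sketch. It assumes that the `l32` and `l64` parts of the file names refer to the width of `long`, which this patch does not state outright. `remangled_libspirv_for` is a hypothetical helper written for illustration and is not part of the DPC++ driver.

```cpp
#include <climits>
#include <string>

// Width of the host compiler's `long` in bits: 64 on typical 64-bit Linux
// (LP64), 32 on 64-bit Windows (LLP64).
constexpr int host_long_bits() {
  return static_cast<int>(sizeof(long) * CHAR_BIT);
}

// Hypothetical helper (not the driver's actual logic): maps the width of
// `long` to the remangled libspirv file name used in the updated docs.
// Assumption: "l32"/"l64" in the file name denotes the width of `long`.
inline std::string remangled_libspirv_for(int long_bits) {
  const std::string base = "signed_char.libspirv-nvptx64--nvidiacl.bc";
  if (long_bits == 32) return "remangled-l32-" + base;  // Windows host
  if (long_bits == 64) return "remangled-l64-" + base;  // Linux host
  return "";  // width not covered by the documented variants
}
```

Calling `remangled_libspirv_for(host_long_bits())` on a given host would name the variant that the docs say the driver links for that platform, under the assumption above.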
diff --git a/sycl/doc/CompilerAndRuntimeDesign.md b/sycl/doc/CompilerAndRuntimeDesign.md
@@ -538,8 +538,12 @@ passed to `-fsycl-targets`.
 Unlike other AOT targets, the bitcode module linked from intermediate compiled
 objects never goes through SPIR-V. Instead it is passed directly in bitcode form
 down to the NVPTX Back End. All produced bitcode depends on two libraries,
-`libdevice.bc` (provided by the CUDA SDK) and `libspirv-nvptx64--nvidiacl.bc`
-(built by the libclc project).
+`libdevice.bc` (provided by the CUDA SDK) and `libspirv-nvptx64--nvidiacl.bc` variants
+(built by the libclc project). `libspirv-nvptx64--nvidiacl.bc` is not used directly.
+Instead, it is used to generate the remangled variants
+`remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc` and
+`remangled-l32-signed_char.libspirv-nvptx64--nvidiacl.bc`, which handle primitive type
+differences between Linux and Windows.
 
 ##### Device code post-link step for CUDA
 
@@ -568,6 +572,19 @@ path in SYCL kernels.
 
 ##### NVPTX Builtins
 
+Builtins are implemented in OpenCL C within libclc. OpenCL C treats `long`
+as a 64-bit type and has no `long long` type, while Windows DPC++ treats `long`
+as a 32-bit integer and `long long` as a 64-bit integer.
+These differences between the primitive types can cause applications to use
+incompatible libclc builtins. A remangler creates multiple libspirv files
+with different remangled function names to support both Windows and Linux.
+When building a SYCL application targeting the CUDA backend, the driver
+links the device code with
+`remangled-l32-signed_char.libspirv-nvptx64--nvidiacl.bc` if the host target is
+Windows, or with
+`remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc` if the host target is
+Linux.
+
 When the SYCL compiler is in device mode and targeting the NVPTX backend, the
 compiler exposes NVPTX builtins supported by clang.
 
diff --git a/sycl/doc/GetStartedGuide.md b/sycl/doc/GetStartedGuide.md
@@ -148,18 +148,24 @@ python %DPCPP_HOME%\llvm\buildbot\compile.py
 
 There is experimental support for DPC++ for CUDA devices.
 
-To enable support for CUDA devices, follow the instructions for the Linux
-DPC++ toolchain, but add the `--cuda` flag to `configure.py`
+To enable support for CUDA devices, follow the instructions for the Linux or
+Windows DPC++ toolchain, but add the `--cuda` flag to `configure.py`. Note that
+the CUDA backend has experimental Windows support; Windows Subsystem for
+Linux (WSL) is not needed to build and run the CUDA backend.
 
 Enabling this flag requires an installation of
 [CUDA 10.2](https://developer.nvidia.com/cuda-10.2-download-archive) on
 the system, refer to
-[NVIDIA CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html).
+[NVIDIA CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
+or
+[NVIDIA CUDA Installation Guide for Windows](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html).
 
 Currently, the only combination tested is Ubuntu 18.04 with CUDA 10.2 using
-a Titan RTX GPU (SM 71), but it should work on any GPU compatible with SM 50 or
-above. The default SM for the NVIDIA CUDA backend is 5.0. Users can specify
-lower values, but some features may not be supported.
+a Titan RTX GPU (SM 71). The CUDA backend should work on Windows or Linux
+operating systems with any GPU compatible with SM 50 or above. The default
+SM for the NVIDIA CUDA backend is 5.0. Users can specify lower values,
+but some features may not be supported. Windows CUDA support is experimental
+because it is not currently tested in CI.
 
 **Non-standard CUDA location**