4 changes: 2 additions & 2 deletions guide/book.toml
@@ -2,5 +2,5 @@
authors = ["Riccardo D'Ambrosio<[email protected]>"]
language = "en"
src = "src"
-title = "GPU Computing with Rust using CUDA"
-description = "Writing extremely fast GPU Computing code with rust using rustc_codegen_nvvm and CUDA"
+title = "The Rust CUDA Guide"
+description = "How to write GPU compute code with Rust using rustc_codegen_nvvm and CUDA"
2 changes: 1 addition & 1 deletion guide/src/README.md
@@ -1,3 +1,3 @@
# Introduction

-Welcome to the rust-cuda guide! Let's dive right in.
+Welcome to the Rust CUDA guide! Let's dive right in.
6 changes: 3 additions & 3 deletions guide/src/SUMMARY.md
@@ -12,9 +12,9 @@
- [The CUDA Toolkit](cuda/README.md)
- [GPU Computing](cuda/gpu_computing.md)
- [The CUDA Pipeline](cuda/pipeline.md)
-- [rustc_codegen_nvvm](nvvm/README.md)
-- [Custom Rustc Backends](nvvm/backends.md)
-- [rustc_codegen_nvvm](nvvm/nvvm.md)
+- [`rustc_codegen_nvvm`](nvvm/README.md)
+- [Custom rustc Backends](nvvm/backends.md)
+- [`rustc_codegen_nvvm`](nvvm/nvvm.md)
- [Types](nvvm/types.md)
- [PTX Generation](nvvm/ptxgen.md)
- [Debugging](nvvm/debugging.md)
2 changes: 1 addition & 1 deletion guide/src/cuda/README.md
@@ -2,7 +2,7 @@

The CUDA Toolkit is an ecosystem for executing extremely fast code on NVIDIA GPUs for the purpose of general computing.

-CUDA includes many libraries for this purpose, including the Driver API, Runtime API, the PTX ISA, libnvvm, etc. CUDA
+CUDA includes many libraries for this purpose, including the Driver API, Runtime API, the PTX ISA, libNVVM, etc. CUDA
is currently the best option for computing in terms of libraries and control available, however, it unfortunately only works
on NVIDIA GPUs.

28 changes: 14 additions & 14 deletions guide/src/cuda/gpu_computing.md
@@ -1,4 +1,4 @@
-# GPU Computing
+# GPU computing

You probably already know what GPU computing is, but if you don't, it is utilizing the extremely parallel
nature of GPUs for purposes other than rendering. It is widely used in many scientific and consumer fields.
@@ -13,41 +13,41 @@ of time and/or take different code paths.

CUDA is currently one of the best choices for fast GPU computing for multiple reasons:
- It offers deep control over how kernels are dispatched and how memory is managed.
-- It has a rich ecosystem of tutorials, guides, and libraries such as cuRand, cuBlas, libnvvm, optix, the PTX ISA, etc.
+- It has a rich ecosystem of tutorials, guides, and libraries such as cuRAND, cuBLAS, libNVVM, OptiX, the PTX ISA, etc.
- It is mostly unmatched in performance because it is solely meant for computing and offers rich control.
And more...

-However, CUDA can only run on NVIDIA GPUs, which precludes AMD gpus from tools that use it. However, this is a drawback that
-is acceptable by many because of the significant developer cost of supporting both NVIDIA gpus with CUDA and
-AMD gpus with OpenCL, since OpenCL is generally slower, clunkier, and lacks libraries and docs on par with CUDA.
+However, CUDA can only run on NVIDIA GPUs, which precludes AMD GPUs from tools that use it. However, this is a drawback that
+is acceptable by many because of the significant developer cost of supporting both NVIDIA GPUs with CUDA and
+AMD GPUs with OpenCL, since OpenCL is generally slower, clunkier, and lacks libraries and docs on par with CUDA.

# Why Rust?

-Rust is a great choice for GPU programming, however, it has needed a kickstart, which is what rustc_codegen_nvvm tries to
+Rust is a great choice for GPU programming, however, it has needed a kickstart, which is what `rustc_codegen_nvvm` tries to
accomplish; The initial hurdle of getting Rust to compile to something CUDA can run is over, now comes the design and
polish part.

On top of its rich language features (macros, enums, traits, proc macros, great errors, etc), Rust's safety guarantees
-can be applied in gpu programming too; A field that has historically been full of implied invariants and unsafety, such
+can be applied in GPU programming too; A field that has historically been full of implied invariants and unsafety, such
as (but not limited to):
- Expecting some amount of dynamic shared memory from the caller.
- Expecting a certain layout for thread blocks/threads.
- Manually handling the indexing of data, leaving code prone to data races if not managed correctly.
- Forgetting to free memory, using uninitialized memory, etc.

-Not to mention the standardized tooling that makes the building, documentation, sharing, and linting of gpu kernel libraries easily possible.
-Most of the reasons for using rust on the CPU apply to using Rust for the GPU, these reasons have been stated countless times so
-i will not repeat them here.
+Not to mention the standardized tooling that makes the building, documentation, sharing, and linting of GPU kernel libraries easily possible.
+Most of the reasons for using Rust on the CPU apply to using Rust for the GPU, these reasons have been stated countless times so
+I will not repeat them here.

-A couple of particular rust features make writing CUDA code much easier: RAII and Results.
+A couple of particular Rust features make writing CUDA code much easier: RAII and Results.
In `cust` everything uses RAII (through `Drop` impls) to manage freeing memory and returning handles, which
frees users from having to think about that, which yields safer, more reliable code.

-Results are particularly helpful, almost every single call in every CUDA library returns a status code in the form of a cuda result.
+Results are particularly helpful, almost every single call in every CUDA library returns a status code in the form of a CUDA result.
Ignoring these statuses is very dangerous and can often lead to random segfaults and overall unreliable code. For this purpose,
both the CUDA SDK, and other libraries provide macros to handle such statuses. This handling is not very reliable and causes
dependency issues down the line.

-Instead of an unreliable system of macros, we can leverage rust results for this. In cust we return special `CudaResult<T>`
-results that can be bubbled up using rust's `?` operator, or, similar to `CUDA_SAFE_CALL` can be unwrapped or expected if
+Instead of an unreliable system of macros, we can leverage Rust results for this. In cust we return special `CudaResult<T>`
Review comment (Contributor), suggested change:
-Instead of an unreliable system of macros, we can leverage Rust results for this. In cust we return special `CudaResult<T>`
+Instead of an unreliable system of macros, we can leverage Rust results for this. In `cust` we return special `CudaResult<T>`
+results that can be bubbled up using Rust's `?` operator, or, similar to `CUDA_SAFE_CALL` can be unwrapped or expected if
proper error handling is not needed.
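To illustrate the RAII and Results points made in the hunk above, here is a minimal, hypothetical sketch of both patterns. `CudaError`, `DeviceBuffer`, and the functions below are illustrative stand-ins, not cust's actual API; in cust the fallible calls return its real `CudaResult<T>` and the buffer wraps an actual device allocation.

```rust
// Illustrative sketch only: names below are NOT cust's real API.
#[derive(Debug, PartialEq)]
enum CudaError {
    InvalidValue,
}

// Every fallible call returns a Result instead of a raw status code.
type CudaResult<T> = Result<T, CudaError>;

// RAII: the allocation is released in Drop, so callers never free it
// manually (in cust this would be a real GPU allocation).
struct DeviceBuffer {
    data: Vec<f32>,
}

impl DeviceBuffer {
    fn zeroed(len: usize) -> CudaResult<DeviceBuffer> {
        if len == 0 {
            return Err(CudaError::InvalidValue);
        }
        Ok(DeviceBuffer { data: vec![0.0; len] })
    }
}

impl Drop for DeviceBuffer {
    fn drop(&mut self) {
        // In cust, the device memory would be freed here.
    }
}

// Results: `?` bubbles any CudaError up to the caller, replacing
// C-style CUDA_SAFE_CALL status-check macros.
fn run() -> CudaResult<usize> {
    let buf = DeviceBuffer::zeroed(1024)?;
    Ok(buf.data.len())
} // `buf` is dropped here; its memory is released automatically.

fn main() {
    match run() {
        Ok(n) => println!("allocated {n} elements"),
        Err(e) => eprintln!("CUDA error: {e:?}"),
    }
}
```

The same `?` chain works through arbitrarily deep call stacks, which is what makes it a drop-in replacement for the macro-based status checking described above.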
6 changes: 3 additions & 3 deletions guide/src/cuda/pipeline.md
@@ -1,4 +1,4 @@
-# The CUDA Pipeline
+# The CUDA pipeline

CUDA is traditionally used via CUDA C/C++ files which have a `.cu` extension. These files can be
compiled using NVCC (NVIDIA CUDA Compiler) into an executable.
@@ -19,13 +19,13 @@ with additional restrictions including the following.
- Some linkage types are not supported.
- Function ABIs are ignored; everything uses the PTX calling convention.

-libnvvm is a closed source library which takes NVVM IR, optimizes it further, then converts it to
+libNVVM is a closed source library which takes NVVM IR, optimizes it further, then converts it to
PTX. PTX is a low level, assembly-like format with an open specification which can be targeted by
any language. For an assembly format, PTX is fairly user-friendly.
- It is well formatted.
- It is mostly fully specified (other than the iffy grammar specification).
- It uses named registers/parameters.
-- It uses virtual registers. (Because gpus have thousands of registers, listing all of them out
+- It uses virtual registers. (Because GPUs have thousands of registers, listing all of them out
would be unrealistic.)
- It uses ASCII as a file encoding.
