-_Kernel Float_ is a header-only library for CUDA that simplifies working with vector and reduced precision types in GPU code.
+_Kernel Float_ is a header-only library for CUDA that simplifies working with vector types and reduced precision floating-point arithmetic in GPU code.

-CUDA offers several reduced precision floating-point types (`__half`, `__nv_bfloat16`, `__nv_fp8_e4m3`, `__nv_fp8_e5m2`)
+
+## Summary
+
+CUDA natively offers several reduced precision floating-point types (`__half`, `__nv_bfloat16`, `__nv_fp8_e4m3`, `__nv_fp8_e5m2`)
and vector types (e.g., `__half2`, `__nv_fp8x4_e4m3`, `float3`).
type conversion is awkward (e.g., `__nv_cvt_halfraw2_to_fp8x2` converts float16 to float8),
and some functionality is missing (e.g., one cannot convert a `__half` to `__nv_bfloat16`).
@@ -24,6 +27,8 @@ Internally, the data is stored using the most optimal type available, for exampl
Operator overloading (like `+`, `*`, `&&`) has been implemented such that the most optimal intrinsic for the available types is selected automatically.
Many mathematical functions (like `log`, `exp`, `sin`) and common operations (such as `sum`, `range`, `for_each`) are also available.

+By using this library, developers can avoid the complexity of working with reduced precision floating-point types in CUDA and focus on their applications.
+

## Features
@@ -33,7 +38,7 @@ In a nutshell, _Kernel Float_ offers the following features:
* Operator overloading to simplify programming.
* Support for half (16 bit) and quarter (8 bit) floating-point precision.
* Easy integration as a single header file.
-* Compatible with C++17.
+* Written for C++17.
* Compatible with NVCC (NVIDIA Compiler) and NVRTC (NVIDIA Runtime Compilation).
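
For readers unfamiliar with the technique, the README's claim that operator overloading can select a specialized routine based on the operand types can be illustrated with a small host-side C++17 sketch. This is not Kernel Float's implementation or API: `Half`, `Vec`, `AddImpl`, `packed_add`, and `packed_add_calls` are hypothetical names invented for this example, and `packed_add` only stands in for a packed CUDA intrinsic such as `__hadd2`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical host-side stand-in for a 16-bit float such as `__half`.
// It simply wraps a float so the sketch compiles without CUDA.
struct Half {
    float value;
};
inline Half operator+(Half a, Half b) { return Half{a.value + b.value}; }

// Counts calls to the "packed" path so the dispatch can be observed.
inline int& packed_add_calls() {
    static int calls = 0;
    return calls;
}

// Stand-in for a packed intrinsic such as `__hadd2`: adds both lanes in one call.
inline std::array<Half, 2> packed_add(const std::array<Half, 2>& a,
                                      const std::array<Half, 2>& b) {
    ++packed_add_calls();
    return std::array<Half, 2>{{Half{a[0].value + b[0].value},
                                Half{a[1].value + b[1].value}}};
}

// Hypothetical fixed-size vector type.
template <typename T, std::size_t N>
struct Vec {
    std::array<T, N> data;
};

// Generic fallback: add element by element.
template <typename T, std::size_t N>
struct AddImpl {
    static Vec<T, N> call(const Vec<T, N>& a, const Vec<T, N>& b) {
        Vec<T, N> out{};
        for (std::size_t i = 0; i < N; ++i) out.data[i] = a.data[i] + b.data[i];
        return out;
    }
};

// Specialization: a pair of halves is routed to the packed routine.
template <>
struct AddImpl<Half, 2> {
    static Vec<Half, 2> call(const Vec<Half, 2>& a, const Vec<Half, 2>& b) {
        return Vec<Half, 2>{packed_add(a.data, b.data)};
    }
};

// A single overloaded operator; the compiler picks the implementation by type.
template <typename T, std::size_t N>
Vec<T, N> operator+(const Vec<T, N>& a, const Vec<T, N>& b) {
    return AddImpl<T, N>::call(a, b);
}
```

With this structure, `a + b` reads the same for every element type and length, while the choice between the generic loop and the packed routine is made at compile time through template specialization.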