README.md: 19 additions & 1 deletion
@@ -34,7 +34,8 @@ Some of the highlights include:
 - __CUDA C++, [PTX](https://en.wikipedia.org/wiki/Parallel_Thread_Execution) Intermediate Representations, and SASS__, and how do they differ from CPU code?
 - __How to choose between intrinsics, inline `asm`, and separate `.S` files__ for your performance-critical code?
 - __Tensor Cores & Memory__ differences on CPUs, and Volta, Ampere, Hopper, and Blackwell GPUs!
-- __What are Encrypted Enclaves__ and what's the latency of Intel SGX, AMD SEV, and ARM Realm? 🔜
+- __How coding FPGA differs from GPU__ and what is High-Level Synthesis, Verilog, and VHDL? 🔜 #36
+- __What are Encrypted Enclaves__ and what's the latency of Intel SGX, AMD SEV, and ARM Realm? 🔜 #31
 
 To read, jump to the [`less_slow.cpp` source file](https://github.com/ashvardanian/less_slow.cpp/blob/main/less_slow.cpp) and read the code snippets and comments.
 Follow the instructions below to run the code in your environment and compare it to the comments as you read through the source.
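The highlights in this hunk mention choosing between intrinsics, inline `asm`, and separate `.S` files. As a rough illustration of the first two routes, here is a minimal sketch that computes the same 64-bit addition both ways and falls back to plain C++ elsewhere. The function names (`add_portable`, `add_intrinsics`, `add_inline_asm`) are hypothetical and are not taken from `less_slow.cpp`; the inline assembly assumes GCC or Clang extended `asm` syntax. The third route, a separate `.S` file, is left out because it needs an assembler step in the build and cannot be shown as a single self-contained file.

```cpp
#include <cstdint>

#if defined(__x86_64__)
#include <emmintrin.h> // SSE2 intrinsics, part of the x86-64 baseline
#elif defined(__aarch64__)
#include <arm_neon.h> // NEON intrinsics, part of the AArch64 baseline
#endif

// Reference: plain C++ that the compiler is free to optimize as it likes.
inline std::uint64_t add_portable(std::uint64_t a, std::uint64_t b) { return a + b; }

// Route 1: intrinsics. The compiler still picks registers and schedules
// instructions, but the operation itself is named explicitly.
inline std::uint64_t add_intrinsics(std::uint64_t a, std::uint64_t b) {
#if defined(__x86_64__)
    __m128i const va = _mm_cvtsi64_si128(static_cast<long long>(a));
    __m128i const vb = _mm_cvtsi64_si128(static_cast<long long>(b));
    return static_cast<std::uint64_t>(_mm_cvtsi128_si64(_mm_add_epi64(va, vb)));
#elif defined(__aarch64__)
    return vget_lane_u64(vadd_u64(vdup_n_u64(a), vdup_n_u64(b)), 0);
#else
    return add_portable(a, b); // no intrinsics assumed on other targets
#endif
}

// Route 2: inline assembly (GCC/Clang extended syntax). The instruction is
// fixed by hand; the compiler only allocates the operand registers.
inline std::uint64_t add_inline_asm(std::uint64_t a, std::uint64_t b) {
#if defined(__x86_64__) && defined(__GNUC__)
    asm("addq %1, %0" : "+r"(a) : "r"(b) : "cc"); // a += b, clobbers flags
    return a;
#elif defined(__aarch64__) && defined(__GNUC__)
    asm("add %0, %0, %1" : "+r"(a) : "r"(b)); // a = a + b
    return a;
#else
    return add_portable(a, b); // no inline asm assumed on other toolchains
#endif
}
```

All three functions should return the same value for the same inputs, so comparing `add_intrinsics(a, b)` and `add_inline_asm(a, b)` against `add_portable(a, b)` is a quick sanity check before benchmarking either variant.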
@@ -108,6 +109,23 @@ Alternatively, use the Linux `perf` tool for performance counter collection:
 sudo perf stat taskset 0xEFFFEFFFEFFFEFFFEFFFEFFFEFFFEFFF build_release/less_slow --benchmark_enable_random_interleaving=true --benchmark_filter=super_sort
 ```
 
+## Project Structure
+
+The primary file of this repository is `less_slow.cpp`, a C++ file with the CPU-side code.
+Several other files hold hardware-specific optimizations:
+
+```sh
+$ tree .
+.
+├── CMakeLists.txt          # Build & assembly instructions for all files
+├── less_slow.cpp           # Primary CPU-side benchmarking code with the majority of examples
+├── less_slow_amd64.S       # Hand-written Assembly kernels for 64-bit x86 CPUs
+├── less_slow_aarch64.S     # Hand-written Assembly kernels for 64-bit Arm CPUs
+├── less_slow.cu            # CUDA C++ examples for parallel algorithms for Nvidia GPUs
+├── less_slow_sm70.ptx      # Hand-written PTX IR kernels for Nvidia Volta GPUs
+└── less_slow_sm90a.ptx     # Hand-written PTX IR kernels for Nvidia Hopper GPUs