Commit 5f603c7

Docs: List project structure
1 parent 726c1e1 commit 5f603c7

2 files changed: 21 additions and 1 deletion

.vscode/settings.json

Lines changed: 2 additions & 0 deletions
@@ -122,7 +122,9 @@
     "unscalable",
     "Uring",
     "Vardanian",
+    "Verilog",
     "vfmadd",
+    "VHDL",
     "VNNI",
     "VPCLMULQDQ",
     "Weis",

README.md

Lines changed: 19 additions & 1 deletion
@@ -34,7 +34,8 @@ Some of the highlights include:
 - __CUDA C++, [PTX](https://en.wikipedia.org/wiki/Parallel_Thread_Execution) Intermediate Representations, and SASS__, and how do they differ from CPU code?
 - __How to choose between intrinsics, inline `asm`, and separate `.S` files__ for your performance-critical code?
 - __Tensor Cores & Memory__ differences on CPUs, and Volta, Ampere, Hopper, and Blackwell GPUs!
-- __What are Encrypted Enclaves__ and what's the latency of Intel SGX, AMD SEV, and ARM Realm? 🔜
+- __How coding FPGA differs from GPU__ and what is High-Level Synthesis, Verilog, and VHDL? 🔜 #36
+- __What are Encrypted Enclaves__ and what's the latency of Intel SGX, AMD SEV, and ARM Realm? 🔜 #31
 
 To read, jump to the [`less_slow.cpp` source file](https://github.com/ashvardanian/less_slow.cpp/blob/main/less_slow.cpp) and read the code snippets and comments.
 Follow the instructions below to run the code in your environment and compare it to the comments as you read through the source.
@@ -108,6 +109,23 @@ Alternatively, use the Linux `perf` tool for performance counter collection:
 sudo perf stat taskset 0xEFFFEFFFEFFFEFFFEFFFEFFFEFFFEFFF build_release/less_slow --benchmark_enable_random_interleaving=true --benchmark_filter=super_sort
 ```
 
+## Project Structure
+
+The primary file of this repository is clearly the `less_slow.cpp` C++ file with CPU-side code.
+Several other files for different hardware-specific optimizations are created:
+
+```sh
+$ tree .
+.
+├── CMakeLists.txt          # Build & assembly instructions for all files
+├── less_slow.cpp           # Primary CPU-side benchmarking code with the majority of examples
+├── less_slow_amd64.S       # Hand-written Assembly kernels for 64-bit x86 CPUs
+├── less_slow_aarch64.S     # Hand-written Assembly kernels for 64-bit Arm CPUs
+├── less_slow.cu            # CUDA C++ examples for parallel algorithms for Nvidia GPUs
+├── less_slow_sm70.ptx      # Hand-written PTX IR kernels for Nvidia Volta GPUs
+└── less_slow_sm90a.ptx     # Hand-written PTX IR kernels for Nvidia Hopper GPUs
+```
+
 ## Memes and References
 
 Educational content without memes?!
