README.md: 12 additions & 7 deletions
@@ -16,17 +16,22 @@ Some of the highlights include:
 
 - __100x cheaper random inputs?!__ Discover how input generation sometimes costs more than the algorithm.
 - __40x faster trigonometry:__ Speed-up standard library functions like [`std::sin`](https://en.cppreference.com/w/cpp/numeric/math/sin) in just 3 lines of code.
-- __4x faster logic with [`std::ranges`](https://en.cppreference.com/w/cpp/ranges):__ Reduce stack usage and reuse registers more efficiently.
+- __4x faster lazy-logic__ with custom [`std::ranges`](https://en.cppreference.com/w/cpp/ranges) and iterators!
 - __Compiler optimizations beyond `-O3`:__ Learn about less obvious flags and techniques for another 2x speedup.
 - __Multiplying matrices?__ Check how a 3x3x3 GEMM can be 70% slower than 4x4x4, despite 60% fewer ops.
+- __Scaling AI?__ Measure the gap between theoretical [ALU](https://en.wikipedia.org/wiki/Arithmetic_logic_unit) throughput and your [BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms).
 - __How many if conditions are too many?__ Test your CPU's branch predictor with just 10 lines of code.
-- __Prefer recursion to iteration?__ Measure the depth at which your algorithm will `SEGFAULT`.
-- __How not to build state machines:__ Compare `std::variant`, `virtual` functions, and C++20 coroutines.
-- __Scaling to many cores?__ Learn how to use OpenMP, Intel's oneTBB, or your custom thread pool.
-- __How to handle JSON avoiding memory allocations?__ Is it easier with C or C++ libraries?
-- __How to properly use associative containers__ with custom keys and transparent comparators?
-- __How to beat a hand-written parser__ with `consteval` RegEx engines?
+- __Prefer recursion to iteration?__ Measure the depth at which your algorithm will [`SEGFAULT`](https://en.wikipedia.org/wiki/Segmentation_fault).
+- How to choose between exceptions, `std::error_code`, and [`std::variant`](https://en.cppreference.com/w/cpp/utility/variant)-like wrappers?
+- __Scaling to many cores?__ Learn how to use [OpenMP](https://en.wikipedia.org/wiki/OpenMP), Intel's oneTBB, or your custom thread pool.
+- __How to handle [JSON](https://www.json.org/json-en.html) avoiding memory allocations?__ Is it easier with C++ 20 or old-school C 99 tools?
+- __How to properly use STL's associative containers__ with custom keys and transparent comparators?
+- __How to beat a hand-written parser__ with [`consteval`](https://en.cppreference.com/w/cpp/language/consteval) RegEx engines?
 - __Is the pointer size really 64 bits__ and how to exploit [pointer-tagging](https://en.wikipedia.org/wiki/Tagged_pointer)?
+- __How many packets is [UDP](https://www.cloudflare.com/learning/ddos/glossary/user-datagram-protocol-udp/) dropping__ and how to serve web requests in [`io_uring`](https://en.wikipedia.org/wiki/Io_uring) from user-space?
+- __Scatter and Gather__ for 50% faster vectorized disjoint memory operations.
+- __How to choose between intrinsics, inline Assembly, and separate Assembly files__ for your performance-critical code?
+- __What are Encrypted Enclaves__ and what's the latency of Intel SGX, AMD SEV, and ARM Realm? 🔜
 
 To read, jump to the [`less_slow.cpp` source file](https://github.com/ashvardanian/less_slow.cpp/blob/main/less_slow.cpp) and read the code snippets and comments.
 Follow the instructions below to run the code in your environment and compare it to the comments as you read through the source.
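
For readers unfamiliar with the "transparent comparators" highlight in this README, here is a minimal sketch of the idea. It is an illustration only and is not taken from `less_slow.cpp`; the helper name `lookup_or` and the sample keys are hypothetical. Declaring the map with `std::less<>` (which defines `is_transparent`) lets `find` accept a `std::string_view` directly, so the lookup does not need to construct a temporary `std::string`.

```cpp
#include <map>
#include <string>
#include <string_view>

// Hypothetical helper for illustration, not part of less_slow.cpp.
// The map uses the transparent comparator `std::less<>`, which enables
// the heterogeneous `find` overload (available since C++14).
inline int lookup_or(std::map<std::string, int, std::less<>> const &table,
                     std::string_view key, int fallback) {
    // `find` compares `key` against the stored `std::string` keys directly,
    // without converting `key` into a `std::string` first.
    auto it = table.find(key);
    return it != table.end() ? it->second : fallback;
}
```

With a default `std::map<std::string, int>` the same call would not compile for a `std::string_view` argument, because `std::string` has no implicit constructor from `std::string_view`; the caller would have to build a `std::string` explicitly, which may allocate.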