posts/counting-words-at-simd-speed.md: 3 additions & 3 deletions
@@ -43,7 +43,7 @@ with open(sys.argv[1], "rb") as f:
 print(words)
 ```

-This program is horrendously slow. It takes 86.6 seconds on my Apple M1 Pro. Python code runs for every byte, incurring interpreter dispatch and object checks again and again.
+This program is horrendously slow. It takes 89.6 seconds on my Apple M1 Pro. Python code runs for every byte, incurring interpreter dispatch and object checks again and again.

 ## Using CPython efficiently (13.7 seconds)

@@ -70,7 +70,7 @@ This version is ~6× faster than the initial Python version.

 I think the above Python version is very close to the limit that we can get with straightforward Python (e.g. no NumPy, no threads).

-By porting our first Python attempt to C, we're rewarded with a ~11× speedup.
+By porting our first Python attempt to C, we're rewarded with a ~74× speedup.

 ```c
 // 2_mvp.c
@@ -100,7 +100,7 @@ Why is it so much quicker? Before, `re.finditer(...)` was creating a Python `Mat

 The regex engine was also doing extra work when it searched, matched, backtracked, and performed bookkeeping. Even though that's in C, it's still building Python objects for the iterator.

-In comparison, this version's C loop is a single pass over bytes with two booleans (`prev_ws`, `cur_ws`) and a predictable branch. Compilers turn this into very tight code, i.e., no per-word allocations, and no callbacks into the interpreter.
+In comparison, this version's C loop is a single pass over bytes with two booleans (`prev_ws`, `cur_ws`) and a simple branch. Compilers turn this into very tight code, i.e., no per-word allocations, and no callbacks into the interpreter.
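
For readers without the full post at hand, here is a minimal sketch of the kind of loop the changed paragraph describes: one pass over the bytes, with a word counted at each whitespace to non-whitespace transition. The body of `2_mvp.c` is not shown in this diff, so the function names `count_words` and `is_ws` and the exact whitespace set are assumptions made for illustration; only `prev_ws` and `cur_ws` come from the text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: "whitespace" means the six ASCII whitespace bytes.
   The post's 2_mvp.c may use a different definition. */
static bool is_ws(unsigned char c) {
    return c == ' ' || c == '\n' || c == '\t' ||
           c == '\r' || c == '\v' || c == '\f';
}

/* Hypothetical helper: counts words in buf[0..len) by counting
   transitions from whitespace to non-whitespace. */
size_t count_words(const unsigned char *buf, size_t len) {
    size_t words = 0;
    bool prev_ws = true; /* treat the start of the buffer as following whitespace */
    for (size_t i = 0; i < len; i++) {
        bool cur_ws = is_ws(buf[i]);
        if (prev_ws && !cur_ws) {
            words++; /* a new word starts at this byte */
        }
        prev_ws = cur_ws;
    }
    return words;
}
```

The loop allocates nothing and keeps its only state in `words` and `prev_ws`, which is why a compiler can usually keep the whole thing in registers.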