Commit 24463fe
ext/standard: speed up php_url_parse_ex2 by ~12%

Three related changes to ext/standard/url.c targeting the ctype macros
on the parse_url hot path. On a 17-URL mix (17M parses per run, CPU
pinned, same-session A/B), median wall time drops from 1.90s to 1.68s,
a ~12% reduction and ~13% throughput increase (8.94M/s to 10.10M/s).

1. php_replace_controlchars replaces its iscntrl() call with an inline
   `c < 0x20 || c == 0x7f` comparison. Callgrind showed iscntrl at
   ~14% of total instructions on a realistic URL workload; glibc's
   iscntrl goes through __ctype_b_loc() per byte for a TLS lookup and
   table deref, which defeats auto-vectorization. URL components are
   bytes, not locale-dependent text, so C/POSIX semantics are what we
   want regardless of the process locale. The Zend language scanner
   uses the same pattern (yych <= 0x1F). This runs once per component
   per parse, up to 7 times.

2. The scheme-validation walk uses isalpha/isdigit, which carry the
   same __ctype_b_loc tax. I extracted the check into
   php_url_is_scheme_char with an inline ASCII test:
   `((c | 0x20) - 'a' < 26u) || (c - '0' < 10u)` for the letter/digit
   half, plus three literal comparisons for `+`, `-`, and `.`. The
   scheme loop runs once per byte of the scheme on every parse. A
   helper, php_url_is_ascii_digit, covers the two isdigit call sites
   in the port-scan loops (one in the mailto-branch port probe, one in
   the parse_port fallback).

3. The three branches that allocate ret->scheme all followed
   zend_string_init with a php_replace_controlchars call. The scheme
   loop above has already rejected any byte that isn't in
   [a-zA-Z0-9+.-], so the control-char scan on scheme is dead work.
   Removed from all three sites.
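For illustration, the two helpers described in item 2 could look roughly like the sketch below. The function names come from the message above; the signatures, the `unsigned char` parameter type, and the self-check function are assumptions, not the code from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed signature: true for ASCII '0'..'9' only, independent of locale.
 * Bytes below '0' wrap to a large unsigned value and fail the comparison. */
static inline bool php_url_is_ascii_digit(unsigned char c)
{
    return (unsigned)(c - '0') < 10u;
}

/* Assumed signature: true for bytes in [a-zA-Z0-9+.-].
 * OR-ing with 0x20 folds 'A'..'Z' onto 'a'..'z', so one range check
 * covers both cases of the alphabet. */
static inline bool php_url_is_scheme_char(unsigned char c)
{
    return ((unsigned)((c | 0x20) - 'a') < 26u)
        || php_url_is_ascii_digit(c)
        || c == '+' || c == '-' || c == '.';
}

/* Hypothetical self-check, not part of the commit. */
static void php_url_helpers_selfcheck(void)
{
    assert(php_url_is_scheme_char('h'));
    assert(php_url_is_scheme_char('Z'));
    assert(php_url_is_scheme_char('9'));
    assert(php_url_is_scheme_char('+'));
    assert(!php_url_is_scheme_char(':'));
    assert(!php_url_is_scheme_char('@'));  /* 0x40 | 0x20 = 0x60, just below 'a' */
    assert(!php_url_is_scheme_char('['));  /* 0x5b | 0x20 = 0x7b, just above 'z' */
    assert(!php_url_is_scheme_char(0x7f)); /* DEL never reaches the scheme */
}
```

Because every byte the scheme loop accepts is a letter, a digit, or one of the three punctuation characters, no accepted byte can satisfy `c < 0x20 || c == 0x7f`, which is the reasoning behind item 3.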
No behavior change under the C/POSIX locale: the inline comparisons
match the ctype macros there, and URL bytes are never
locale-dependent. I checked that contaminated inputs like
http://ex\x7fample.com/p\x1fath still get their control bytes replaced
with underscores.

1 parent 8ad79e1 commit 24463fe
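The equivalence claim and the contaminated-input check can be reproduced outside PHP with a small standalone sketch. Everything here is hypothetical scaffolding: `replace_controlchars_sketch` works on a plain `char` buffer rather than PHP's string types, and the comparison against `iscntrl()` only holds while the process stays in the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for php_replace_controlchars: replaces each
 * control byte (0x00..0x1f and 0x7f) with an underscore, in place. */
static void replace_controlchars_sketch(char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c < 0x20 || c == 0x7f) {
            s[i] = '_';
        }
    }
}

/* Returns 1 if the inline test agrees with iscntrl() for all 256 byte
 * values in the current locale ("C" unless setlocale() was called). */
static int inline_matches_iscntrl(void)
{
    for (int c = 0; c < 256; c++) {
        int inline_result = (c < 0x20 || c == 0x7f);
        if (inline_result != (iscntrl(c) != 0)) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical self-check mirroring the contaminated URL from the message.
 * The literal is split so "\x7f" and "\x1f" do not absorb the hex digit 'a'. */
static void replace_controlchars_selfcheck(void)
{
    char url[] = "http://ex\x7f" "ample.com/p\x1f" "ath";
    replace_controlchars_sketch(url, strlen(url));
    assert(strcmp(url, "http://ex_ample.com/p_ath") == 0);
    assert(inline_matches_iscntrl());
}
```

Under a non-C locale the two tests can disagree for bytes at or above 0x80, which is why the inline form pins C/POSIX semantics regardless of the process locale.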
1 file changed (ext/standard/url.c), 29 additions, 8 deletions

Diff hunks (line contents not preserved in this copy):
@@ -47,16 +47,35 @@
@@ -103,7 +122,7 @@
@@ -118,8 +137,10 @@
@@ -132,22 +153,22 @@
@@ -172,7 +193,7 @@