Commit e670fd7
ext/standard: speed up php_url_parse_ex2 by ~12%
Three related changes to ext/standard/url.c targeting the ctype macros
on the parse_url hot path. On a 17-URL mix (17M parses per run, CPU
pinned, same-session A/B), median wall time drops from 1.90s to 1.68s,
a ~12% reduction and ~13% throughput increase (8.94M/s to 10.10M/s).
1. php_replace_controlchars replaces its iscntrl() call with an inline
`c < 0x20 || c == 0x7f` comparison. Callgrind showed iscntrl at
~14% of total instructions on a realistic URL workload; glibc's
iscntrl goes through __ctype_b_loc() per byte for a TLS lookup and
table deref, which defeats auto-vectorization. URL components are
bytes, not locale-dependent text, so C/POSIX semantics are what we
want regardless of the process locale. The Zend language scanner
uses the same pattern (yych <= 0x1F). php_replace_controlchars runs
once per component, so up to 7 times per parse.
2. The scheme-validation walk uses isalpha/isdigit, which pay the same
__ctype_b_loc cost. I extracted the check into php_url_is_scheme_char
with an inline ASCII test:
`((c | 0x20) - 'a' < 26u) || (c - '0' < 10u)` for the letter/digit
half, plus three literal comparisons for '+', '-', and '.'. The
scheme loop runs once per byte of the scheme on every parse. A helper,
php_url_is_ascii_digit, covers the two isdigit call sites in the
port-scan loops (one in the mailto-branch port probe, one in the
parse_port fallback).
3. The three branches that allocate ret->scheme all followed
zend_string_init with a php_replace_controlchars call. The scheme
loop above has already rejected any byte that isn't in
[a-zA-Z0-9+.-], so the control-char scan on scheme is dead work.
Removed from all three sites.
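
For illustration only, here is a minimal sketch of the inline tests
described in items 1 and 2, plus an exhaustive comparison against the
ctype functions under the "C" locale. The `sketch_*` names and function
bodies are assumptions made for this example; they are not the actual
code in ext/standard/url.c, which this message does not reproduce.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical stand-in for the inline test in php_replace_controlchars.
 * Taking unsigned char keeps bytes >= 0x80 from comparing as negative. */
static inline int sketch_is_cntrl(unsigned char c)
{
    return c < 0x20 || c == 0x7f;
}

/* Hypothetical stand-in for php_url_is_ascii_digit. The cast makes the
 * wraparound explicit: bytes below '0' become large unsigned values. */
static inline int sketch_is_ascii_digit(unsigned char c)
{
    return (unsigned)(c - '0') < 10u;
}

/* Hypothetical stand-in for php_url_is_scheme_char. OR-ing with 0x20
 * folds 'A'..'Z' onto 'a'..'z', so one range test covers both cases. */
static inline int sketch_is_scheme_char(unsigned char c)
{
    return (unsigned)((c | 0x20) - 'a') < 26u
        || sketch_is_ascii_digit(c)
        || c == '+' || c == '-' || c == '.';
}

/* Replace every control byte in s[0..len) with '_', in place. */
static void sketch_replace_controlchars(char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (sketch_is_cntrl((unsigned char)s[i])) {
            s[i] = '_';
        }
    }
}

/* Count the byte values on which an inline test disagrees with its
 * ctype counterpart. Note: this switches the process to the "C" locale. */
static int sketch_count_mismatches(void)
{
    int mismatches = 0;

    setlocale(LC_ALL, "C");
    for (int c = 0; c < 256; c++) {
        unsigned char b = (unsigned char)c;
        int ctype_scheme = isalpha(c) || isdigit(c)
            || c == '+' || c == '-' || c == '.';

        if (!!iscntrl(c) != sketch_is_cntrl(b)) mismatches++;
        if (!!isdigit(c) != sketch_is_ascii_digit(b)) mismatches++;
        if (!!ctype_scheme != sketch_is_scheme_char(b)) mismatches++;
    }
    return mismatches;
}
```

Calling sketch_count_mismatches() from a small test harness should
report no disagreements under the "C" locale. When writing test inputs
as C string literals, split the literal after a hex escape (for example
"ex\x7f" "ample"), because a following hex digit such as 'a' would
otherwise be absorbed into the escape.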
No behavior change in the C/POSIX locale: there the inline comparisons
match the ctype macros byte for byte. Under other locales the ctype
results could differ for bytes above 0x7f, but URL bytes are not
locale-dependent text, so the C/POSIX result is the intended one. I
checked that contaminated inputs like
http://ex\x7fample.com/p\x1fath still get their control bytes replaced
with underscores.

1 parent 8ad79e1 commit e670fd7
1 file changed: +16 -8 lines changed