-
Notifications
You must be signed in to change notification settings - Fork 8k
Optimization for htmlspecialchars function #18126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from 2 commits
75bc970
c6cb392
2a46d43
914fa23
63a9484
79d09c9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -74,6 +74,12 @@ | |||||||||||||
| #define sjis_lead(c) ((c) != 0x80 && (c) != 0xA0 && (c) < 0xFD) | ||||||||||||||
| #define sjis_trail(c) ((c) >= 0x40 && (c) != 0x7F && (c) < 0xFD) | ||||||||||||||
|
|
||||||||||||||
| /* Lookup table for php_htmlspecialchars */ | ||||||||||||||
| typedef struct { | ||||||||||||||
| char* entity[256]; | ||||||||||||||
| ushort entity_len[256]; | ||||||||||||||
| } htmlspecialchars_lut; | ||||||||||||||
|
||||||||||||||
|
|
||||||||||||||
| /* {{{ get_default_charset */ | ||||||||||||||
| static char *get_default_charset(void) { | ||||||||||||||
| if (PG(internal_encoding) && PG(internal_encoding)[0]) { | ||||||||||||||
|
|
@@ -752,6 +758,60 @@ static zend_result resolve_named_entity_html(const char *start, size_t length, c | |||||||||||||
| } | ||||||||||||||
| /* }}} */ | ||||||||||||||
|
|
||||||||||||||
| /* {{{ is_codepoint_allowed */ | ||||||||||||||
| static inline zend_bool is_codepoint_allowed( | ||||||||||||||
|
||||||||||||||
| static inline zend_bool is_codepoint_allowed( | |
| static bool is_codepoint_allowed( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 80e7a41
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const enc_to_uni* to_uni_table /* Mapping table if needed */ | |
| const enc_to_uni *to_uni_table /* Mapping table if needed */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 80e7a41
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return 1; | |
| return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 80e7a41
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| static void init_htmlspecialchars_lut(htmlspecialchars_lut* lut, const int flags, const int doctype) { | |
| static void init_htmlspecialchars_lut(htmlspecialchars_lut *lut, const int flags, const int doctype) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 80e7a41
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using magic numbers, the compiler will be smart enough to precompute this.
| lut->entity_len['&'] = 5; | |
| lut->entity_len['>'] = 4; | |
| lut->entity_len['<'] = 4; | |
| lut->entity_len['&'] = strlen("&"); | |
| lut->entity_len['>'] = strlen(">"); | |
| lut->entity_len['<'] = strlen("<"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 80e7a41
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 80e7a41
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 80e7a41
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a PHPAPI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! I’ll go through all the comments and make the necessary changes. I don’t have much experience contributing to public PHP projects yet, so I might not be fully aware of all the conventions and best practices.
I was working off the existing code and reused some parts as-is. I plan to refactor php_html_entities later. The main goal here was to switch from a hashtable to a LUT for special character replacement, and to restructure the logic accordingly.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const char* input_end = input_ptr + input->len; | |
| const char* input_end = input_ptr + ZSTR_LEN(input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 80e7a41
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const enc_to_uni* to_uni_table = NULL; | |
| const enc_to_uni *to_uni_table = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 80e7a41
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const unsigned char* replacement = NULL; | |
| const unsigned char *replacement = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 80e7a41
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the cast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idk Fixed in 80e7a41
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| replacement_len = sizeof("\xEF\xBF\xBD") - 1; | |
| replacement_len = strlen("\xEF\xBF\xBD"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we need to account for the fact that we’re replacing a single byte.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto prior remarks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to get rid of this function completely and basically move it PHP_FUNCTION(htmlentities)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or you could alternatively make this switch in php_escape_html_entities_ex by checking all. That might actually make more sense because you can then clean up that function to remove the all checks and also speed up potentially internal usage of that function.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you using a pass through function when there is only ever one use case now? Especially as you are adding a C function call overhead, which is weird for an optimization PR.
You should also "inline" the other usage of the php_html_entities pass through function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged in 80e7a41
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,10 +2,10 @@ | |
| Bug #60965: Buffer overflow on htmlspecialchars/entities with $double=false | ||
| --FILE-- | ||
| <?php | ||
| echo htmlspecialchars('"""""""""""""""""""""""""""""""""""""""""""""', | ||
| echo htmlspecialchars('"""""""""""""""""""""""""""""""""""""""""""""�', | ||
|
||
| ENT_QUOTES, 'UTF-8', false), "\n"; | ||
| echo "Done.\n"; | ||
| ?> | ||
| --EXPECT-- | ||
| """"""""""""""""""""""""""""""""""""""""""""" | ||
| """""""""""""""""""""""""""""""""""""""""""""&#x123456789123456789123456789; | ||
| Done. | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ushortis not standard.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 80e7a41