parser.c: various optimizations and simplifications #888
```diff
@@ -84,17 +84,51 @@ static void rvalue_cache_insert_at(rvalue_cache *cache, int index, VALUE rstring
     cache->entries[index] = rstring;
 }

-static inline int rstring_cache_cmp(const char *str, const long length, VALUE rstring)
+#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__) && defined(__has_builtin) && __has_builtin(__builtin_bswap64)
```
Member

Also applies to the previous SWAR optimizations: we gated these optimizations to little-endian, which is fine, but they also assume a 64-bit arch. Perhaps we should skip them on 32-bit?
```diff
+static ALWAYS_INLINE() int rstring_memcmp(const char *str, const char *rptr, const long length)
 {
-    long rstring_length = RSTRING_LEN(rstring);
-    if (length == rstring_length) {
-        return memcmp(str, RSTRING_PTR(rstring), length);
-    } else {
+    long i = 0;
+
+    for (; i+8 <= length; i += 8) {
+        uint64_t a, b;
+        memcpy(&a, str + i, 8);
+        memcpy(&b, rptr + i, 8);
+        if (a != b) {
+            a = __builtin_bswap64(a);
+            b = __builtin_bswap64(b);
+            return (a < b) ? -1 : 1;
+        }
+    }
```
Comment on lines +96 to +100

Contributor (Author)

The strings are ordered lexicographically in the cache. We need to reverse the bytes in each loaded word so that the 64-bit comparison matches the lexicographic byte order on little-endian.

Member

I don't think we care, do we? As long as the ordering is consistent, it's fine.

Comment on lines +92 to +100

Member

Looking at the godbolt ASM, it seems like this is exactly the code Clang generates when `memcmp` is called on 8 bytes. So I think we could just replace that code with something like:

```c
for (; i+8 <= length; i += 8) {
    int cmp = memcmp(str + i, rptr + i, 8);
    if (cmp) {
        return cmp;
    }
}
```

Contributor (Author)

Agreed.

Member

Interestingly, GCC doesn't: https://godbolt.org/z/7fTT6GzMs
```diff
+    }
+
+    for (; i < length; i++) {
+        unsigned char ca = (unsigned char)str[i];
+        unsigned char cb = (unsigned char)rptr[i];
+        if (ca != cb) {
+            return (ca < cb) ? -1 : 1;
+        }
+    }
+
+    return 0;
+}
+#else
+#define rstring_memcmp memcmp
+#endif
+
+static ALWAYS_INLINE() int rstring_cache_cmp(const char *str, const long length, VALUE rstring)
+{
+    const char *rptr;
+    long rstring_length;
+
+    RSTRING_GETMEM(rstring, rptr, rstring_length);
+
+    if (length != rstring_length) {
+        return (int)(length - rstring_length);
+    }
+
+    return rstring_memcmp(str, rptr, length);
+}
+
-static VALUE rstring_cache_fetch(rvalue_cache *cache, const char *str, const long length)
+static ALWAYS_INLINE() VALUE rstring_cache_fetch(rvalue_cache *cache, const char *str, const long length)
 {
     int low = 0;
     int high = cache->length - 1;
```
Looking at the profile in `samply` shows we spend a bit of time in `memcmp`. Comparing 8 bytes at a time is helpful, assuming `memcmp` isn't already doing that. Even if `memcmp` is optimized, we're only comparing strings shorter than `JSON_RVALUE_CACHE_MAX_ENTRY_LENGTH` (55 bytes), and this removes the function call overhead.

I would have expected compilers to essentially inline `memcmp`, like they do with `memcpy` etc., but perhaps not?

NB: I refactored that code a bit.