46 changes: 40 additions & 6 deletions ext/json/ext/parser/parser.c
@@ -84,17 +84,51 @@ static void rvalue_cache_insert_at(rvalue_cache *cache, int index, VALUE rstring)
     cache->entries[index] = rstring;
 }

-static inline int rstring_cache_cmp(const char *str, const long length, VALUE rstring)
+#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__) && defined(__has_builtin) && __has_builtin(__builtin_bswap64)
Contributor Author:

Looking at the profile in samply shows we spend a bit of time in memcmp. Comparing 8 bytes at a time is helpful, assuming memcmp isn't already doing that. Even if memcmp is optimized, we're only comparing strings shorter than JSON_RVALUE_CACHE_MAX_ENTRY_LENGTH (55 bytes), and this removes the function call overhead.

Member:
I would have expected compilers to essentially inline memcmp, like they do with memcpy etc, but perhaps not?

Member:
NB: I refactored that code a bit.

Member:
Also applies to previous SWAR optimizations:

We gated these optimizations to little-endian, which is fine, but they also assume a 64-bit arch; perhaps we should skip them on 32-bit?

+static ALWAYS_INLINE() int rstring_memcmp(const char *str, const char *rptr, const long length)
 {
-    long rstring_length = RSTRING_LEN(rstring);
-    if (length == rstring_length) {
-        return memcmp(str, RSTRING_PTR(rstring), length);
-    } else {
+    long i = 0;
+
+    for (; i+8 <= length; i += 8) {
+        uint64_t a, b;
+        memcpy(&a, str + i, 8);
+        memcpy(&b, rptr + i, 8);
+        if (a != b) {
+            a = __builtin_bswap64(a);
+            b = __builtin_bswap64(b);
+            return (a < b) ? -1 : 1;
+        }
Comment on lines +96 to +100
Contributor Author:
The strings are ordered lexicographically in the cache. We need to reverse the bytes in a and b to ensure this is compatible with a lexicographic ordering of byte-by-byte comparisons.

Member:
I don't think we care, do we? As long as the ordering is consistent, it's fine.

Comment on lines +92 to +100
Member:
Looking at the Godbolt ASM, it seems this is exactly the code clang generates when memcmp is called with a constant length.

So I think we could just replace that code with something like:

    for (; i+8 <= length; i += 8) {
        int cmp = memcmp(str + i, rptr + i, 8);
        if (cmp) {
            return cmp;
        }
    }

Contributor Author (@samyron, Nov 3, 2025):
Agreed.

Member:
Interestingly, GCC doesn't: https://godbolt.org/z/7fTT6GzMs

+    }
+
+    for (; i < length; i++) {
+        unsigned char ca = (unsigned char)str[i];
+        unsigned char cb = (unsigned char)rptr[i];
+        if (ca != cb) {
+            return (ca < cb) ? -1 : 1;
+        }
+    }
+
+    return 0;
+}
+#else
+#define rstring_memcmp memcmp
+#endif

+static ALWAYS_INLINE() int rstring_cache_cmp(const char *str, const long length, VALUE rstring)
+{
+    const char *rptr;
+    long rstring_length;
+
+    RSTRING_GETMEM(rstring, rptr, rstring_length);
+
+    if (length != rstring_length) {
+        return (int)(length - rstring_length);
+    }
+
+    return rstring_memcmp(str, rptr, length);
+}

-static VALUE rstring_cache_fetch(rvalue_cache *cache, const char *str, const long length)
+static ALWAYS_INLINE() VALUE rstring_cache_fetch(rvalue_cache *cache, const char *str, const long length)
 {
     int low = 0;
     int high = cache->length - 1;