

@samyron
Contributor

@samyron samyron commented Nov 3, 2025

This PR has a collection of small optimizations and simplifications. I'm happy to split this into multiple PRs if you'd prefer.

I will leave comments inline to describe the changes.

Additionally, it fixes a build issue: power_assert 3.0.0 was released and breaks CI because it drops support for Ruby < 3.1.

Benchmark

M1 MacBook Air:

Run 1

== Parsing activitypub.json (58160 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.020k i/100ms
Calculating -------------------------------------
               after     10.287k (± 0.6%) i/s   (97.21 μs/i) -     52.020k in   5.057109s

Comparison:
              before:     9434.5 i/s
               after:    10286.9 i/s - 1.09x  faster


== Parsing twitter.json (567916 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   104.000 i/100ms
Calculating -------------------------------------
               after      1.052k (± 1.0%) i/s  (950.16 μs/i) -      5.304k in   5.040276s

Comparison:
              before:      913.5 i/s
               after:     1052.5 i/s - 1.15x  faster


== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    57.000 i/100ms
Calculating -------------------------------------
               after    581.994 (± 1.0%) i/s    (1.72 ms/i) -      2.964k in   5.093415s

Comparison:
              before:      521.9 i/s
               after:      582.0 i/s - 1.12x  faster


== Parsing ohai.json (32444 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.147k i/100ms
Calculating -------------------------------------
               after     11.481k (± 0.7%) i/s   (87.10 μs/i) -     58.497k in   5.095205s

Comparison:
              before:    10669.7 i/s
               after:    11481.4 i/s - 1.08x  faster


== Parsing float parsing (2251051 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    21.000 i/100ms
Calculating -------------------------------------
               after    196.936 (± 1.0%) i/s    (5.08 ms/i) -    987.000 in   5.012114s

Comparison:
              before:      189.1 i/s
               after:      196.9 i/s - 1.04x  faster

Run 2

== Parsing activitypub.json (58160 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.020k i/100ms
Calculating -------------------------------------
               after     10.275k (± 0.6%) i/s   (97.32 μs/i) -     52.020k in   5.062860s

Comparison:
              before:     9410.0 i/s
               after:    10275.2 i/s - 1.09x  faster


== Parsing twitter.json (567916 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   105.000 i/100ms
Calculating -------------------------------------
               after      1.058k (± 0.8%) i/s  (944.75 μs/i) -      5.355k in   5.059400s

Comparison:
              before:      912.5 i/s
               after:     1058.5 i/s - 1.16x  faster


== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    58.000 i/100ms
Calculating -------------------------------------
               after    584.364 (± 0.7%) i/s    (1.71 ms/i) -      2.958k in   5.062116s

Comparison:
              before:      519.8 i/s
               after:      584.4 i/s - 1.12x  faster


== Parsing ohai.json (32444 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.143k i/100ms
Calculating -------------------------------------
               after     11.442k (± 0.8%) i/s   (87.40 μs/i) -     58.293k in   5.094970s

Comparison:
              before:    10579.9 i/s
               after:    11442.0 i/s - 1.08x  faster


== Parsing float parsing (2251051 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    21.000 i/100ms
Calculating -------------------------------------
               after    197.566 (± 0.5%) i/s    (5.06 ms/i) -      1.008k in   5.102322s

Comparison:
              before:      189.9 i/s
               after:      197.6 i/s - 1.04x  faster

MacBook Pro M4 Pro

Run 1

== Parsing activitypub.json (58160 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.475k i/100ms
Calculating -------------------------------------
               after     15.305k (± 2.0%) i/s   (65.34 μs/i) -     76.700k in   5.013415s

Comparison:
              before:    14124.8 i/s
               after:    15305.2 i/s - 1.08x  faster


== Parsing activitypub-pretty.json (65761 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.845k i/100ms
Calculating -------------------------------------
               after     18.515k (± 1.4%) i/s   (54.01 μs/i) -     94.095k in   5.083162s

Comparison:
              before:    16615.8 i/s
               after:    18514.7 i/s - 1.11x  faster


== Parsing twitter.json (567916 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   165.000 i/100ms
Calculating -------------------------------------
               after      1.659k (± 1.4%) i/s  (602.91 μs/i) -      8.415k in   5.074625s

Comparison:
              before:     1435.5 i/s
               after:     1658.6 i/s - 1.16x  faster


== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    89.000 i/100ms
Calculating -------------------------------------
               after    885.559 (± 4.1%) i/s    (1.13 ms/i) -      4.450k in   5.034525s

Comparison:
              before:      782.7 i/s
               after:      885.6 i/s - 1.13x  faster


== Parsing float parsing (2251051 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    31.000 i/100ms
Calculating -------------------------------------
               after    281.147 (± 5.3%) i/s    (3.56 ms/i) -      1.426k in   5.090406s

Comparison:
              before:      270.7 i/s
               after:      281.1 i/s - same-ish: difference falls within error

Run 2

== Parsing activitypub.json (58160 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.533k i/100ms
Calculating -------------------------------------
               after     15.341k (± 2.9%) i/s   (65.18 μs/i) -     76.650k in   5.001380s

Comparison:
              before:    14198.7 i/s
               after:    15341.0 i/s - 1.08x  faster


== Parsing activitypub-pretty.json (65761 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.856k i/100ms
Calculating -------------------------------------
               after     18.541k (± 1.6%) i/s   (53.93 μs/i) -     92.800k in   5.006454s

Comparison:
              before:    16703.2 i/s
               after:    18541.0 i/s - 1.11x  faster


== Parsing twitter.json (567916 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   168.000 i/100ms
Calculating -------------------------------------
               after      1.653k (± 1.5%) i/s  (605.02 μs/i) -      8.400k in   5.083324s

Comparison:
              before:     1462.3 i/s
               after:     1652.8 i/s - 1.13x  faster


== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    89.000 i/100ms
Calculating -------------------------------------
               after    888.315 (± 1.7%) i/s    (1.13 ms/i) -      4.450k in   5.010936s

Comparison:
              before:      782.0 i/s
               after:      888.3 i/s - 1.14x  faster


== Parsing float parsing (2251051 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    31.000 i/100ms
Calculating -------------------------------------
               after    284.734 (± 1.4%) i/s    (3.51 ms/i) -      1.426k in   5.009035s

Comparison:
              before:      271.3 i/s
               after:      284.7 i/s - 1.05x  faster

Run 3

== Parsing activitypub.json (58160 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.564k i/100ms
Calculating -------------------------------------
               after     15.543k (± 2.0%) i/s   (64.34 μs/i) -     78.200k in   5.033534s

Comparison:
              before:    14129.4 i/s
               after:    15542.6 i/s - 1.10x  faster


== Parsing activitypub-pretty.json (65761 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.874k i/100ms
Calculating -------------------------------------
               after     18.672k (± 1.7%) i/s   (53.56 μs/i) -     93.700k in   5.019765s

Comparison:
              before:    17049.8 i/s
               after:    18671.9 i/s - 1.10x  faster


== Parsing twitter.json (567916 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   167.000 i/100ms
Calculating -------------------------------------
               after      1.682k (± 1.5%) i/s  (594.51 μs/i) -      8.517k in   5.064703s

Comparison:
              before:     1474.1 i/s
               after:     1682.1 i/s - 1.14x  faster


== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    88.000 i/100ms
Calculating -------------------------------------
               after    886.406 (± 2.0%) i/s    (1.13 ms/i) -      4.488k in   5.065409s

Comparison:
              before:      791.5 i/s
               after:      886.4 i/s - 1.12x  faster


== Parsing float parsing (2251051 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    30.000 i/100ms
Calculating -------------------------------------
               after    287.255 (± 1.4%) i/s    (3.48 ms/i) -      1.440k in   5.014173s

Comparison:
              before:      274.7 i/s
               after:      287.3 i/s - 1.05x  faster

Intel(R) Core(TM) i7-8850H laptop

Run 1

== Parsing activitypub.json (58160 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x86_64-linux]
Warming up --------------------------------------
               after   609.000 i/100ms
Calculating -------------------------------------
               after      6.256k (± 3.2%) i/s  (159.83 μs/i) -     31.668k in   5.066829s

Comparison:
              before:     5665.8 i/s
               after:     6256.5 i/s - same-ish: difference falls within error


== Parsing twitter.json (567916 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x86_64-linux]
Warming up --------------------------------------
               after    54.000 i/100ms
Calculating -------------------------------------
               after    555.934 (± 2.2%) i/s    (1.80 ms/i) -      2.808k in   5.053412s

Comparison:
              before:      501.3 i/s
               after:      555.9 i/s - 1.11x  faster


== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x86_64-linux]
Warming up --------------------------------------
               after    28.000 i/100ms
Calculating -------------------------------------
               after    281.890 (± 1.8%) i/s    (3.55 ms/i) -      1.428k in   5.067335s

Comparison:
              before:      261.5 i/s
               after:      281.9 i/s - 1.08x  faster


== Parsing ohai.json (32444 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x86_64-linux]
Warming up --------------------------------------
               after   534.000 i/100ms
Calculating -------------------------------------
               after      5.212k (± 3.2%) i/s  (191.87 μs/i) -     26.166k in   5.025585s

Comparison:
              before:     5082.8 i/s
               after:     5212.0 i/s - same-ish: difference falls within error

Run 2

ruby: warning: Ruby was built without YJIT support. You may need to install rustc to build Ruby with YJIT.
== Parsing activitypub.json (58160 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x86_64-linux]
Warming up --------------------------------------
               after   632.000 i/100ms
Calculating -------------------------------------
               after      6.162k (± 2.9%) i/s  (162.29 μs/i) -     30.968k in   5.029989s

Comparison:
              before:     5912.5 i/s
               after:     6161.8 i/s - same-ish: difference falls within error


== Parsing twitter.json (567916 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x86_64-linux]
Warming up --------------------------------------
               after    56.000 i/100ms
Calculating -------------------------------------
               after    555.719 (± 2.2%) i/s    (1.80 ms/i) -      2.800k in   5.040941s

Comparison:
              before:      495.9 i/s
               after:      555.7 i/s - 1.12x  faster


== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x86_64-linux]
Warming up --------------------------------------
               after    28.000 i/100ms
Calculating -------------------------------------
               after    281.597 (± 2.8%) i/s    (3.55 ms/i) -      1.428k in   5.074822s

Comparison:
              before:      268.3 i/s
               after:      281.6 i/s - same-ish: difference falls within error


== Parsing ohai.json (32444 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x86_64-linux]
Warming up --------------------------------------
               after   526.000 i/100ms
Calculating -------------------------------------
               after      4.981k (± 6.3%) i/s  (200.77 μs/i) -     25.248k in   5.089681s

Comparison:
              before:     4914.8 i/s
               after:     4980.8 i/s - same-ish: difference falls within error

}

static inline int rstring_cache_cmp(const char *str, const long length, VALUE rstring)
static inline FORCE_INLINE int rstring_cache_cmp(const char *str, const long length, VALUE rstring)
Contributor Author

This method is called in a hot loop in rstring_cache_fetch. Forcing it inline helps a bit.

Member

This is just an idea: would reducing the number of rstring_cache_cmp calls improve performance?
For example, with a fixed-size hash table, a lookup only needs a single hash calculation and a single comparison.
The drawback is that it doesn't guarantee that the first 63 keys (JSON_RVALUE_CACHE_CAPA) are always cached.

Member

@tompng I'm not sure I exactly understand what you are suggesting.

I used a sorted array over a hash table so that in most cases we can avoid hashing the string. But I'm open to other implementations if they're not too complex and perform better.

Member

I see, that was my misunderstanding; my benchmark strings (4 to 10 characters long) were too short...

Contributor Author

@samyron samyron Nov 3, 2025

I did partially implement a hashtable on a different branch. See rstring_ht_fetch.

This was a very simple hashtable using FNV1a to hash the string and open addressing. Note that FNV1a was faster than rb_memhash on my machine for these strings. My testing wasn't exhaustive. Edit: My guess is FNV1a was faster because the strings were short enough and/or the FNV1a code was inlined and rb_memhash wasn't.

I limited the probe length to prevent maliciously crafted inputs from degrading lookups to O(n).

static inline int rstring_cache_cmp(const char *str, const long length, VALUE rstring)
static inline FORCE_INLINE int rstring_cache_cmp(const char *str, const long length, VALUE rstring)
{
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__) && defined(__has_builtin) && __has_builtin(__builtin_bswap64)
Contributor Author

The profile in samply shows we spend a bit of time in memcmp. Comparing 8 bytes at a time helps, assuming memcmp isn't already doing that. Even if memcmp is optimized, we're only comparing strings shorter than JSON_RVALUE_CACHE_MAX_ENTRY_LENGTH (55 bytes), and this removes the function call overhead.

Member

I would have expected compilers to essentially inline memcmp, like they do with memcpy etc, but perhaps not?

Member

NB: I refactored that code a bit.

Comment on lines +144 to +100
if (a != b) {
    a = __builtin_bswap64(a);
    b = __builtin_bswap64(b);
    return (a < b) ? -1 : 1;
}
Contributor Author

The strings are ordered lexicographically in the cache. We need to reverse the bytes in a and b to ensure this is compatible with a lexicographic ordering of byte-by-byte comparisons.

Member

I don't think we care, do we? As long as the ordering is consistent, it's fine.

Comment on lines 164 to 130
if (RB_UNLIKELY(memchr(str, '\\', length))) {
    // We assume the overwhelming majority of names don't need to be escaped.
    // But if they do, we have to fallback to the slow path.
    return Qfalse;
}

Contributor Author

I don't think this can ever evaluate to true. This is called from two places: json_string_fastpath and json_string_unescape. The only place both of those are called is in json_decode_string.

static inline VALUE json_decode_string(JSON_ParserState *state, JSON_ParserConfig *config, const char *start, const char *end, bool escaped, bool is_name)
{
    VALUE string;
    bool intern = is_name || config->freeze;
    bool symbolize = is_name && config->symbolize_names;
    if (escaped) {
        string = json_string_unescape(state, start, end, is_name, intern, symbolize);
    } else {
        string = json_string_fastpath(state, start, end, is_name, intern, symbolize);
    }

    return string;
}

Since we know that json_string_fastpath is only called when there are no escapes, this shouldn't be necessary.

Additionally, we should remove the rstring_cache_fetch call in json_string_unescape as we know those strings shouldn't be cached.

@samyron samyron force-pushed the sm/parser-misc-optimizations branch from c92bd6a to 5c7e741 Compare November 3, 2025 03:53
Comment on lines 694 to 667
if (is_name && state->in_array) {
    VALUE cached_key;
    if (RB_UNLIKELY(symbolize)) {
        cached_key = rsymbol_cache_fetch(&state->name_cache, string, bufferSize);
    } else {
        cached_key = rstring_cache_fetch(&state->name_cache, string, bufferSize);
    }

    if (RB_LIKELY(cached_key)) {
        return cached_key;
    }
}

Contributor Author

rstring_cache_fetch used to check for escapes and return Qfalse if a \ was found. Since json_string_unescape is only reached for strings that contain escapes, I believe it would only ever return Qfalse in this method.

@byroot
Member

byroot commented Nov 3, 2025

> I'm happy to split this into multiple PRs if you'd prefer.

It doesn't have to be multiple PRs, but at least multiple commits would be good. That would allow describing each change individually and measuring its gain as well.

@byroot byroot force-pushed the sm/parser-misc-optimizations branch 2 times, most recently from 11037ad to 441d1ae Compare November 3, 2025 09:23
@byroot
Member

byroot commented Nov 3, 2025

NB: I force-pushed your branch with a couple of cleanups.

@byroot byroot force-pushed the sm/parser-misc-optimizations branch 3 times, most recently from 04a6124 to a4e99bf Compare November 3, 2025 10:33
}

static inline int rstring_cache_cmp(const char *str, const long length, VALUE rstring)
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__) && defined(__has_builtin) && __has_builtin(__builtin_bswap64)
Member

Also applies to previous SWAR optimizations:

We gated these optimizations on little-endian, which is fine, but they also assume a 64-bit architecture. Perhaps we should skip them on 32-bit?

Comment on lines +86 to +100
for (; i+8 <= length; i += 8) {
    uint64_t a, b;
    memcpy(&a, str + i, 8);
    memcpy(&b, rptr + i, 8);
    if (a != b) {
        a = __builtin_bswap64(a);
        b = __builtin_bswap64(b);
        return (a < b) ? -1 : 1;
    }
Member

Looking at the assembly on Godbolt, this seems to be exactly the code Clang generates when memcmp is called with a constant length.

So I think we could just replace that code with something like:

    for (; i+8 <= length; i += 8) {
        int cmp = memcmp(str + i, rptr + i, 8);
        if (cmp) {
            return cmp;
        }
    }

Contributor Author

@samyron samyron Nov 3, 2025

Agreed.

Member

Interestingly, GCC doesn't: https://godbolt.org/z/7fTT6GzMs

@byroot byroot force-pushed the sm/parser-misc-optimizations branch from a4e99bf to cb6391e Compare November 3, 2025 13:12
byroot added a commit to byroot/json that referenced this pull request Nov 4, 2025
Closes: ruby#888

- Mark it as `inline`.
- Use `RSTRING_GETMEM`, instead of `RSTRING_LEN` and `RSTRING_PTR`.
- Use an inlinable version of `memcmp`.

```
== Parsing activitypub.json (58160 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Comparison:
              before:    11766.6 i/s
               after:    12272.1 i/s - 1.04x  faster

== Parsing twitter.json (567916 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Comparison:
              before:     1333.2 i/s
               after:     1422.0 i/s - 1.07x  faster

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Comparison:
              before:      656.3 i/s
               after:      673.1 i/s - 1.03x  faster

== Parsing float parsing (2251051 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Comparison:
              before:      276.8 i/s
               after:      276.4 i/s - same-ish: difference falls within error
```

Co-Authored-By: Scott Myron <[email protected]>
@byroot byroot closed this in #891 Nov 4, 2025
byroot added a commit that referenced this pull request Nov 4, 2025
matzbot pushed a commit to ruby/ruby that referenced this pull request Nov 4, 2025
ruby/json@a67d1a1af4
@byroot
Member

byroot commented Nov 4, 2025

Thanks, all these small optimizations have been merged in some reworked form.

The rstring_cache_fetch hotspot is interesting; yesterday I had an idea I'd like to explore.

In the common case where we have an array of similar objects:

```json
[
  {"foo": 1, "bar": 2},
  {"foo": 3, "bar": 4}
]
```

I'm wondering if we could keep the first parsed Hash around and, before going to the string cache, first check whether the key we're currently parsing matches the first object's key at the same position.

This, however, is contingent on having an efficient way to access the Hash keys by position, because if we need to call Hash#keys, I fear the extra Array allocation may cost more than we'd save.

@byroot
Member

byroot commented Nov 4, 2025

> This, however, is contingent on having an efficient way to access the Hash keys by position

Actually, we could get them from the rvalue_stack just before we create the first Hash.

@byroot
Member

byroot commented Nov 4, 2025

Looking at twitter.json, there seem to be 4 distinct sets of keys:

```
>> data["statuses"].map(&:keys).tally.each { |a, c| p a; p c }; nil
["metadata", "created_at", "id", "id_str", "text", "source", "truncated", "in_reply_to_status_id", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_user_id_str", "in_reply_to_screen_name", "user", "geo", "coordinates", "place", "contributors", "retweet_count", "favorite_count", "entities", "favorited", "retweeted", "lang"]
20
["metadata", "created_at", "id", "id_str", "text", "source", "truncated", "in_reply_to_status_id", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_user_id_str", "in_reply_to_screen_name", "user", "geo", "coordinates", "place", "contributors", "retweeted_status", "retweet_count", "favorite_count", "entities", "favorited", "retweeted", "possibly_sensitive", "lang"]
8
["metadata", "created_at", "id", "id_str", "text", "source", "truncated", "in_reply_to_status_id", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_user_id_str", "in_reply_to_screen_name", "user", "geo", "coordinates", "place", "contributors", "retweeted_status", "retweet_count", "favorite_count", "entities", "favorited", "retweeted", "lang"]
65
["metadata", "created_at", "id", "id_str", "text", "source", "truncated", "in_reply_to_status_id", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_user_id_str", "in_reply_to_screen_name", "user", "geo", "coordinates", "place", "contributors", "retweet_count", "favorite_count", "entities", "favorited", "retweeted", "possibly_sensitive", "lang"]
7
```

However, the first 17 keys are always the same, so I think there's potential there. We can stop checking after the first mismatch; that would still be 17 out of 23-25 keys (about 68-74%) that could each be resolved with a single memcmp.

As for activitypub all 20 entries have the same keys:

```
>> data["orderedItems"].map(&:keys).tally
=> {["id", "type", "actor", "published", "to", "cc", "object"] => 20}
```

And for citm_catalog:

```
>> data["performances"].map(&:keys).tally
=> {["eventId", "id", "logo", "name", "prices", "seatCategories", "seatMapImage", "start", "venueCode"] => 243}
```

What I'm less clear on is for nested hashes, e.g. in twitter.json:

```json
{
  "statuses": [
    {
      "metadata": {
        "result_type": "recent",
        "iso_language_code": "ja"
      },
      "created_at": "Sun Aug 31 00:29:15 +0000 2014",
      "id": 505874924095815700,
```

Here I'm not sure how we could provide the keys for statuses[0].metadata, but just supporting the first level might already be a decent win.

@samyron
Contributor Author

samyron commented Nov 4, 2025

I had a similar idea but wasn't sure how to implement it, so I filed it in the "things to think about while I work out" bucket.

Sketching out my current idea...

Define a new type to keep track of the object keys in their insertion order. Limit this to a relatively small number of keys.

#define JSON_INDEX_KEYS_CAPA 32
typedef struct _json_object_keys {
  VALUE entries[JSON_INDEX_KEYS_CAPA];
  int len;
} JSON_ObjectKeys;

Add the following to JSON_ParserState:

JSON_ObjectKeys *object_keys;

Since json_parse_any is recursive, we can take advantage of the call stack to keep track of the previous value of object_keys.

When we hit a [, we do something like:

        case '[': {
            state->cursor++;
            json_eat_whitespace(state);
            long stack_head = state->stack->head;

            JSON_ObjectKeys current;
            current.len = 0;

            JSON_ObjectKeys *previous_keys = state->object_keys;

            if (peek(state) == ']') {
                state->cursor++;
                return json_push_value(state, config, json_decode_array(state, config, 0));
            } else {
                state->current_nesting++;
                if (RB_UNLIKELY(config->max_nesting && (config->max_nesting < state->current_nesting))) {
                    rb_raise(eNestingError, "nesting of %d is too deep", state->current_nesting);
                }
                state->in_array++;

                // Only set the object_keys if we have at least one element.
                state->object_keys = &current;

                json_parse_any(state, config);
            }

            while (true) {
                json_eat_whitespace(state);

                const char next_char = peek(state);

                if (RB_LIKELY(next_char == ',')) {
                    state->cursor++;
                    if (config->allow_trailing_comma) {
                        json_eat_whitespace(state);
                        if (peek(state) == ']') {
                            continue;
                        }
                    }
                    json_parse_any(state, config);
                    continue;
                }

                if (next_char == ']') {
                    state->cursor++;
                    long count = state->stack->head - stack_head;
                    state->current_nesting--;
                    state->in_array--;

                    state->object_keys = previous_keys;

                    return json_push_value(state, config, json_decode_array(state, config, count));
                }

                raise_parse_error("expected ',' or ']' after array value", state);
            }
            break;
        }

We then probably need to keep track of the array index we're currently parsing. If array_index > 0, we can check object_keys when parsing an object.

The object parsing code needs to change too: if we're on array_index == 0, we append keys (in order) to object_keys; for any subsequent object, we check object_keys in the order the keys are read.

That's about as far as I've made it...

jacob-shops pushed a commit to Shopify/ruby that referenced this pull request Nov 12, 2025