parser.c: various optimizations and simplifications #888

Conversation
ext/json/ext/parser/parser.c

```diff
 }

-static inline int rstring_cache_cmp(const char *str, const long length, VALUE rstring)
+static inline FORCE_INLINE int rstring_cache_cmp(const char *str, const long length, VALUE rstring)
```
This method is called in a hot loop in rstring_cache_fetch. Forcing it inline helps a bit.
This is just an idea: does reducing the number of rstring_cache_cmp calls improve performance? Something like a fixed-size hash table, which only needs a single hash calculation and a single compare.
The drawback is that it doesn't guarantee the first 63 keys (JSON_RVALUE_CACHE_CAPA) will always stay cached.
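If I'm reading the suggestion right, a direct-mapped cache along these lines is what's being described. This is a standalone sketch with plain C strings rather than Ruby `VALUE`s, and every name in it (`cache_lookup`, `CACHE_SLOTS`, `fnv1a`) is made up for illustration, not taken from the actual patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 64  /* fixed size, power of two so we can mask the hash */

typedef struct {
    char key[56];
    long len;
} cache_entry;

static cache_entry slots[CACHE_SLOTS];

/* FNV-1a, the hash mentioned later in this thread */
static uint64_t fnv1a(const char *str, long len)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (long i = 0; i < len; i++) {
        h ^= (unsigned char)str[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* One hash, one compare: returns 1 on hit, 0 on miss (and caches the key). */
static int cache_lookup(const char *str, long len)
{
    cache_entry *e = &slots[fnv1a(str, len) & (CACHE_SLOTS - 1)];
    if (e->len == len && memcmp(e->key, str, len) == 0) {
        return 1; /* hit */
    }
    /* miss: evict whatever was there; as noted above, this means the
       first 63 keys are not guaranteed to stay cached */
    e->len = len;
    memcpy(e->key, str, len);
    return 0;
}
```

A hit costs one hash plus one `memcmp`; the price is that two keys hashing to the same slot keep evicting each other.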
@tompng I'm not sure I exactly understand what you are suggesting.
I used a sorted array over a hash table so that in most cases we can avoid hashing the string. But I'm open to other implementations if it's not too complex and it performs better.
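For reference, the sorted-array lookup can be sketched roughly like this (standalone, with plain C strings instead of `VALUE`s; `cache_find` is a hypothetical name). The binary search is driven entirely by `memcmp`, so the incoming string is never hashed:

```c
#include <assert.h>
#include <string.h>

/* Binary search over a lexicographically sorted array of keys.
   Returns the index on a hit, -1 on a miss (a real cache would
   insert the new key at position `low` to keep the array sorted). */
static int cache_find(const char **sorted, int count, const char *str, long len)
{
    int low = 0, high = count;
    while (low < high) {
        int mid = low + (high - low) / 2;
        const char *candidate = sorted[mid];
        long clen = (long)strlen(candidate);
        long min = len < clen ? len : clen;
        int cmp = memcmp(str, candidate, (size_t)min);
        if (cmp == 0) cmp = (len > clen) - (len < clen); /* shorter sorts first */
        if (cmp == 0) return mid;  /* hit */
        if (cmp < 0) high = mid; else low = mid + 1;
    }
    return -1;  /* miss */
}
```

At 63 entries that is at most six compares per lookup, and for typical JSON keys the first differing byte ends most compares early.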
I see, that was my misunderstanding, and my benchmark strings (length 4..10) were too short...
I did partially implement a hashtable on a different branch. See rstring_ht_fetch.
This was a very simple hashtable using FNV1a to hash the string and open addressing. Note that FNV1a was faster than rb_memhash on my machine for these strings. My testing wasn't exhaustive. Edit: My guess is FNV1a was faster because the strings were short enough and/or the FNV1a code was inlined and rb_memhash wasn't.
I limited the probe length so as to prevent any maliciously crafted inputs from degrading the performance to O(n).
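A rough standalone sketch of that scheme, with illustrative names (`ht_fetch`, `MAX_PROBES`) and plain C strings instead of `VALUE`s:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HT_SLOTS 64
#define MAX_PROBES 4  /* cap the probe sequence so crafted input can't degrade to O(n) */

typedef struct {
    char key[56];
    long len;  /* 0 means the slot is empty (zero-length keys never cache in this sketch) */
} ht_entry;

static ht_entry table[HT_SLOTS];

static uint64_t ht_fnv1a(const char *str, long len)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (long i = 0; i < len; i++) {
        h ^= (unsigned char)str[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Open addressing with linear probing, inspecting at most MAX_PROBES slots.
   Returns 1 on a hit; on a miss, inserts into the first empty probed slot. */
static int ht_fetch(const char *str, long len)
{
    uint64_t idx = ht_fnv1a(str, len) & (HT_SLOTS - 1);
    for (int probe = 0; probe < MAX_PROBES; probe++) {
        ht_entry *e = &table[(idx + probe) & (HT_SLOTS - 1)];
        if (e->len == 0) {  /* empty: cache the key here */
            e->len = len;
            memcpy(e->key, str, len);
            return 0;
        }
        if (e->len == len && memcmp(e->key, str, len) == 0) return 1;
    }
    return 0;  /* probe budget exhausted: treat as uncached */
}
```

Capping the probe count bounds the worst-case work per lookup, at the cost of some keys never being cached when their probe window is full.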
```diff
-static inline int rstring_cache_cmp(const char *str, const long length, VALUE rstring)
+static inline FORCE_INLINE int rstring_cache_cmp(const char *str, const long length, VALUE rstring)
 {
+#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__) && defined(__has_builtin) && __has_builtin(__builtin_bswap64)
```
Looking at the profile in samply shows we spend a bit of time in memcmp. Comparing 8 bytes at a time helps, assuming memcmp isn't already doing that. And even if memcmp is optimized, we're only comparing strings shorter than JSON_RVALUE_CACHE_MAX_ENTRY_LENGTH (55 bytes), and this removes the function call overhead.
I would have expected compilers to essentially inline memcmp, like they do with memcpy etc, but perhaps not?
NB: I refactored that code a bit.
```c
if (a != b) {
    a = __builtin_bswap64(a);
    b = __builtin_bswap64(b);
    return (a < b) ? -1 : 1;
}
```
The strings are ordered lexicographically in the cache. We need to reverse the bytes in a and b to ensure this is compatible with a lexicographic ordering of byte-by-byte comparisons.
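For anyone following along, here is a tiny standalone illustration of that point (`chunk_cmp` is a hypothetical name, not from the patch). Without the swap, a little-endian integer comparison would rank "abzzzzzz" above "bazzzzzz", the opposite of memcmp:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compare two 8-byte chunks so that the sign of the result matches a
   byte-by-byte (lexicographic) memcmp. On little-endian targets the byte
   at the lowest address lands in the least significant position of the
   loaded integer, so both values must be byte-swapped before the integer
   comparison agrees with memcmp. */
static int chunk_cmp(const char *x, const char *y)
{
    uint64_t a, b;
    memcpy(&a, x, 8);
    memcpy(&b, y, 8);
    if (a == b) return 0;
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    a = __builtin_bswap64(a);
    b = __builtin_bswap64(b);
#endif
    return (a < b) ? -1 : 1;
}
```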
I don't think we care, do we? As long as the ordering is consistent, it's fine.
ext/json/ext/parser/parser.c
```c
if (RB_UNLIKELY(memchr(str, '\\', length))) {
    // We assume the overwhelming majority of names don't need to be escaped.
    // But if they do, we have to fallback to the slow path.
    return Qfalse;
}
```
I don't think this can ever evaluate to true. This is called from two places: json_string_fastpath and json_string_unescape. The only place both of those are called is in json_decode_string.
```c
static inline VALUE json_decode_string(JSON_ParserState *state, JSON_ParserConfig *config, const char *start, const char *end, bool escaped, bool is_name)
{
    VALUE string;
    bool intern = is_name || config->freeze;
    bool symbolize = is_name && config->symbolize_names;
    if (escaped) {
        string = json_string_unescape(state, start, end, is_name, intern, symbolize);
    } else {
        string = json_string_fastpath(state, start, end, is_name, intern, symbolize);
    }

    return string;
}
```
Since we know that json_string_fastpath is only called when there are no escapes, this shouldn't be necessary.
Additionally, we should remove the rstring_cache_fetch call in json_string_unescape as we know those strings shouldn't be cached.
ext/json/ext/parser/parser.c
```c
if (is_name && state->in_array) {
    VALUE cached_key;
    if (RB_UNLIKELY(symbolize)) {
        cached_key = rsymbol_cache_fetch(&state->name_cache, string, bufferSize);
    } else {
        cached_key = rstring_cache_fetch(&state->name_cache, string, bufferSize);
    }

    if (RB_LIKELY(cached_key)) {
        return cached_key;
    }
}
```
The rstring_cache_fetch used to check for escapes and return Qfalse if a \ was found. I believe that means it would only ever return Qfalse in this method.
It doesn't have to be multiple PRs, but at least multiple commits would be good. It would allow us to describe each change individually, and to measure the gains as well.
NB: I force pushed your branch with a couple of cleanups.
```diff
 }

 static inline int rstring_cache_cmp(const char *str, const long length, VALUE rstring)
+#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__) && defined(__has_builtin) && __has_builtin(__builtin_bswap64)
```
This also applies to the previous SWAR optimizations:
We gated these optimizations to little endian, which is fine, but they also assume a 64-bit arch; perhaps we should skip them on 32-bit?
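One way such a combined gate could look (a sketch: `JSON_ENABLE_SWAR` is a hypothetical macro name, while the `__BYTE_ORDER__` checks mirror the ones already used in this PR) is to require both little endianness and a pointer width above 32 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Enable the SWAR paths only on little-endian targets whose pointers are
   wider than 32 bits; UINTPTR_MAX is a reasonable proxy for the native
   word size here. */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__) && \
    defined(UINTPTR_MAX) && (UINTPTR_MAX > 0xFFFFFFFFu)
# define JSON_ENABLE_SWAR 1
#else
# define JSON_ENABLE_SWAR 0
#endif
```

On a 32-bit build the macro evaluates to 0 and the scalar fallback is compiled instead.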
```c
for (; i+8 <= length; i += 8) {
    uint64_t a, b;
    memcpy(&a, str + i, 8);
    memcpy(&b, rptr + i, 8);
    if (a != b) {
        a = __builtin_bswap64(a);
        b = __builtin_bswap64(b);
        return (a < b) ? -1 : 1;
    }
}
```
Looking at godbolt ASM, it seems like this is exactly the code clang generates when memcmp is called with a constant len.
So I think we could just replace that code with something like:

```c
for (; i+8 <= length; i += 8) {
    int cmp = memcmp(str + i, rptr + i, 8);
    if (cmp) {
        return cmp;
    }
}
```
Agreed.
Interestingly, GCC doesn't: https://godbolt.org/z/7fTT6GzMs
Closes: ruby#888

- Mark it as `inline`.
- Use `RSTRING_GETMEM`, instead of `RSTRING_LEN` and `RSTRING_PTR`.
- Use an inlinable version of `memcmp`.

```
== Parsing activitypub.json (58160 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Comparison:
  before:  11766.6 i/s
  after:   12272.1 i/s - 1.04x faster

== Parsing twitter.json (567916 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Comparison:
  before:   1333.2 i/s
  after:    1422.0 i/s - 1.07x faster

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Comparison:
  before:    656.3 i/s
  after:     673.1 i/s - 1.03x faster

== Parsing float parsing (2251051 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Comparison:
  before:    276.8 i/s
  after:     276.4 i/s - same-ish: difference falls within error
```

Co-Authored-By: Scott Myron <[email protected]>
Thanks, all these small optimizations have been merged in some reworked form.

In the common case where we have an array of similar objects:

```json
[
  {"foo": 1, "bar": 2},
  {"foo": 3, "bar": 4}
]
```

I'm wondering if we could keep the first parsed Hash around and, before going to the string cache, first check whether the key we're currently parsing matches the first object's key at the same position. This however is contingent on having an efficient way to access the Hash keys by position, because if we need to call
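To make the idea concrete, here's a toy version of the positional check with plain C strings instead of `VALUE`s (all names here, `first_object_keys`, `positional_match`, etc., are hypothetical): keep the first object's keys in parse order, and when parsing key number `i` of a later object, try a single `memcmp` against position `i` before falling back to the general cache.

```c
#include <assert.h>
#include <string.h>

#define MAX_TRACKED_KEYS 32

typedef struct {
    const char *keys[MAX_TRACKED_KEYS];  /* keys of the first object, in order */
    long lens[MAX_TRACKED_KEYS];
    int count;
} first_object_keys;

/* Returns 1 if the key being parsed matches the first object's key at the
   same position, meaning the already-built key object could be reused
   without any cache lookup. */
static int positional_match(const first_object_keys *fo, int pos,
                            const char *str, long len)
{
    if (pos >= fo->count) return 0;
    return fo->lens[pos] == len && memcmp(fo->keys[pos], str, len) == 0;
}
```

On a hit this is one compare and zero hashing; on a mismatch the parser would simply stop consulting the positional table for the rest of that object.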
Actually, we could get them for the
Looking at

```ruby
>> data["statuses"].map(&:keys).tally.each { |a, c| p a; p c }; nil
["metadata", "created_at", "id", "id_str", "text", "source", "truncated", "in_reply_to_status_id", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_user_id_str", "in_reply_to_screen_name", "user", "geo", "coordinates", "place", "contributors", "retweet_count", "favorite_count", "entities", "favorited", "retweeted", "lang"]
20
["metadata", "created_at", "id", "id_str", "text", "source", "truncated", "in_reply_to_status_id", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_user_id_str", "in_reply_to_screen_name", "user", "geo", "coordinates", "place", "contributors", "retweeted_status", "retweet_count", "favorite_count", "entities", "favorited", "retweeted", "possibly_sensitive", "lang"]
8
["metadata", "created_at", "id", "id_str", "text", "source", "truncated", "in_reply_to_status_id", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_user_id_str", "in_reply_to_screen_name", "user", "geo", "coordinates", "place", "contributors", "retweeted_status", "retweet_count", "favorite_count", "entities", "favorited", "retweeted", "lang"]
65
["metadata", "created_at", "id", "id_str", "text", "source", "truncated", "in_reply_to_status_id", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_user_id_str", "in_reply_to_screen_name", "user", "geo", "coordinates", "place", "contributors", "retweet_count", "favorite_count", "entities", "favorited", "retweeted", "possibly_sensitive", "lang"]
7
```

However, the first 17 keys are always the same, so I think there's potential there. We can stop checking after the first mismatch; that would still be 17 out of 23-25 keys (68-73%) that could be obtained with a single

As for

```ruby
>> data["orderedItems"].map(&:keys).tally
=> {["id", "type", "actor", "published", "to", "cc", "object"] => 20}
```

And for

```ruby
>> data["performances"].map(&:keys).tally
=> {["eventId", "id", "logo", "name", "prices", "seatCategories", "seatMapImage", "start", "venueCode"] => 243}
```

What I'm less clear on is nested hashes, e.g. in

```json
{
  "statuses": [
    {
      "metadata": {
        "result_type": "recent",
        "iso_language_code": "ja"
      },
      "created_at": "Sun Aug 31 00:29:15 +0000 2014",
      "id": 505874924095815700,
```

Here I'm not sure how we could provide the keys for
I had a similar idea but I wasn't sure how to implement it, so I filed it in the "things to think about while I workout" bucket. Sketching out my current idea...

Define a new type to keep track of the index keys in their insertion order. Limit this to a relatively small number.

```c
#define JSON_INDEX_KEYS_CAPA 32

typedef struct _json_object_keys {
    VALUE entries[JSON_INDEX_KEYS_CAPA];
    int len;
} JSON_ObjectKeys;
```

Add the following to

```c
JSON_ObjectKeys *object_keys;
```

Since

When we hit a

```c
case '[': {
    state->cursor++;
    json_eat_whitespace(state);
    long stack_head = state->stack->head;

    JSON_ObjectKeys current;
    current.len = 0;
    JSON_ObjectKeys *previous_keys = state->object_keys;

    if (peek(state) == ']') {
        state->cursor++;
        return json_push_value(state, config, json_decode_array(state, config, 0));
    } else {
        state->current_nesting++;
        if (RB_UNLIKELY(config->max_nesting && (config->max_nesting < state->current_nesting))) {
            rb_raise(eNestingError, "nesting of %d is too deep", state->current_nesting);
        }
        state->in_array++;
        // Only set the object_keys if we have at least one element.
        state->object_keys = &current;
        json_parse_any(state, config);
    }

    while (true) {
        json_eat_whitespace(state);
        const char next_char = peek(state);
        if (RB_LIKELY(next_char == ',')) {
            state->cursor++;
            if (config->allow_trailing_comma) {
                json_eat_whitespace(state);
                if (peek(state) == ']') {
                    continue;
                }
            }
            json_parse_any(state, config);
            continue;
        }
        if (next_char == ']') {
            state->cursor++;
            long count = state->stack->head - stack_head;
            state->current_nesting--;
            state->in_array--;
            state->object_keys = previous_keys;
            return json_push_value(state, config, json_decode_array(state, config, count));
        }
        raise_parse_error("expected ',' or ']' after array value", state);
    }
    break;
}
```

We then probably need to keep track of the array index we're currently parsing.. if it's

The object parsing code needs to change too, if we're on

That's about as far as I've made it...
This PR has a collection of small optimizations and simplifications. I'm happy to split this into multiple PRs if you'd prefer.
I will leave comments inline to describe the changes.
Additionally, it fixes a build issue: power_assert 3.0.0 was released, which breaks CI as it drops support for Ruby < 3.1.

Benchmark
M1 Macbook Air:
Run 1
Run 2
Macbook Pro M4 Pro
Run 1
Run 2
Intel(R) Core(TM) i7-8850H laptop
Run 1
Run 2