samyron commented Oct 31, 2025

This might be a stretch, but this PR implements a heuristic in json_eat_whitespace. If the next character is a \n, it is often followed by a run of consecutive spaces (0x20), as in pretty-printed documents. If so, we can skip them quickly.

activitypub-pretty.json was generated by JSON.pretty_generate(JSON.load_file('activitypub.json')).

Benchmarks against master on my M1 MacBook Air:

== Parsing activitypub.json (58160 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   935.000 i/100ms
Calculating -------------------------------------
               after      9.600k (± 0.5%) i/s  (104.17 μs/i) -     48.620k in   5.064774s

Comparison:
              before:     9452.1 i/s
               after:     9599.9 i/s - same-ish: difference falls within error


== Parsing activitypub-pretty.json (65761 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.177k i/100ms
Calculating -------------------------------------
               after     11.724k (± 1.6%) i/s   (85.30 μs/i) -     58.850k in   5.021088s

Comparison:
              before:    11074.0 i/s
               after:    11723.6 i/s - 1.06x  faster


== Parsing twitter.json (567916 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    99.000 i/100ms
Calculating -------------------------------------
               after    994.981 (± 0.6%) i/s    (1.01 ms/i) -      5.049k in   5.074626s

Comparison:
              before:      909.8 i/s
               after:      995.0 i/s - 1.09x  faster


== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    55.000 i/100ms
Calculating -------------------------------------
               after    549.774 (± 0.7%) i/s    (1.82 ms/i) -      2.750k in   5.002283s

Comparison:
              before:      435.4 i/s
               after:      549.8 i/s - 1.26x  faster

Looking for 8 spaces at a time was slightly faster than looking for 4. I'm not sure this is safe, but the following might be slightly faster still:

if (chunk != 0x2020202020202020) {
    if (((uint32_t) chunk) == 0x20202020) {
        state->cursor += 4;
    }
    break;
}
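
For context, here is a minimal sketch of how the fragment above could sit inside a complete loop. The function name and signature are hypothetical and not taken from the PR. One deliberate difference: the 4-byte check uses memcmp rather than a (uint32_t) cast, because the cast selects the first four bytes in memory only on little-endian targets.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the parser's actual function: advance past a run
 * of ASCII spaces, eight bytes at a time, and return the new cursor. */
static const char *skip_spaces_chunked(const char *cursor, const char *end)
{
    static const char four_spaces[4] = {' ', ' ', ' ', ' '};

    while ((size_t)(end - cursor) >= sizeof(uint64_t)) {
        uint64_t chunk;
        memcpy(&chunk, cursor, sizeof(chunk)); /* alignment-safe load */

        if (chunk != UINT64_C(0x2020202020202020)) {
            /* Not eight spaces; still take the first four if they all are.
             * memcmp compares bytes in memory order, so this check does not
             * depend on endianness. */
            if (memcmp(cursor, four_spaces, sizeof(four_spaces)) == 0) {
                cursor += 4;
            }
            break;
        }
        cursor += sizeof(uint64_t);
    }
    return cursor;
}
```

The returned cursor may still point at a few spaces (fewer than four before a non-space byte, or anything in the last seven bytes of the buffer), so a caller would still need its regular byte-by-byte whitespace loop afterwards.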

byroot force-pushed the sm/parser-whitespace-optimizations branch from b18742e to f12e571 on November 1, 2025 10:26
while (state->cursor+sizeof(uint64_t) <= state->end) {
    uint64_t chunk;
    memcpy(&chunk, state->cursor, sizeof(uint64_t));
    if (chunk != 0x2020202020202020) {

I think we can do even better.

Unless I'm mistaken, we can get the exact number of consecutive spaces with:

__builtin_ctzll(bytes ^ 0x2020202020202020) / 8
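
A small sketch of that expression as a standalone helper, under two assumptions that the one-liner leaves implicit: a little-endian target (so the lowest-order byte is the first byte in memory) and a GCC/Clang-compatible compiler. The helper name is hypothetical. Note that when all eight bytes are spaces the XOR is zero, and __builtin_ctzll(0) is undefined, so that case needs its own branch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of leading ASCII spaces (0..8) among the eight
 * bytes at `p`. Assumes a little-endian target and GCC/Clang builtins. */
static unsigned count_leading_spaces8(const char *p)
{
    uint64_t bytes;
    memcpy(&bytes, p, sizeof(bytes)); /* alignment-safe load */

    /* XOR zeroes every byte that equals 0x20. */
    uint64_t diff = bytes ^ UINT64_C(0x2020202020202020);
    if (diff == 0) {
        return 8; /* all spaces; __builtin_ctzll(0) is undefined */
    }
    /* On little-endian, the lowest set bit lies in the first non-space byte,
     * so dividing its bit index by 8 gives that byte's offset. */
    return (unsigned)__builtin_ctzll(diff) / 8;
}
```

For example, for the eight bytes `   "key"` (three spaces, then a quote) the first non-zero byte of the XOR is at offset 3, so the helper returns 3.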

byroot added a commit to byroot/json that referenced this pull request Nov 1, 2025
Closes: ruby#881

If we encounter a newline, the document is likely pretty-printed,
so the newline is probably followed by multiple spaces.

In that case we can use SWAR to count up to eight consecutive spaces at once.
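
As a rough illustration of how the heuristic could fit into a whitespace-skipping loop, here is a simplified sketch. The parser_state struct and the eat_whitespace function are hypothetical stand-ins, not the code from the commit; the sketch assumes a little-endian target and GCC/Clang builtins, and it omits anything else the real parser may handle while skipping whitespace.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the parser state; only the fields used here. */
typedef struct {
    const char *cursor;
    const char *end;
} parser_state;

static void eat_whitespace(parser_state *state)
{
    while (state->cursor < state->end) {
        switch (*state->cursor) {
        case ' ':
        case '\t':
        case '\r':
            state->cursor++;
            break;
        case '\n':
            state->cursor++;
            /* Heuristic: a newline is likely followed by indentation, so
             * examine eight bytes at a time (SWAR). */
            while ((size_t)(state->end - state->cursor) >= sizeof(uint64_t)) {
                uint64_t chunk;
                memcpy(&chunk, state->cursor, sizeof(chunk));
                uint64_t diff = chunk ^ UINT64_C(0x2020202020202020);
                if (diff == 0) {
                    state->cursor += sizeof(uint64_t); /* eight spaces */
                    continue;
                }
                /* Skip exactly the leading spaces of this chunk, then stop. */
                state->cursor += (unsigned)__builtin_ctzll(diff) / 8;
                break;
            }
            break;
        default:
            return; /* first non-whitespace byte */
        }
    }
}
```

When fewer than eight bytes remain after the newline, the inner loop is skipped and the outer byte-by-byte loop handles the tail.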

```
== Parsing activitypub.json (58160 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.118k i/100ms
Calculating -------------------------------------
               after     11.223k (± 0.7%) i/s   (89.10 μs/i) -     57.018k in   5.080522s

Comparison:
              before:    10834.4 i/s
               after:    11223.4 i/s - 1.04x  faster

== Parsing twitter.json (567916 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   118.000 i/100ms
Calculating -------------------------------------
               after      1.188k (± 1.0%) i/s  (841.62 μs/i) -      6.018k in   5.065355s

Comparison:
              before:     1094.8 i/s
               after:     1188.2 i/s - 1.09x  faster

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    58.000 i/100ms
Calculating -------------------------------------
               after    570.506 (± 3.7%) i/s    (1.75 ms/i) -      2.900k in   5.091529s

Comparison:
              before:      419.6 i/s
               after:      570.5 i/s - 1.36x  faster

== Parsing float parsing (2251051 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    22.000 i/100ms
Calculating -------------------------------------
               after    212.010 (± 1.9%) i/s    (4.72 ms/i) -      1.078k in   5.086885s

Comparison:
              before:      189.4 i/s
               after:      212.0 i/s - 1.12x  faster
```

Co-Authored-By: Scott Myron <[email protected]>

byroot commented Nov 1, 2025

Improved version: #886

byroot closed this in #886 on Nov 1, 2025
matzbot pushed a commit to ruby/ruby that referenced this pull request Nov 1, 2025
Closes: ruby/json#881

ruby/json@b3fd7b26be

Co-Authored-By: Scott Myron <[email protected]>