Java Extension Optimizations #835

Open
wants to merge 4 commits into
base: master

Contributor

@samyron samyron commented Aug 13, 2025

Changelog 📓

  • Use a segmented buffer for the OutputStream so that the existing output no longer has to be copied with System.arraycopy each time the output buffer is resized.
  • Refactored StringEncoder#encode to include a SWAR-based (SIMD within a register) fast path for basic JSON encoding. The algorithm is from this post. It's the same as the vector-based algorithm in the C extension.
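
For readers unfamiliar with SWAR, here is a minimal sketch of the general idea, not the code in this PR: pack 8 bytes into a long and test all of them at once for bytes that basic JSON encoding must escape (control characters below 0x20, the double quote, and the backslash). The class and method names (SwarEscapeScanSketch, hasEscapableByte, indexOfEscapable) are hypothetical, and the bit trick shown is the commonly published one; the PR's actual fast path may differ in detail.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the StringEncoder code from this PR.
final class SwarEscapeScanSketch {
    // True if any of the 8 bytes packed in x is < 0x20, '"' (0x22) or '\\' (0x5C).
    // Bytes >= 0x80 (UTF-8 multi-byte sequences) are never reported.
    static boolean hasEscapableByte(long x) {
        long isAscii = 0x8080808080808080L & ~x;                      // high bit clear in the input byte
        long lt32 = x - 0x2020202020202020L;                          // borrows into the high bit when byte < 0x20
        long eq34 = (x ^ 0x2222222222222222L) - 0x0101010101010101L;  // high bit set when byte == '"'
        long eq92 = (x ^ 0x5C5C5C5C5C5C5C5CL) - 0x0101010101010101L;  // high bit set when byte == '\\'
        return ((lt32 | eq34 | eq92) & isAscii) != 0;
    }

    // Scalar check used for the tail and to pinpoint the exact position.
    static boolean needsEscape(byte b) {
        int c = b & 0xFF;
        return c < 0x20 || c == '"' || c == '\\';
    }

    // Index of the first byte that needs escaping, or -1 if there is none.
    static int indexOfEscapable(byte[] buf) {
        ByteBuffer view = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
        int i = 0;
        // Fast path: skip 8 clean bytes per iteration.
        while (i + 8 <= buf.length && !hasEscapableByte(view.getLong(i))) {
            i += 8;
        }
        // Slow path: examine the flagged chunk and the tail byte by byte.
        for (; i < buf.length; i++) {
            if (needsEscape(buf[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] clean = "plain ascii text".getBytes(StandardCharsets.UTF_8);
        System.out.println(indexOfEscapable(clean));
    }
}
```

In this sketch a chunk with no escapable bytes could be copied to the output in one step, which is where a best-case input such as long runs of plain ASCII would benefit most.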

These features can be toggled with the system properties json.useSegmentedOutputStream and json.useSWARBasicEncoder. Both default to true. I'm happy to remove these properties; they made testing and benchmarking much easier.
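
As a rough illustration of how such default-true toggles can be read, the sketch below uses only the standard System.getProperty and Boolean.parseBoolean calls. The class name EncoderTogglesSketch and the helper enabled are hypothetical; how the PR actually reads and caches these properties is not shown in this conversation.

```java
// Hypothetical illustration only; the PR may read these properties differently.
final class EncoderTogglesSketch {
    static final String SEGMENTED_STREAM = "json.useSegmentedOutputStream";
    static final String SWAR_ENCODER = "json.useSWARBasicEncoder";

    // A missing property counts as "true", matching the stated default.
    static boolean enabled(String property) {
        return Boolean.parseBoolean(System.getProperty(property, "true"));
    }

    public static void main(String[] args) {
        System.out.println(SEGMENTED_STREAM + " = " + enabled(SEGMENTED_STREAM));
        System.out.println(SWAR_ENCODER + " = " + enabled(SWAR_ENCODER));
    }
}
```

Passing -Djson.useSWARBasicEncoder=false through JAVA_OPTS, as in the benchmark commands below, would make enabled return false for that property.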

Benchmarks

SegmentedByteListDirectOutputStream + SWAR

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=true -Djson.useSWARBasicEncoder=true' ruby -I"lib" benchmark/encoder-realworld.rb

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.741k i/100ms
Calculating -------------------------------------
                json     18.378k (± 6.3%) i/s   (54.41 μs/i) -    182.805k in  10.011722s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    85.000 i/100ms
Calculating -------------------------------------
                json    857.615 (± 1.3%) i/s    (1.17 ms/i) -      8.585k in  10.012075s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   185.000 i/100ms
Calculating -------------------------------------
                json      1.849k (± 1.0%) i/s  (540.77 μs/i) -     18.500k in  10.005181s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.558k i/100ms
Calculating -------------------------------------
                json     25.217k (± 1.1%) i/s   (39.66 μs/i) -    253.242k in  10.043890s

ByteListDirectOutputStream + SWAR

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=false -Djson.useSWARBasicEncoder=true' ruby -I"lib" benchmark/encoder-realworld.rb

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.560k i/100ms
Calculating -------------------------------------
                json     15.622k (± 0.8%) i/s   (64.01 μs/i) -    157.560k in  10.086737s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    87.000 i/100ms
Calculating -------------------------------------
                json    875.692 (± 0.9%) i/s    (1.14 ms/i) -      8.787k in  10.035282s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   182.000 i/100ms
Calculating -------------------------------------
                json      1.818k (± 0.8%) i/s  (550.15 μs/i) -     18.200k in  10.013389s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.544k i/100ms
Calculating -------------------------------------
                json     25.319k (± 0.9%) i/s   (39.50 μs/i) -    254.400k in  10.048804s

ByteListDirectOutputStream + Scalar

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=false -Djson.useSWARBasicEncoder=false' ruby -I"lib" benchmark/encoder-realworld.rb

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.078k i/100ms
Calculating -------------------------------------
                json     10.829k (± 2.5%) i/s   (92.35 μs/i) -    108.878k in  10.062513s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    78.000 i/100ms
Calculating -------------------------------------
                json    810.901 (± 2.8%) i/s    (1.23 ms/i) -      8.112k in  10.013134s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   128.000 i/100ms
Calculating -------------------------------------
                json      1.269k (± 3.3%) i/s  (788.26 μs/i) -     12.672k in  10.001657s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.178k i/100ms
Calculating -------------------------------------
                json     21.633k (± 1.0%) i/s   (46.23 μs/i) -    217.800k in  10.068853s

SegmentedByteListDirectOutputStream + Scalar

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=true -Djson.useSWARBasicEncoder=false' ruby -I"lib" benchmark/encoder-realworld.rb

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.014k i/100ms
Calculating -------------------------------------
                json     10.203k (± 0.8%) i/s   (98.01 μs/i) -    102.414k in  10.037929s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    79.000 i/100ms
Calculating -------------------------------------
                json    814.479 (± 2.1%) i/s    (1.23 ms/i) -      8.216k in  10.092101s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   136.000 i/100ms
Calculating -------------------------------------
                json      1.358k (± 1.0%) i/s  (736.45 μs/i) -     13.600k in  10.016731s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.246k i/100ms
Calculating -------------------------------------
                json     21.987k (± 1.6%) i/s   (45.48 μs/i) -    220.108k in  10.013722s

master (as of commit 37e6890)

% ONLY=json ruby -I"lib" benchmark/encoder-realworld.rb 

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   951.000 i/100ms
Calculating -------------------------------------
                json      9.517k (± 0.8%) i/s  (105.08 μs/i) -     96.051k in  10.093716s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    84.000 i/100ms
Calculating -------------------------------------
                json    843.486 (± 1.1%) i/s    (1.19 ms/i) -      8.484k in  10.059526s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   145.000 i/100ms
Calculating -------------------------------------
                json      1.448k (± 0.8%) i/s  (690.73 μs/i) -     14.500k in  10.016276s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.342k i/100ms
Calculating -------------------------------------
                json     23.073k (± 0.8%) i/s   (43.34 μs/i) -    231.858k in  10.049473s

@byroot byroot requested a review from headius August 13, 2025 14:03
private static final int DEFAULT_CAPACITY = 1024;

private int totalLength;
private byte[][] segments = new byte[21][];
Contributor Author

Why 21? The minimum segment size is 1024 bytes for the first segment, and the code doubles the segment size for each additional segment. With that doubling, 21 segments give a total capacity of 1024 × (2^21 − 1) = 2^31 − 1024 bytes, just under Integer.MAX_VALUE, so we never need more than 21 segments.
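
The doubling scheme and the 21-segment bound can be sketched as follows. This is a simplified stand-in, not the SegmentedByteListDirectOutputStream from this PR: the class name SegmentedBufferSketch, the fields current and position, and the methods write, toByteArray, and maxCapacity are assumptions made for illustration. Only DEFAULT_CAPACITY, totalLength, and the 21-slot segments array come from the diff above.

```java
// Hypothetical illustration only; not the PR's output stream implementation.
final class SegmentedBufferSketch {
    private static final int DEFAULT_CAPACITY = 1024;
    private static final int MAX_SEGMENTS = 21;

    private final byte[][] segments = new byte[MAX_SEGMENTS][];
    private int current = 0;      // index of the segment being filled
    private int position = 0;     // next write offset inside that segment
    private int totalLength = 0;  // bytes written across all segments

    SegmentedBufferSketch() {
        segments[0] = new byte[DEFAULT_CAPACITY];
    }

    void write(byte[] src, int off, int len) {
        while (len > 0) {
            byte[] seg = segments[current];
            if (position == seg.length) {
                // Growing allocates a new, twice-as-large segment; bytes that
                // were already written stay where they are and are not copied.
                if (current + 1 == MAX_SEGMENTS) {
                    throw new IllegalStateException("buffer capacity exhausted");
                }
                segments[++current] = new byte[seg.length * 2];
                position = 0;
                continue;
            }
            int n = Math.min(len, seg.length - position);
            System.arraycopy(src, off, seg, position, n);
            position += n;
            off += n;
            len -= n;
            totalLength += n;
        }
    }

    // Each written byte is copied into the result exactly once, here.
    byte[] toByteArray() {
        byte[] out = new byte[totalLength];
        int dst = 0;
        for (int i = 0; i <= current; i++) {
            int n = (i == current) ? position : segments[i].length;
            System.arraycopy(segments[i], 0, out, dst, n);
            dst += n;
        }
        return out;
    }

    // Sum of 1024 * 2^i for i = 0..20, i.e. 2^31 - 1024.
    static long maxCapacity() {
        long total = 0;
        long size = DEFAULT_CAPACITY;
        for (int i = 0; i < MAX_SEGMENTS; i++) {
            total += size;
            size *= 2;
        }
        return total;
    }
}
```

With a single growable array, every resize copies everything written so far; in this sketch a resize only allocates, and the one full copy happens when the final byte array is assembled.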

@samyron
Contributor Author

samyron commented Aug 14, 2025

Synthetic benchmarks of encoding an array of 128-byte ASCII strings.

benchmark_encoding "bytes.128.bestcase", ([("a" * 128)] * 10000)

SegmentedByteListDirectOutputStream + SWAR

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=true -Djson.useSWARBasicEncoder=true' ruby -I"lib" benchmark/encoder-synthetic.rb

== Encoding bytes.128.bestcase (1310001 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   256.000 i/100ms
Calculating -------------------------------------
                json      2.561k (± 0.9%) i/s  (390.48 μs/i) -     25.600k in   9.997219s

ByteListDirectOutputStream + Scalar (effectively the same code as master)

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=false -Djson.useSWARBasicEncoder=false' ruby -I"lib" benchmark/encoder-synthetic.rb

== Encoding bytes.128.bestcase (1310001 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   137.000 i/100ms
Calculating -------------------------------------
                json      1.376k (± 1.2%) i/s  (726.60 μs/i) -     13.837k in  10.055507s

SegmentedByteListDirectOutputStream + Scalar

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=true -Djson.useSWARBasicEncoder=false' ruby -I"lib" benchmark/encoder-synthetic.rb

== Encoding bytes.128.bestcase (1310001 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   141.000 i/100ms
Calculating -------------------------------------
                json      1.424k (± 0.8%) i/s  (702.28 μs/i) -     14.241k in  10.001896s

ByteListDirectOutputStream + SWAR

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=false -Djson.useSWARBasicEncoder=true' ruby -I"lib" benchmark/encoder-synthetic.rb

== Encoding bytes.128.bestcase (1310001 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   254.000 i/100ms
Calculating -------------------------------------
                json      2.558k (± 1.5%) i/s  (390.92 μs/i) -     25.654k in  10.030970s

Master

% ONLY=json ruby -I"lib" benchmark/encoder-synthetic.rb

== Encoding bytes.128.bestcase (1310001 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   134.000 i/100ms
Calculating -------------------------------------
                json      1.334k (± 3.6%) i/s  (749.69 μs/i) -     13.400k in  10.062253s

2 participants