Contributor
@dain do you think we can add a snapshot release workflow, release it, and benchmark it in Trino?
wendigo reviewed Mar 8, 2026
```java
    return isAsciiRaw(utf8, offset, length);
}

private static boolean isAsciiRaw(byte[] utf8, int utf8Offset, int utf8Length)
```
Contributor
Why operate on a byte[] rather than a Slice? This duplicates the LONG_HANDLEs that are already defined in Slice.
Member
Author
Two reasons:
- It makes it much easier to see where the bounds checks are.
- It is easier for the JVM to optimize when it is working with raw byte arrays.

That said, the JVM is also very good at unwinding the stack. As a further side benefit, there are cases where you are already operating on a raw byte array, and having to wrap it in a Slice just to do some basic string manipulation is annoying.
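A minimal sketch of the byte[]-based approach under discussion (the method body below is my illustration, not the PR's actual implementation): check the high bit of eight bytes at a time through a little-endian long view, then fall back to a byte loop for the tail.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public final class AsciiSketch
{
    // Illustrative long view over a byte[], analogous to the LONG_HANDLE in Slice
    private static final VarHandle LONG_HANDLE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    private AsciiSketch() {}

    public static boolean isAsciiRaw(byte[] utf8, int utf8Offset, int utf8Length)
    {
        int position = utf8Offset;
        int end = utf8Offset + utf8Length;
        // Word-at-a-time: if any byte has its high bit set, the masked word is non-zero
        while (position + Long.BYTES <= end) {
            long word = (long) LONG_HANDLE.get(utf8, position);
            if ((word & 0x8080808080808080L) != 0) {
                return false;
            }
            position += Long.BYTES;
        }
        // Tail: plain byte loop; a negative byte means the high bit is set
        while (position < end) {
            if (utf8[position] < 0) {
                return false;
            }
            position++;
        }
        return true;
    }
}
```

With all bounds arithmetic visible in one method, the JIT can hoist the array bounds checks out of both loops.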
Member
Author
@wendigo if you put up a PR for the workflow you're asking about, I'll check it in.
Contributor
Unwrapping the Slice makes it easier to see what is happening in these algorithms and makes them easier to optimize. It also makes these functions usable without having to wrap a raw byte array in a Slice.
Benchmark (benchmarkCompareUtf16BE, length=1000):
- ascii=true: 3.483 -> 0.102 ns/codepoint
- ascii=false: 8.214 -> 6.395 ns/codepoint

Benchmark (benchmarkReverse, length=1000):
- ascii=true: 0.318 -> 0.067 ns/codepoint
- ascii=false: 3.397 -> 3.406 ns/codepoint (flat within noise)

Benchmark (benchmarkToUpperCase, length=1000):
- ascii=true: 3.053 -> 0.601 ns/codepoint
- ascii=false: 7.254 -> 5.019 ns/codepoint

Benchmark (benchmarkToLowerCase, length=1000):
- ascii=true: 3.029 -> 0.501 ns/codepoint
- ascii=false: 7.145 -> 4.183 ns/codepoint

Benchmark (benchmarkFixInvalidUtf8WithoutReplacement, inputLength=1024):
- valid_non_ascii: 6.341 -> 3.978 ns/byte
- invalid_non_ascii: 6.242 -> 4.549 ns/byte

Benchmark (benchmarkLeftTrim, length=1000):
- ascii=true: 1.919 -> 0.344 ns/codepoint
- ascii=false: 3.137 -> 2.201 ns/codepoint

Benchmark (benchmarkRightTrim, length=1000):
- ascii=true: 0.551 -> 0.359 ns/codepoint
- ascii=false: 2.939 -> 2.534 ns/codepoint

Benchmark (benchmarkTrimCustom, length=1000):
- ascii=true: 2.702 -> 0.474 ns/codepoint
- ascii=false: 5.224 -> 4.329 ns/codepoint

Benchmark (benchmarkSetCodePointAt, length=1000):
- ascii=true: 0.336 -> 0.332 ns/codepoint
- ascii=false: 2.259 -> 2.334 ns/codepoint

Related benchmark (benchmarkCodePointToUtf8, length=1000):
- ascii=false: 2.404 -> 2.154 ns/codepoint
Useful for Trino VARCHAR -> code points casts and similar decode loops.

Benchmark (ns/byte, length=1000):
- toCodePointsApi ascii: 0.2319 (baseline two-pass: 2.4902)
- toCodePointsApi non-ascii: 1.0820 (baseline two-pass: 1.6643)

Adds fromCodePoints to encode code-point arrays directly into UTF-8 Slice output. This is useful for Trino-style loops that currently pre-size and encode with repeated setCodePointAt calls.

Benchmark (SliceUtf8Benchmark, length=1000 code points):
- ascii=true: fromCodePointsApi 0.326 ns/codepoint vs Trino baseline 0.500 ns/codepoint
- ascii=false: fromCodePointsApi 2.062 ns/codepoint vs Trino baseline 3.230 ns/codepoint

Adds codePointByteLengths so callers can decode UTF-8 once and directly materialize per-code-point byte widths (1..4) for padding/loop planning.

Benchmark (SliceUtf8Benchmark, length=128 code points):
- ascii=true: helper(byte[]) 0.696 ns/codepoint vs Trino byte[] baseline 1.020 ns/codepoint
- ascii=false: helper(byte[]) 2.129 ns/codepoint vs Trino byte[] baseline 3.596 ns/codepoint
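A behavioral sketch of the three new APIs, to make their contracts concrete. These bodies lean on java.lang.String for brevity and return byte[] rather than a Slice, so the sketch stays dependency-free; the real implementations work directly on raw UTF-8 without intermediate Strings.

```java
import java.nio.charset.StandardCharsets;

public final class CodePointsSketch
{
    private CodePointsSketch() {}

    // toCodePoints: decode a UTF-8 byte range into an array of code points
    public static int[] toCodePoints(byte[] utf8, int offset, int length)
    {
        return new String(utf8, offset, length, StandardCharsets.UTF_8).codePoints().toArray();
    }

    // fromCodePoints: encode a code-point range back into UTF-8 bytes
    public static byte[] fromCodePoints(int[] codePoints, int offset, int length)
    {
        return new String(codePoints, offset, length).getBytes(StandardCharsets.UTF_8);
    }

    // codePointByteLengths: UTF-8 byte width (1..4) of each code point in the range
    public static int[] codePointByteLengths(byte[] utf8, int offset, int length)
    {
        int[] codePoints = toCodePoints(utf8, offset, length);
        int[] widths = new int[codePoints.length];
        for (int i = 0; i < codePoints.length; i++) {
            int codePoint = codePoints[i];
            widths[i] = codePoint < 0x80 ? 1 : codePoint < 0x800 ? 2 : codePoint < 0x10000 ? 3 : 4;
        }
        return widths;
    }
}
```

For example, the 6-byte string "aé€" decodes to three code points with byte widths 1, 2, and 3, and round-trips through fromCodePoints back to the original bytes.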
Contributor
@dain you'd need to push it as a branch to upstream in order to release a snapshot.
Summary
Core string algorithms now take byte[] + offset + length, with Slice overloads delegating.

High-level optimization approaches
- byte[]-first internals: better JVM bounds-check hoisting and easier raw-array integration.
- Word-at-a-time processing (long/int lanes via var handles) to skip equal ASCII regions quickly.

New APIs
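The word-at-a-time idea can be sketched as follows (a hypothetical helper, not code from this PR): XOR two eight-byte reads and, on a non-zero result, use the position of the lowest set byte to locate the first mismatch, so equal ASCII prefixes are skipped eight bytes per iteration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public final class MismatchSketch
{
    private static final VarHandle LONG_HANDLE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    private MismatchSketch() {}

    // Returns the index of the first differing byte,
    // or the shorter length if one input is a prefix of the other
    public static int firstMismatch(byte[] left, byte[] right)
    {
        int length = Math.min(left.length, right.length);
        int position = 0;
        // Compare 8 bytes per iteration; the XOR is zero iff the words are equal
        while (position + Long.BYTES <= length) {
            long difference = (long) LONG_HANDLE.get(left, position) ^ (long) LONG_HANDLE.get(right, position);
            if (difference != 0) {
                // Little-endian: the lowest set byte of the XOR is the first mismatch
                return position + Long.numberOfTrailingZeros(difference) / Byte.SIZE;
            }
            position += Long.BYTES;
        }
        // Tail: byte-at-a-time
        while (position < length && left[position] == right[position]) {
            position++;
        }
        return position;
    }
}
```

A comparison routine built this way only drops to per-code-point logic once a differing byte is found, which is where most of the ascii=true speedups come from.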
- toCodePoints(byte[] utf8, int offset, int length)
- fromCodePoints(int[] codePoints, int offset, int length)
- codePointByteLengths(byte[] utf8, int offset, int length)

Benchmark highlights (JMH)
Most results below are for length=1000 code points unless noted.
- benchmarkCompareUtf16BE: ascii 3.483 -> 0.102 ns/codepoint (~34x); non-ascii 8.214 -> 6.395 ns/codepoint (~1.28x)
- benchmarkToLowerCase: ascii 3.029 -> 0.501 ns/codepoint (~6.0x); non-ascii 7.145 -> 4.183 ns/codepoint (~1.71x)
- benchmarkToUpperCase: ascii 3.053 -> 0.601 ns/codepoint (~5.1x); non-ascii 7.254 -> 5.019 ns/codepoint (~1.45x)
- benchmarkTrimCustom: ascii 2.702 -> 0.474 ns/codepoint (~5.7x); non-ascii 5.224 -> 4.329 ns/codepoint (~1.21x)
- benchmarkLeftTrim: ascii 1.919 -> 0.344 ns/codepoint (~5.6x); non-ascii 3.137 -> 2.201 ns/codepoint (~1.42x)
- benchmarkRightTrim: ascii 0.551 -> 0.359 ns/codepoint (~1.53x); non-ascii 2.939 -> 2.534 ns/codepoint (~1.16x)
- benchmarkToCodePointsApi (ns/byte): ascii 2.4902 -> 0.2319 (~10.7x vs two-pass baseline); non-ascii 1.6643 -> 1.0820 (~1.54x vs two-pass baseline)
- benchmarkFromCodePointsApi: ascii 0.500 -> 0.326 ns/codepoint (~1.53x); non-ascii 3.230 -> 2.062 ns/codepoint (~1.57x)
- benchmarkFixInvalidUtf8WithoutReplacement (inputLength=1024, ns/byte): valid_non_ascii 6.341 -> 3.978 (~1.59x); invalid_non_ascii 6.242 -> 4.549 (~1.37x)
- benchmarkReverse: ascii 0.318 -> 0.067 ns/codepoint (~4.7x); non-ascii 3.397 -> 3.406 ns/codepoint (flat/noise)
- codePointByteLengths helper benchmark (length=128): ascii 1.020 -> 0.696 ns/codepoint (~1.47x); non-ascii 3.596 -> 2.129 ns/codepoint (~1.69x)

Small-string sanity (tail paths)
Ran a dedicated JMH sanity pass at non-8-multiple lengths 7 and 31 (with ascii=true,false) for: compareUtf16BE, toLowerCase, toUpperCase, trimCustom, toCodePointsApi, and fromCodePointsApi.

Results (ns/op, len=7 / 31):
- compareUtf16BE: ascii 7.332 / 10.781; non-ascii 41.308 / 193.126
- fromCodePointsApi: ascii 7.702 / 14.937; non-ascii 25.337 / 64.678
- toCodePointsApi: ascii 5.843 / 12.206; non-ascii 29.565 / 123.929
- toLowerCase: ascii 12.790 / 29.095; non-ascii 27.257 / 124.587
- toUpperCase: ascii 7.894 / 23.449; non-ascii 36.579 / 124.978
- trimCustom: ascii 18.294 / 29.761; non-ascii 54.013 / 170.847

Conclusion: no obvious small-string regressions; short-input behavior is consistent with expected fixed-overhead effects.