Use direct byte[] utf-8 conversions #136053
Currently, Elasticsearch uses the Charset#decode and Charset#encode methods (via StandardCharsets.UTF_8) when working with optimized text. These variants are slower than the direct implementations in String when working with byte[]. If we are doing a one-shot conversion without validation, the String variants should be preferred.
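For illustration, here is a minimal sketch of the two conversion styles being compared. The class and method names are hypothetical and are not taken from the Elasticsearch code base; only the JDK calls are real. Both paths should produce the same result for well-formed input, but the direct variants avoid the intermediate ByteBuffer/CharBuffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class Utf8Conversions {

    // Charset-based variant: encodes into a ByteBuffer, then copies out the bytes.
    public static byte[] encodeViaCharset(String s) {
        ByteBuffer bb = StandardCharsets.UTF_8.encode(s);
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    // Direct variant: String produces the byte[] itself.
    public static byte[] encodeDirect(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Charset-based variant: wraps the array, decodes into a CharBuffer, then builds a String.
    public static String decodeViaCharset(byte[] bytes, int offset, int length) {
        return StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes, offset, length)).toString();
    }

    // Direct variant: the String constructor decodes the byte[] slice itself.
    public static String decodeDirect(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "optimized text: h\u00e9llo \u20ac";
        byte[] viaCharset = encodeViaCharset(text);
        byte[] direct = encodeDirect(text);
        System.out.println(Arrays.equals(viaCharset, direct));
        System.out.println(decodeDirect(direct, 0, direct.length)
            .equals(decodeViaCharset(direct, 0, direct.length)));
    }
}
```

Any performance difference between the two styles would need to be confirmed with a benchmark on the target JDK; the sketch only shows that the results are interchangeable.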
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)
I noticed this a few months ago while benchmarking something, and it seemed to me that we should switch variants. I can't think of a scenario where we would prefer the byte buffer encoder variant for String -> bytes. I could see us preferring the byte buffer variant for bytes -> String if we wanted to catch UTF-8 errors. However, Charset#decode uses CodingErrorAction.REPLACE for malformed input by default, which is the same behavior as the String constructor variant:

    public final CharBuffer decode(ByteBuffer bb) {
        try {
            return ThreadLocalCoders.decoderFor(this)
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .decode(bb);
        } catch (CharacterCodingException x) {
            throw new Error(x); // Can't happen
        }
    }

I would note that this is probably different behavior than Jackson would give prior to this change. Are we okay with silently replacing invalid UTF-8 input?

@jordan-powers - added the code
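To make the trade-off concrete, the following sketch contrasts lenient decoding, which silently substitutes U+FFFD for malformed bytes, with a strict decoder configured to report errors. The class and method names are hypothetical; this is one possible way to validate input if silent replacement turns out to be unacceptable, not what the PR itself does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class MalformedUtf8Demo {

    // Lenient: the String constructor replaces malformed sequences with U+FFFD.
    public static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict: a fresh decoder set to REPORT throws on malformed input.
    public static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .decode(ByteBuffer.wrap(bytes))
            .toString();
    }

    // Convenience check built on the strict decoder.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeStrict(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] invalid = { 'a', (byte) 0xFF, 'b' }; // 0xFF never appears in valid UTF-8
        System.out.println(decodeLenient(invalid).equals("a\uFFFDb")); // prints "true"
        System.out.println(isValidUtf8(invalid)); // prints "false"
    }
}
```

The strict path pays for the ByteBuffer/CharBuffer round trip, so it would only make sense where rejecting invalid input matters more than conversion speed.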