feat(java): long array serializer support varint encoding #3115
@Pigsy-Monk We have
Hi @chaokunyang, I have looked into the CompressedLongArraySerializer algorithm. My understanding is that it assumes all elements in the array are compressible, so the achievable compression ratio is limited by the largest element in the array. In my use case, most of the long values are small and only a small fraction are large.
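To make the trade-off described above concrete, here is a minimal sketch that compares encoded sizes for a mostly-small array containing one large value. It rests on assumptions: `arrayWideSize` is a simplified model of the behavior described in this comment (narrow every element only if all of them fit in an int), not the actual CompressedLongArraySerializer, and `varLongSize` uses a plain ZigZag plus 7-bits-per-byte varint, which may not match the byte layout of Fory's `writeVarInt64`.

```java
public class VarLongSizeSketch {
  // ZigZag maps small-magnitude signed values to small unsigned values.
  static long zigZag(long v) {
    return (v << 1) ^ (v >> 63);
  }

  // Bytes needed by a 7-bits-per-byte varint of the ZigZag-encoded value (1 to 10).
  static int varLongSize(long v) {
    long z = zigZag(v);
    int size = 1;
    while ((z & ~0x7FL) != 0) {
      z >>>= 7;
      size++;
    }
    return size;
  }

  // Per-element encoding: each value pays only for its own magnitude.
  static int perElementSize(long[] values) {
    int total = 0;
    for (long v : values) {
      total += varLongSize(v);
    }
    return total;
  }

  // Hypothetical array-wide narrowing: 4 bytes per element only if every
  // element fits in an int, otherwise 8 bytes per element.
  static int arrayWideSize(long[] values) {
    for (long v : values) {
      if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE) {
        return values.length * 8;
      }
    }
    return values.length * 4;
  }

  public static void main(String[] args) {
    long[] values = new long[1000];
    for (int i = 0; i < values.length; i++) {
      values[i] = i % 100; // mostly small values
    }
    values[999] = Long.MAX_VALUE; // a single large outlier
    System.out.println("fixed 8 bytes:      " + values.length * 8);
    System.out.println("array-wide model:   " + arrayWideSize(values));
    System.out.println("per-element varint: " + perElementSize(values));
  }
}
```

In this model a single outlier forces the array-wide scheme back to 8 bytes per element, while the per-element scheme only pays extra for the outlier itself.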
@Pigsy-Monk Could you fix the code style errors?
Sure, I am trying to figure out where it went wrong.
    this(fory, false);
  }

  public LongArraySerializer(Fory fory, boolean supportVarLenEncoding) {
ForyConfig already has compressIntArray and compressLongArray options. How about using those directly? For LongArraySerializer, we can then choose the compression based on LongEncoding.
Sounds better. I will submit a commit.
    }

    if (fory.getConfig().compressLongArray()) {
      return readVarLongs(buffer);
Could you switch on LongEncoding to invoke different functions, so that different compression algorithms are used?
Yes, I am working on it.
    int length = value.length;
    buffer.writeVarUint32Small7(length);
    for (int i = 0; i < length; i++) {
      PrimitiveSerializers.LongSerializer.writeInt64(buffer, value[i], longEncoding);
We can have two functions to move the switch on LongEncoding outside the loop
I am not sure if I get you right. Do you mean something like this?

  private void writeInt64s(MemoryBuffer buffer, long[] value, LongEncoding longEncoding) {
    int length = value.length;
    buffer.writeVarUint32Small7(length);
    if (longEncoding == LongEncoding.SLI) {
      for (int i = 0; i < length; i++) {
        buffer.writeSliInt64(value[i]);
      }
      return;
    }
    for (int i = 0; i < length; i++) {
      buffer.writeVarInt64(value[i]);
    }
  }
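The idea under discussion, choosing the encoding once per array instead of once per element, can be sketched without any Fory dependency. Everything here is an assumption made for illustration: the `Encoding` enum is a hypothetical stand-in for `LongEncoding`, `FIXED` replaces SLI to keep the sketch short, and the varint layout is a plain ZigZag plus 7-bits-per-byte scheme rather than the format written by `MemoryBuffer`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class LongArrayEncodingSketch {
  // Hypothetical stand-in for the library's LongEncoding; not the real enum.
  enum Encoding { VARINT, FIXED }

  // ZigZag + 7-bits-per-byte varint (illustrative layout only).
  static void writeVarLong(ByteBuffer buf, long v) {
    long z = (v << 1) ^ (v >> 63);
    while ((z & ~0x7FL) != 0) {
      buf.put((byte) ((z & 0x7F) | 0x80));
      z >>>= 7;
    }
    buf.put((byte) z);
  }

  static long readVarLong(ByteBuffer buf) {
    long z = 0;
    int shift = 0;
    byte b;
    do {
      b = buf.get();
      z |= (long) (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return (z >>> 1) ^ -(z & 1); // undo ZigZag
  }

  static byte[] writeLongs(long[] values, Encoding encoding) {
    // Worst case: 10 bytes per element plus 10 bytes for the length prefix.
    ByteBuffer buf =
        ByteBuffer.allocate(10 + values.length * 10).order(ByteOrder.LITTLE_ENDIAN);
    writeVarLong(buf, values.length);
    switch (encoding) { // decided once per array, not once per element
      case VARINT:
        for (long v : values) {
          writeVarLong(buf, v);
        }
        break;
      case FIXED:
        for (long v : values) {
          buf.putLong(v);
        }
        break;
    }
    return Arrays.copyOf(buf.array(), buf.position());
  }

  static long[] readLongs(byte[] data, Encoding encoding) {
    ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    long[] values = new long[(int) readVarLong(buf)];
    switch (encoding) {
      case VARINT:
        for (int i = 0; i < values.length; i++) {
          values[i] = readVarLong(buf);
        }
        break;
      case FIXED:
        for (int i = 0; i < values.length; i++) {
          values[i] = buf.getLong();
        }
        break;
    }
    return values;
  }

  public static void main(String[] args) {
    long[] values = {0, 1, -1, 300, Long.MAX_VALUE, Long.MIN_VALUE};
    for (Encoding e : Encoding.values()) {
      byte[] bytes = writeLongs(values, e);
      boolean ok = Arrays.equals(values, readLongs(bytes, e));
      System.out.println(e + ": " + bytes.length + " bytes, round trip ok = " + ok);
    }
  }
}
```

Keeping each loop body free of branching on the encoding is the point of the refactoring; whether it yields a measurable speedup would need a benchmark.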
chaokunyang left a comment
LGTM. Would you like to open another PR to add varint compression for int[] arrays?
Yes, that's what I plan to do after this one is merged.
@Pigsy-Monk Please fix the checkstyle error.
Sure, I am trying to figure out where it went wrong.
It would have been easier if the error message indicated the line number.
@chaokunyang Could you please tell me which plugin you use to check the code style? |
You can just run ci/format.sh --java, and see the Maven pom.xml for details on the plugins we use. We use Spotless for code formatting.
Thanks. I ran ci/format.sh --java. Hope it works this time.
What does this PR do?
This PR adds variable-length encoding serializers for long[] arrays in Java, which provides more space-efficient serialization for arrays containing many small values.

Changes:

- Enhance LongArraySerializer with variable-length encoding support: added a supportVarLenEncoding parameter to the LongArraySerializer constructor, allowing it to optionally use variable-length encoding when enabled.
- Add comprehensive test cases:
  - testVariableLengthLongArray(): tests serialization/deserialization of long arrays with various value ranges (empty, small, mixed, large, negative values)
  - testVariableLengthEncodingEfficiencyForSmallValues(): demonstrates that variable-length encoding produces significantly smaller serialized data (50%+ reduction) for arrays containing many small values

Test Details:

testVariableLengthLongArray:
- [...] Long.MAX_VALUE and Long.MIN_VALUE)

testVariableLengthEncodingEfficiencyForSmallValues:

Performance Benefits:

For arrays containing many small values:
- [...] long element + overhead
- [...] long element + overhead

Related Issues:

Does this PR introduce any user-facing change?

No, this PR only adds new serializer classes and test cases. The default serializers remain unchanged. Users can opt in to variable-length encoding by using the enhanced LongArraySerializer with supportVarLenEncoding=true.