119-Introduce-CompactedKeyEncoder #124
Merged
luoyuxia
merged 37 commits into
apache:main
from
leekeiabstraction:119-Introduce-CompactedKeyEncoder
Jan 9, 2026
0350ba1
Introduce CompactedKeyEncoder, initial commit
leekeiabstraction a02719a
Introduce CompactedKeyEncoder: working test cases
leekeiabstraction bf9215d
Use Datum, remove Value
leekeiabstraction 111b08f
All data type unit test
leekeiabstraction 082257d
Use Result in KeyEncoder
leekeiabstraction f506ab0
Update todo comment
leekeiabstraction b6e308e
Mark test methods as cfg(test)
leekeiabstraction a71db15
ValueWriter documentation
leekeiabstraction 006d593
Minor refactoring
leekeiabstraction 7dcea05
Add null check
leekeiabstraction a6c26a6
Improve todo message
leekeiabstraction fe7da7b
Move licence to top
leekeiabstraction 93ca8ab
More readable test case
leekeiabstraction c668dc5
Improve error message, use write_bytes for BytesWriter
leekeiabstraction ad3bd4a
Fix documentation
leekeiabstraction ac1d4b3
Use Result<> to return CompactedKeyEncoder and ValueWriter for better…
leekeiabstraction f29e2cc
More idiomatic implementation of encode_key
leekeiabstraction cbd0e0f
Improve documentation
leekeiabstraction 352f157
Improve error message
leekeiabstraction 16f4514
Improve error message
leekeiabstraction 07d6105
Minor refactor
leekeiabstraction 145830a
Formatting and clippy
leekeiabstraction e1d85a6
Formatting and clippy
leekeiabstraction 1ec7693
Addressed PR comments
leekeiabstraction 700bb47
Addressed PR comments
leekeiabstraction ca4036c
Improve error message
leekeiabstraction 4df3a81
Clippy
leekeiabstraction bcb7e08
Improve error message
leekeiabstraction b496612
Improve error message
leekeiabstraction 043e1d4
Improve and remove duplicate todos
leekeiabstraction 0ac34be
Fix test
leekeiabstraction 88ec810
More succinct code
leekeiabstraction d5a916b
More succinct code
leekeiabstraction eb70020
Use static dispatch for more performant code
leekeiabstraction eff7a88
Use static dispatch for better performance
leekeiabstraction b02233a
Move for_test_row_type function into test module
leekeiabstraction 3f5b938
More idiomatic row type building
leekeiabstraction
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,210 @@ | ||
| // Licensed to the Apache Software Foundation (ASF) under one | ||
| // or more contributor license agreements. See the NOTICE file | ||
| // distributed with this work for additional information | ||
| // regarding copyright ownership. The ASF licenses this file | ||
| // to you under the Apache License, Version 2.0 (the | ||
| // "License"); you may not use this file except in compliance | ||
| // with the License. You may obtain a copy of the License at | ||
| // | ||
| // http://www.apache.org/licenses/LICENSE-2.0 | ||
| // | ||
| // Unless required by applicable law or agreed to in writing, | ||
| // software distributed under the License is distributed on an | ||
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| // KIND, either express or implied. See the License for the | ||
| // specific language governing permissions and limitations | ||
| // under the License. | ||
|
|
||
| use crate::error::Error::IllegalArgument; | ||
| use crate::error::Result; | ||
| use crate::metadata::DataType; | ||
| use crate::row::Datum; | ||
| use crate::row::binary::BinaryRowFormat; | ||
|
|
||
| /// Writer for a composite data format, such as a row or an array. | |
| #[allow(dead_code)] | ||
| pub trait BinaryWriter { | ||
| /// Reset writer to prepare next write | ||
| fn reset(&mut self); | ||
|
|
||
| /// Set the field at `pos` to null | |
| fn set_null_at(&mut self, pos: usize); | ||
|
|
||
| fn write_boolean(&mut self, value: bool); | ||
|
|
||
| fn write_byte(&mut self, value: u8); | ||
|
|
||
| fn write_bytes(&mut self, value: &[u8]); | ||
|
|
||
| fn write_char(&mut self, value: &str, length: usize); | ||
|
|
||
| fn write_string(&mut self, value: &str); | ||
|
|
||
| fn write_short(&mut self, value: i16); | ||
|
|
||
| fn write_int(&mut self, value: i32); | ||
|
|
||
| fn write_long(&mut self, value: i64); | ||
|
|
||
| fn write_float(&mut self, value: f32); | ||
|
|
||
| fn write_double(&mut self, value: f64); | ||
|
|
||
| fn write_binary(&mut self, bytes: &[u8], length: usize); | ||
|
|
||
| // TODO Decimal type | ||
| // fn write_decimal(&mut self, pos: i32, value: f64); | ||
|
|
||
| // TODO Timestamp type | ||
| // fn write_timestamp_ntz(&mut self, pos: i32, value: i64); | ||
|
|
||
| // TODO Timestamp type | ||
| // fn write_timestamp_ltz(&mut self, pos: i32, value: i64); | ||
|
|
||
| // TODO InternalArray, ArraySerializer | ||
| // fn write_array(&mut self, pos: i32, value: i64); | ||
|
|
||
| // TODO Row serializer | ||
| // fn write_row(&mut self, pos: i32, value: &InternalRow); | ||
|
|
||
| /// Complete the write and set the real size of the binary data. | |
| fn complete(&mut self); | ||
| } | ||
|
|
||
| pub enum ValueWriter { | ||
| Nullable(InnerValueWriter), | ||
| NonNullable(InnerValueWriter), | ||
| } | ||
|
|
||
| impl ValueWriter { | ||
| pub fn create_value_writer( | ||
| element_type: &DataType, | ||
| binary_row_format: Option<&BinaryRowFormat>, | ||
| ) -> Result<ValueWriter> { | ||
| let value_writer = | ||
| InnerValueWriter::create_inner_value_writer(element_type, binary_row_format)?; | ||
| if element_type.is_nullable() { | ||
| Ok(Self::Nullable(value_writer)) | ||
| } else { | ||
| Ok(Self::NonNullable(value_writer)) | ||
| } | ||
| } | ||
|
|
||
| pub fn write_value<W: BinaryWriter>( | ||
| &self, | ||
| writer: &mut W, | ||
| pos: usize, | ||
| value: &Datum, | ||
| ) -> Result<()> { | ||
| match self { | ||
| Self::Nullable(inner_value_writer) => { | ||
| if let Datum::Null = value { | ||
| writer.set_null_at(pos); | ||
| Ok(()) | ||
| } else { | ||
| inner_value_writer.write_value(writer, pos, value) | ||
| } | ||
| } | ||
| Self::NonNullable(inner_value_writer) => { | ||
| inner_value_writer.write_value(writer, pos, value) | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| #[derive(Debug)] | ||
| pub enum InnerValueWriter { | ||
| Char, | ||
| String, | ||
| Boolean, | ||
| Binary, | ||
| Bytes, | ||
| TinyInt, | ||
| SmallInt, | ||
| Int, | ||
| BigInt, | ||
| Float, | ||
| Double, | ||
| // TODO Decimal, Date, TimeWithoutTimeZone, TimestampWithoutTimeZone, TimestampWithLocalTimeZone, Array, Row | ||
| } | ||
|
|
||
| /// Accessor for writing the fields/elements of a binary writer at runtime. The | |
| /// fields/elements must be written in order. | |
| impl InnerValueWriter { | ||
| pub fn create_inner_value_writer( | ||
| data_type: &DataType, | ||
| _: Option<&BinaryRowFormat>, | ||
| ) -> Result<InnerValueWriter> { | ||
| match data_type { | ||
| DataType::Char(_) => Ok(InnerValueWriter::Char), | ||
| DataType::String(_) => Ok(InnerValueWriter::String), | ||
| DataType::Boolean(_) => Ok(InnerValueWriter::Boolean), | ||
| DataType::Binary(_) => Ok(InnerValueWriter::Binary), | ||
| DataType::Bytes(_) => Ok(InnerValueWriter::Bytes), | ||
| DataType::TinyInt(_) => Ok(InnerValueWriter::TinyInt), | ||
| DataType::SmallInt(_) => Ok(InnerValueWriter::SmallInt), | ||
| DataType::Int(_) => Ok(InnerValueWriter::Int), | ||
| DataType::BigInt(_) => Ok(InnerValueWriter::BigInt), | ||
| DataType::Float(_) => Ok(InnerValueWriter::Float), | ||
| DataType::Double(_) => Ok(InnerValueWriter::Double), | ||
| _ => unimplemented!( | ||
| "ValueWriter for DataType {:?} is currently not implemented", | ||
| data_type | ||
| ), | ||
| } | ||
| } | ||
| pub fn write_value<W: BinaryWriter>( | ||
| &self, | ||
| writer: &mut W, | ||
| _pos: usize, | ||
| value: &Datum, | ||
| ) -> Result<()> { | ||
| match (self, value) { | ||
| (InnerValueWriter::Char, Datum::String(v)) => { | ||
| writer.write_char(v, v.len()); | ||
| } | ||
| (InnerValueWriter::String, Datum::String(v)) => { | ||
| writer.write_string(v); | ||
| } | ||
| (InnerValueWriter::Boolean, Datum::Bool(v)) => { | ||
| writer.write_boolean(*v); | ||
| } | ||
| (InnerValueWriter::Binary, Datum::Blob(v)) => { | ||
| writer.write_binary(v.as_ref(), v.len()); | ||
| } | ||
| (InnerValueWriter::Binary, Datum::BorrowedBlob(v)) => { | ||
| writer.write_binary(v.as_ref(), v.len()); | ||
| } | ||
| (InnerValueWriter::Bytes, Datum::Blob(v)) => { | ||
| writer.write_bytes(v.as_ref()); | ||
| } | ||
| (InnerValueWriter::Bytes, Datum::BorrowedBlob(v)) => { | ||
| writer.write_bytes(v.as_ref()); | ||
| } | ||
| (InnerValueWriter::TinyInt, Datum::Int8(v)) => { | ||
| writer.write_byte(*v as u8); | ||
| } | ||
| (InnerValueWriter::SmallInt, Datum::Int16(v)) => { | ||
| writer.write_short(*v); | ||
| } | ||
| (InnerValueWriter::Int, Datum::Int32(v)) => { | ||
| writer.write_int(*v); | ||
| } | ||
| (InnerValueWriter::BigInt, Datum::Int64(v)) => { | ||
| writer.write_long(*v); | ||
| } | ||
| (InnerValueWriter::Float, Datum::Float32(v)) => { | ||
| writer.write_float(v.into_inner()); | ||
| } | ||
| (InnerValueWriter::Double, Datum::Float64(v)) => { | ||
| writer.write_double(v.into_inner()); | ||
| } | ||
| _ => { | ||
| return Err(IllegalArgument { | ||
| message: format!("{:?} used to write value {:?}", self, value), | ||
| }); | ||
| } | ||
| } | ||
| Ok(()) | ||
| } | ||
| } |
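
The nullable/non-nullable split in `ValueWriter` can be sketched in isolation as follows. This is a minimal illustration, not the crate's code: `Datum` and `BinaryWriter` are cut down to a few variants and methods, `RecordingWriter` is a hypothetical writer that only logs the calls it receives, and errors are plain `String`s instead of the crate's `IllegalArgument`.

```rust
// Simplified stand-ins for the crate's `Datum` and `BinaryWriter`.
// They are assumptions for illustration, not the real definitions.
#[derive(Debug)]
enum Datum {
    Null,
    Int32(i32),
    String(String),
}

trait BinaryWriter {
    fn set_null_at(&mut self, pos: usize);
    fn write_int(&mut self, value: i32);
    fn write_string(&mut self, value: &str);
}

/// Hypothetical writer that records each call so the dispatch is observable.
#[derive(Default)]
struct RecordingWriter {
    calls: Vec<String>,
}

impl BinaryWriter for RecordingWriter {
    fn set_null_at(&mut self, pos: usize) {
        self.calls.push(format!("null@{}", pos));
    }
    fn write_int(&mut self, value: i32) {
        self.calls.push(format!("int:{}", value));
    }
    fn write_string(&mut self, value: &str) {
        self.calls.push(format!("string:{}", value));
    }
}

#[derive(Debug)]
enum InnerValueWriter {
    Int,
    String,
}

impl InnerValueWriter {
    /// Matches the writer kind against the datum variant; any other
    /// combination is reported as an error rather than a panic.
    fn write_value<W: BinaryWriter>(&self, writer: &mut W, value: &Datum) -> Result<(), String> {
        match (self, value) {
            (InnerValueWriter::Int, Datum::Int32(v)) => writer.write_int(*v),
            (InnerValueWriter::String, Datum::String(v)) => writer.write_string(v),
            _ => return Err(format!("{:?} used to write value {:?}", self, value)),
        }
        Ok(())
    }
}

enum ValueWriter {
    Nullable(InnerValueWriter),
    NonNullable(InnerValueWriter),
}

impl ValueWriter {
    /// Only the nullable variant intercepts `Datum::Null`; everything else
    /// is forwarded to the inner writer.
    fn write_value<W: BinaryWriter>(
        &self,
        writer: &mut W,
        pos: usize,
        value: &Datum,
    ) -> Result<(), String> {
        match self {
            ValueWriter::Nullable(_) if matches!(value, Datum::Null) => {
                writer.set_null_at(pos);
                Ok(())
            }
            ValueWriter::Nullable(inner) | ValueWriter::NonNullable(inner) => {
                inner.write_value(writer, value)
            }
        }
    }
}
```

With this shape, a `Datum::Null` handed to a `NonNullable` writer falls through to the inner writer and comes back as an error, which mirrors the behavior of the diff above under the stated simplifications.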
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| // Licensed to the Apache Software Foundation (ASF) under one | ||
| // or more contributor license agreements. See the NOTICE file | ||
| // distributed with this work for additional information | ||
| // regarding copyright ownership. The ASF licenses this file | ||
| // to you under the Apache License, Version 2.0 (the | ||
| // "License"); you may not use this file except in compliance | ||
| // with the License. You may obtain a copy of the License at | ||
| // | ||
| // http://www.apache.org/licenses/LICENSE-2.0 | ||
| // | ||
| // Unless required by applicable law or agreed to in writing, | ||
| // software distributed under the License is distributed on an | ||
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| // KIND, either express or implied. See the License for the | ||
| // specific language governing permissions and limitations | ||
| // under the License. | ||
|
|
||
| mod binary_writer; | ||
|
|
||
| pub use binary_writer::*; | ||
|
|
||
| /// The binary row format; it indicates which kind of [`BinaryRow`] the [`BinaryWriter`] generates. | |
| #[allow(dead_code)] | ||
| pub enum BinaryRowFormat { | ||
| Compacted, | ||
| Aligned, | ||
| Indexed, | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,97 @@ | ||
| // Licensed to the Apache Software Foundation (ASF) under one | ||
| // or more contributor license agreements. See the NOTICE file | ||
| // distributed with this work for additional information | ||
| // regarding copyright ownership. The ASF licenses this file | ||
| // to you under the Apache License, Version 2.0 (the | ||
| // "License"); you may not use this file except in compliance | ||
| // with the License. You may obtain a copy of the License at | ||
| // | ||
| // http://www.apache.org/licenses/LICENSE-2.0 | ||
| // | ||
| // Unless required by applicable law or agreed to in writing, | ||
| // software distributed under the License is distributed on an | ||
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| // KIND, either express or implied. See the License for the | ||
| // specific language governing permissions and limitations | ||
| // under the License. | ||
|
|
||
| use crate::row::compacted::compacted_row_writer::CompactedRowWriter; | ||
| use bytes::Bytes; | ||
|
|
||
| use crate::error::Result; | ||
| use crate::metadata::DataType; | ||
| use crate::row::binary::{BinaryRowFormat, BinaryWriter, ValueWriter}; | ||
| use delegate::delegate; | ||
|
|
||
| /// A wrapper around [`CompactedRowWriter`] used to encode key columns. | |
| /// The encoding is the same as that of [`CompactedRowWriter`], but without the header of null bits | |
| /// that marks whether each field value is null, since key columns must not be null. | |
| pub struct CompactedKeyWriter { | ||
| delegate: CompactedRowWriter, | ||
| } | ||
|
|
||
| impl CompactedKeyWriter { | ||
| pub fn new() -> CompactedKeyWriter { | ||
| CompactedKeyWriter { | ||
| // The compacted key encoder does not need null bits because key columns must not be | |
| // null, so we initialize with field count 0, which makes the null bits 0. | |
| delegate: CompactedRowWriter::new(0), | ||
| } | ||
| } | ||
|
|
||
| pub fn create_value_writer(field_type: &DataType) -> Result<ValueWriter> { | ||
| ValueWriter::create_value_writer(field_type, Some(&BinaryRowFormat::Compacted)) | ||
| } | ||
|
|
||
| delegate! { | ||
| to self.delegate { | ||
| pub fn reset(&mut self); | ||
|
|
||
| #[allow(dead_code)] | ||
| pub fn position(&self) -> usize; | ||
|
|
||
| #[allow(dead_code)] | ||
| pub fn buffer(&self) -> &[u8]; | ||
|
|
||
| pub fn to_bytes(&self) -> Bytes; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| impl BinaryWriter for CompactedKeyWriter { | ||
| delegate! { | ||
| to self.delegate { | ||
| fn reset(&mut self); | ||
|
|
||
| fn set_null_at(&mut self, pos: usize); | ||
|
|
||
| fn write_boolean(&mut self, value: bool); | ||
|
|
||
| fn write_byte(&mut self, value: u8); | ||
|
|
||
| fn write_binary(&mut self, bytes: &[u8], length: usize); | ||
|
|
||
| fn write_bytes(&mut self, value: &[u8]); | ||
|
|
||
| fn write_char(&mut self, value: &str, _length: usize); | ||
|
|
||
| fn write_string(&mut self, value: &str); | ||
|
|
||
| fn write_short(&mut self, value: i16); | ||
|
|
||
| fn write_int(&mut self, value: i32); | ||
|
|
||
| fn write_long(&mut self, value: i64); | ||
|
|
||
| fn write_float(&mut self, value: f32); | ||
|
|
||
| fn write_double(&mut self, value: f64); | ||
|
|
||
|
|
||
| } | ||
| } | ||
|
|
||
| fn complete(&mut self) { | ||
| // do nothing | ||
| } | ||
| } | ||
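
The wrapper pattern used by `CompactedKeyWriter` can be sketched without the `delegate` crate by writing the forwarding methods by hand. Everything here is an assumption made for illustration: `RowWriter` is a hypothetical stand-in for `CompactedRowWriter`, its null-bit header is taken to be `ceil(field_count / 8)` bytes, and integers are written as fixed-width big-endian bytes, which may not match the real compacted encoding.

```rust
/// Hypothetical stand-in for `CompactedRowWriter`: a byte buffer that starts
/// with a null-bit header of ceil(field_count / 8) bytes. The real layout
/// may differ.
struct RowWriter {
    header_len: usize,
    buffer: Vec<u8>,
}

impl RowWriter {
    fn new(field_count: usize) -> Self {
        let header_len = (field_count + 7) / 8;
        RowWriter {
            header_len,
            buffer: vec![0; header_len],
        }
    }

    /// Drop all written values and clear the null bits in the header.
    fn reset(&mut self) {
        self.buffer.clear();
        self.buffer.resize(self.header_len, 0);
    }

    /// Fixed-width big-endian encoding, chosen only to keep the sketch short.
    fn write_int(&mut self, value: i32) {
        self.buffer.extend_from_slice(&value.to_be_bytes());
    }

    fn buffer(&self) -> &[u8] {
        &self.buffer
    }
}

/// Key writer that reuses the row writer but never allocates a null-bit header.
struct KeyWriter {
    delegate: RowWriter,
}

impl KeyWriter {
    fn new() -> Self {
        // Field count 0 yields an empty header, so only values are encoded.
        KeyWriter {
            delegate: RowWriter::new(0),
        }
    }

    // Hand-written forwarding; a delegation macro would generate methods of
    // roughly this shape.
    fn reset(&mut self) {
        self.delegate.reset()
    }

    fn write_int(&mut self, value: i32) {
        self.delegate.write_int(value)
    }

    fn buffer(&self) -> &[u8] {
        self.delegate.buffer()
    }
}
```

Under these assumptions, a `RowWriter::new(3)` carries one header byte before its values, while a `KeyWriter` produces only the value bytes.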