-
Notifications
You must be signed in to change notification settings - Fork 74
feat: add transform project #371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't finished review yet, just post some comments so far.
src/iceberg/transform.h
Outdated
| /// \param predicate The predicate to project. | ||
| /// \return A Result containing either a shared pointer to the projected predicate or an | ||
| /// Error if the projection fails. | ||
| Result<std::shared_ptr<UnboundPredicate>> Project( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we return std::unique_ptr instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| ICEBERG_ASSIGN_OR_RAISE(auto transformed_lit, func->Transform(lit)); | ||
| transformed.push_back(std::move(transformed_lit)); | ||
| } | ||
| return Expressions::Predicate(predicate->op(), std::string(name), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot use convenience functions in the Expressions because they may throw and they are only supposed to be used by users. We need to stick with Make functions to deal with any error.
| std::move(transformed)); | ||
| } | ||
|
|
||
| static Result<std::shared_ptr<UnboundPredicate>> TruncateArray( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know Java impl uses the same name. However, I am still confused that it is generic enough to be used by transform other than truncate and it has nothing to do with array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rename it with TruncateByteArray, it's used for string and binary truncation. I also add a GenericTransform which is used as a fallback for any special cases.
| auto transformed, | ||
| func->Transform(literal.type()->type_id() == TypeId::kDate | ||
| ? Literal::Date(std::get<T>(literal.value()) - 1) | ||
| : Literal::Int(std::get<T>(literal.value()) - 1))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not look safe since predicate may be of any type but TruncateInteger indicates that only integer variants are accepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split the temporal usage into another function TransformTemporal.
| default: | ||
| return GenericTransform(std::move(ref), predicate, func); | ||
| } | ||
| std::unreachable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why there is a std::unreachable() but other branches not?
|
|
||
| // Fixes an inclusive projection to account for incorrectly transformed values. | ||
| // align with Java implementation: | ||
| // https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/transforms/ProjectionUtil.java#L275 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link might be invalid over time because it points to the main branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by changing to a permanent link.
| return pred; | ||
| } | ||
| } | ||
| std::unreachable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be moved out of the switch block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already out of the switch block.
| } | ||
| } | ||
|
|
||
| std::unreachable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return nullptr for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't reach here, so the unreachable should be fine.
| const auto width = truncate_transform->width(); | ||
| ICEBERG_ASSIGN_OR_RAISE(auto ref, NamedReference::Make(std::string(name))); | ||
|
|
||
| if (str_value.length() < width) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? str_value.length() is the number of bytes not the number of UTF-8 chars but width is the number of UTF-8 chars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, we should consider UTF-8, will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by adding a CodePointCount function to do the count.
Add Transform::Project for inclusive predicate projection
This PR implements Transform::Project, which transforms a BoundPredicate
to an inclusive predicate on partition values. StrictProject will be
added in a separate PR to keep the review easier.
Move template implementations of Expressions into header file,
or the linker will shout out the following:
Undefined symbols for architecture x86_64:
"std::__1::shared_ptr<iceberg::UnboundPredicateImpl<iceberg::BoundReference>> iceberg::Expressions::In<iceberg::BoundReference>(std::__1::shared_ptr<iceberg::UnboundTerm<iceberg::BoundReference>>, std::initializer_list<iceberg::Literal>)", referenced from:
iceberg::ProjectionUtil::FixInclusiveTimeProjection(std::__1::shared_ptr<iceberg::UnboundPredicateImpl<iceberg::BoundReference>> const&) in transform.cc.o
7a97b2e to
5b9ddae
Compare
src/iceberg/test/transform_test.cc
Outdated
| // The predicate's term should be a transform | ||
| EXPECT_EQ(bound_pred->term()->kind(), Term::Kind::kTransform); | ||
|
|
||
| auto dummy = Expressions::NotEqual<BoundTransform>(bucket_term, Literal::Int(5)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed, sorry about the oversight :)
Add Transform::Project for inclusive predicate projection
This PR implements Transform::Project, which transforms a BoundPredicate to an inclusive predicate on partition values. StrictProject will be added in a separate PR to keep the review easier.
Move template implementations of Expressions into header file, or the linker will shout out the following:
Undefined symbols for architecture x86_64:
"std::__1::shared_ptr<iceberg::UnboundPredicateImpliceberg::BoundReference> iceberg::Expressions::Iniceberg::BoundReference(std::__1::shared_ptr<iceberg::UnboundTermiceberg::BoundReference>, std::initializer_listiceberg::Literal)", referenced from:
iceberg::ProjectionUtil::FixInclusiveTimeProjection(std::__1::shared_ptr<iceberg::UnboundPredicateImpliceberg::BoundReference> const&) in transform.cc.o