Collaborator

@zhjwpku commented Nov 29, 2025

Add Transform::Project for inclusive predicate projection

This PR implements Transform::Project, which transforms a BoundPredicate into an inclusive predicate on partition values. StrictProject will be added in a separate PR to keep the review manageable.

This PR also moves the template implementations in Expressions into the header file; otherwise the linker fails with the following error:

Undefined symbols for architecture x86_64:
  "std::__1::shared_ptr<iceberg::UnboundPredicateImpl<iceberg::BoundReference>> iceberg::Expressions::In<iceberg::BoundReference>(std::__1::shared_ptr<iceberg::UnboundTerm<iceberg::BoundReference>>, std::initializer_list<iceberg::Literal>)", referenced from:
      iceberg::ProjectionUtil::FixInclusiveTimeProjection(std::__1::shared_ptr<iceberg::UnboundPredicateImpl<iceberg::BoundReference>> const&) in transform.cc.o
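
For readers unfamiliar with this class of linker error, here is a minimal sketch of the underlying rule, not the PR's actual code. `InList` and `MakeIn` are hypothetical names invented for illustration. A function template can only be instantiated where its definition is visible, so the definition either lives in the header or is explicitly instantiated in the .cc file for every type that is needed.

```cpp
#include <initializer_list>
#include <memory>
#include <vector>

// Everything below is assumed to live in a header file.

// Hypothetical container standing in for a predicate over a list of values.
template <typename T>
struct InList {
  std::vector<T> values;
};

// Defined in the header, so every translation unit that calls MakeIn<T>
// can instantiate it for its own T.
template <typename T>
std::shared_ptr<InList<T>> MakeIn(std::initializer_list<T> values) {
  return std::make_shared<InList<T>>(InList<T>{std::vector<T>(values)});
}

// Alternative when the definition stays in a .cc file: add an explicit
// instantiation there for each required type, for example
//   template std::shared_ptr<InList<int>> MakeIn<int>(std::initializer_list<int>);
```

With only a declaration in the header and no explicit instantiation, a call such as `MakeIn<int>({1, 2, 3})` from another translation unit compiles but leaves an undefined symbol at link time, which matches the shape of the error above.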

Member

@wgtmac left a comment

I haven't finished the review yet; I'm just posting the comments I have so far.

/// \param predicate The predicate to project.
/// \return A Result containing either a shared pointer to the projected predicate or an
/// Error if the projection fails.
Result<std::shared_ptr<UnboundPredicate>> Project(

Member

Can we return std::unique_ptr instead?

Collaborator Author

Done.

ICEBERG_ASSIGN_OR_RAISE(auto transformed_lit, func->Transform(lit));
transformed.push_back(std::move(transformed_lit));
}
return Expressions::Predicate(predicate->op(), std::string(name),

Member

We cannot use the convenience functions in Expressions here because they may throw and are intended only for end users. Library code needs to stick with the Make functions so that any error can be handled.
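
A minimal sketch of the distinction raised here, under stated assumptions: the `Result`, `Reference`, `MakeReference`, `Ref`, and `DescribeReference` names below are hypothetical stand-ins written for illustration, not the library's real types or signatures. The idea is that a factory reports failure through its return value, a user-facing convenience wrapper turns that failure into an exception, and internal code calls the factory and forwards the error.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a Result<T>: either a value or an error message.
template <typename T>
struct Result {
  std::optional<T> value;
  std::string error;
  bool ok() const { return value.has_value(); }
};

struct Reference {
  std::string name;
};

// "Make"-style factory: failure is reported through the return value.
inline Result<Reference> MakeReference(std::string name) {
  if (name.empty()) {
    return {std::nullopt, "reference name must not be empty"};
  }
  return {Reference{std::move(name)}, ""};
}

// Convenience wrapper for end users: converts the error into an exception.
inline Reference Ref(std::string name) {
  auto result = MakeReference(std::move(name));
  if (!result.ok()) {
    throw std::invalid_argument(result.error);
  }
  return *std::move(result.value);
}

// Library-internal code stays exception-free: it calls the factory and
// forwards any error to its own caller.
inline Result<std::string> DescribeReference(std::string name) {
  auto ref = MakeReference(std::move(name));
  if (!ref.ok()) {
    return {std::nullopt, ref.error};
  }
  return {"ref(" + ref.value->name + ")", ""};
}
```

In this sketch, `DescribeReference("")` returns an error the caller can inspect, whereas `Ref("")` would throw, which is the behavior the review asks to keep out of internal code paths.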

std::move(transformed));
}

static Result<std::shared_ptr<UnboundPredicate>> TruncateArray(

Member

I know the Java implementation uses the same name. However, I still find it confusing: the function is generic enough to be used by transforms other than truncate, and it has nothing to do with arrays.

Collaborator Author

I renamed it to TruncateByteArray; it is used for string and binary truncation. I also added a GenericTransform, which is used as a fallback for any special cases.

auto transformed,
func->Transform(literal.type()->type_id() == TypeId::kDate
? Literal::Date(std::get<T>(literal.value()) - 1)
: Literal::Int(std::get<T>(literal.value()) - 1)));

Member

This does not look safe: the predicate may be of any type, but the name TruncateInteger indicates that only integer variants are accepted.

Collaborator Author

I split the temporal usage out into a separate function, TransformTemporal.
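
To illustrate why the snippet above subtracts 1 from the literal before transforming it, here is a minimal sketch of inclusive projection for an integer truncate transform. It mirrors the boundary-adjustment idea in the Java ProjectionUtil that the PR aligns with, but all names (`Op`, `Projected`, `TruncateInt`, `ProjectInclusive`) are hypothetical and the code is not taken from the PR. It assumes `width > 0` and ignores overflow at the extremes of the integer range.

```cpp
#include <cstdint>
#include <optional>

enum class Op { kLt, kLtEq, kGt, kGtEq, kEq, kNotEq };

// A projected predicate on the partition value: "partition <op> value".
struct Projected {
  Op op;
  int32_t value;
};

// Truncates toward negative infinity to a multiple of width (width > 0).
inline int32_t TruncateInt(int32_t v, int32_t width) {
  return v - (((v % width) + width) % width);
}

// Inclusive projection: every row matching "x <op> boundary" must also match
// the returned predicate on truncate(x). Strict bounds are first turned into
// closed bounds (boundary - 1 or boundary + 1) and then transformed.
inline std::optional<Projected> ProjectInclusive(Op op, int32_t boundary,
                                                 int32_t width) {
  switch (op) {
    case Op::kLt:
      return Projected{Op::kLtEq, TruncateInt(boundary - 1, width)};
    case Op::kLtEq:
      return Projected{Op::kLtEq, TruncateInt(boundary, width)};
    case Op::kGt:
      return Projected{Op::kGtEq, TruncateInt(boundary + 1, width)};
    case Op::kGtEq:
      return Projected{Op::kGtEq, TruncateInt(boundary, width)};
    case Op::kEq:
      return Projected{Op::kEq, TruncateInt(boundary, width)};
    default:
      // No inclusive projection is produced for other operators in this sketch.
      return std::nullopt;
  }
}
```

For example, with a width of 10, `x < 10` projects to `truncate(x) <= 0` rather than `truncate(x) <= 10`, so the partition holding values 10 through 19 is correctly excluded.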

default:
return GenericTransform(std::move(ref), predicate, func);
}
std::unreachable();

Member

Why is there a std::unreachable() here but not in the other branches?


// Fixes an inclusive projection to account for incorrectly transformed values.
// align with Java implementation:
// https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/transforms/ProjectionUtil.java#L275

Member

The link may become invalid over time because it points to the main branch.

Collaborator Author

@zhjwpku Dec 2, 2025

Fixed by changing to a permanent link.

return pred;
}
}
std::unreachable();

Member

Should it be moved out of the switch block?

Collaborator Author

It's already out of the switch block.

}
}

std::unreachable();

Member

Should we return nullptr for consistency?

Collaborator Author

We can't reach here, so the unreachable should be fine.
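
The pattern under discussion can be sketched as follows. This is an illustrative example only: `TimeUnit`, `UnitName`, and the `Unreachable` helper are hypothetical names, and the helper falls back to compiler builtins when the standard library does not provide `std::unreachable` (C++23). Every case of the switch returns, so the marker after the switch only documents that control cannot fall through.

```cpp
#include <string_view>
#include <utility>

// Portable wrapper: uses std::unreachable when the library provides it,
// otherwise a compiler builtin with the same meaning.
[[noreturn]] inline void Unreachable() {
#if defined(__cpp_lib_unreachable)
  std::unreachable();
#elif defined(_MSC_VER)
  __assume(false);
#else
  __builtin_unreachable();
#endif
}

enum class TimeUnit { kYear, kMonth, kDay };

inline std::string_view UnitName(TimeUnit unit) {
  switch (unit) {
    case TimeUnit::kYear:
      return "year";
    case TimeUnit::kMonth:
      return "month";
    case TimeUnit::kDay:
      return "day";
  }
  // Placed after the switch block: reached only if unit holds a value outside
  // the enumerators, which is undefined behavior for this marker.
  Unreachable();
}
```

The trade-off raised in the thread still applies: returning a sentinel such as nullptr is defined behavior if the assumption is ever violated, while an unreachable marker is not.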

const auto width = truncate_transform->width();
ICEBERG_ASSIGN_OR_RAISE(auto ref, NamedReference::Make(std::string(name)));

if (str_value.length() < width) {

Member

Is this correct? str_value.length() is the number of bytes, not the number of UTF-8 characters, but width is the number of UTF-8 characters.

Collaborator Author

Good catch. We should account for UTF-8; will fix.

Collaborator Author

Fixed by adding a CodePointCount function to do the count.
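
The PR's CodePointCount implementation is not shown in this conversation. As a rough sketch of how such a count can work, the hypothetical `CountCodePoints` below counts every byte that is not a UTF-8 continuation byte (bit pattern 10xxxxxx). It assumes the input is already valid UTF-8 and performs no validation.

```cpp
#include <cstddef>
#include <string_view>

// Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
// Assumes valid UTF-8 input; malformed sequences are not detected.
inline std::size_t CountCodePoints(std::string_view s) {
  std::size_t count = 0;
  for (unsigned char c : s) {
    if ((c & 0xC0) != 0x80) {
      ++count;
    }
  }
  return count;
}
```

For a string such as "h\xC3\xA9llo" (the word "héllo"), `length()` reports 6 bytes while this function reports 5 code points, which is the mismatch the review comment points out when comparing against a width measured in characters.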

// The predicate's term should be a transform
EXPECT_EQ(bound_pred->term()->kind(), Term::Kind::kTransform);

auto dummy = Expressions::NotEqual<BoundTransform>(bucket_term, Literal::Int(5));

Member

Unused?

Collaborator Author

Removed, sorry about the oversight :)

@wgtmac merged commit 61b83ea into apache:main on Dec 3, 2025
10 checks passed
@zhjwpku deleted the transform_project branch on December 3, 2025 15:23