Skip to content

Conversation

@wgtmac
Copy link
Member

@wgtmac wgtmac commented Oct 28, 2025

  • Implemented Negate, ToString and Test functions for bound predicate subclasses.
  • Added hash support to Literal.
  • Refactored BoundSetPredicate to use unordered set for literals.
  • Refactored predicate unit test to be better organized.

@wgtmac wgtmac changed the title feat: add full bound predicate support feat: add more bound predicate support Oct 28, 2025
- Implemented Negate and ToString functions for bound predicates.
- Added hash support to Literal.
- Refactored BoundSetPredicate to use unordered set for literals.
- Refactored predicate unit test to be better organized.
@wgtmac wgtmac changed the title feat: add more bound predicate support feat: implement all functions of bound predicates Oct 28, 2025
Copy link
Collaborator

@zhjwpku zhjwpku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yingcai-cy
Copy link
Contributor

LGTM

BoundPredicate::~BoundPredicate() = default;

Result<Literal::Value> BoundPredicate::Evaluate(const StructLike& data) const {
Result<Literal> BoundPredicate::Evaluate(const StructLike& data) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we have a tool to capture the API signature changes in release notes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have that yet, which seems fine at this early stage of iceberg-cpp, but we should follow semantic versioning once the project becomes stable, i.e. bump up major version for incompatible API changes.

} else if constexpr (std::is_same_v<T, std::vector<uint8_t>>) {
std::size_t hash = 0;
for (size_t i = 0; i < v.size(); ++i) {
hash ^= std::hash<uint8_t>{}(v[i]) + kHashPrime + (hash << 6) + (hash >> 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hashing identical bytes at different positions can produce same hash (e.g., [1,2] and [2,1] might collide).
Can we add position i into the hash value to decrease the possibility of collide?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hashing identical bytes at different positions can produce same hash (e.g., [1,2] and [2,1] might collide).

I don't think this statement is correct, the left and right shifts should ensure a difference.
I ran a quick demo on Godbolt [1], and as you can see, the hashes of [1,2] and [2,1] are different.

[1] https://godbolt.org/z/4ss9E749q

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I traced through the logic again and realized you are right. The hash is position-sensitive because each iteration depends on the accumulated hash state from previous iterations.


Result<bool> BoundLiteralPredicate::Test(const Literal::Value& value) const {
return NotImplemented("BoundLiteralPredicate::Test not implemented");
Result<bool> BoundLiteralPredicate::Test(const Literal& value) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add null checking?
if (value.IsNull() || literal_.IsNull()) {
return false;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing nulls returns std::partial_ordering::unordered so it is automatically false: https://github.com/apache/iceberg-cpp/blob/main/src/iceberg/expression/literal.cc#L358

@wgtmac
Copy link
Member Author

wgtmac commented Nov 4, 2025

Let me merge this. Thanks @zhjwpku @HuaHuaY @yingcai-cy @Fokko @shangxinli for the review!

@wgtmac wgtmac merged commit dc23f76 into apache:main Nov 4, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants