Skip to content

feat: comparison ops for list and struct types#6104

Merged
universalmind303 merged 2 commits intoEventual-Inc:mainfrom
aaron-ang:cmp-list-struct
Feb 3, 2026
Merged

feat: comparison ops for list and struct types#6104
universalmind303 merged 2 commits intoEventual-Inc:mainfrom
aaron-ang:cmp-list-struct

Conversation

@aaron-ang
Copy link
Contributor

Changes Made

  • Moved Operator enum to daft-core to reuse comparison ops.
  • Try row-based comparison using arrow-row's RowConverter if specific conditions are met, falling back to element-wise comparison.

Related Issues

Closes #3510.

@github-actions github-actions bot added the feat label Jan 30, 2026
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 30, 2026

Greptile Overview

Greptile Summary

This PR implements comparison operations (==, !=, <, <=, >, >=, <=>) for List, FixedSizeList, and Struct data types in Daft. The implementation includes a performance optimization using Arrow's row-based comparison when applicable, with automatic fallback to element-wise comparison.

Key Changes:

  • Moved Operator enum from daft-dsl to daft-core for reuse in array operations
  • Improved Operator categorization with new helper methods (is_equality, is_ordering, is_logical)
  • Implemented row-based comparison optimization using arrow-row::RowConverter for supported types without nulls or floats
  • Added element-wise comparison logic for lists (lexicographic ordering) and structs (field-by-field comparison)
  • List comparison uses lexicographic ordering with length as tiebreaker
  • Struct comparison compares fields in schema order
  • Comprehensive test coverage for new comparison operations including null-safe comparisons

Minor Issue:

  • The is_comparison() method definition changed from the old implementation - it previously included logical operators (And, Or, Xor) but now correctly excludes them. This is more accurate but worth verifying no code relied on the old behavior.

Confidence Score: 4/5

  • This PR is generally safe to merge with one minor behavioral change to verify
  • The implementation is well-structured with comprehensive test coverage, proper optimization strategy, and correct null handling. The refactoring of Operator to daft-core is clean and all imports are properly updated. The one concern is the subtle behavior change in is_comparison() which previously included logical operators but now excludes them (which is more accurate). Score is 4/5 due to this behavioral change that should be verified.
  • Check src/daft-core/src/operator.rs for the is_comparison() behavior change impact

Important Files Changed

Filename Overview
src/daft-core/src/operator.rs New module with Operator enum moved from daft-dsl, with improved categorization methods (is_equality, is_ordering, is_logical)
src/daft-core/src/array/ops/comparison.rs Implements comparison operations for List, FixedSizeList, and Struct types with row-based optimization fallback and comprehensive test coverage
src/daft-core/src/series/ops/comparison.rs Adds match arms for List, FixedSizeList, and Struct types to enable comparison operations at series level
src/daft-dsl/src/expr/mod.rs Removes Operator enum definition (moved to daft-core)

Sequence Diagram

sequenceDiagram
    participant User
    participant Series
    participant ListArray
    participant RowFastPath
    participant ElementWise
    participant RowConverter

    User->>Series: compare(lhs, rhs, op)
    Series->>Series: Check if List/FixedSizeList/Struct type
    alt List/FixedSizeList/Struct
        Series->>ListArray: compare_list_like_array(lhs, rhs, op)
        ListArray->>ListArray: comparison_broadcast_info()
        ListArray->>RowFastPath: row_fast_path()?
        
        alt No nulls + No floats + Supported type
            RowFastPath->>RowFastPath: type_contains_float() → false
            RowFastPath->>RowFastPath: array_has_any_nulls() → false
            RowFastPath->>RowConverter: convert_columns()
            RowConverter-->>RowFastPath: row representations
            RowFastPath->>RowFastPath: Compare rows using cmp()
            RowFastPath-->>ListArray: Some(BooleanArray)
            ListArray-->>Series: BooleanArray
        else Contains nulls/floats or unsupported
            RowFastPath-->>ListArray: None (fallback)
            ListArray->>ElementWise: compare_list_like_value() per element
            loop For each array index
                ElementWise->>ElementWise: Get element at index
                ElementWise->>Series: compare_scalar_series()
                Series-->>ElementWise: Option<bool>
            end
            ElementWise-->>ListArray: BooleanArray
            ListArray-->>Series: BooleanArray
        end
    else Primitive types
        Series->>Series: Use existing comparison logic
    end
    Series-->>User: BooleanArray result
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines +48 to +59
pub fn is_comparison(&self) -> bool {
matches!(
self,
Self::Eq
| Self::EqNullSafe
| Self::NotEq
| Self::Lt
| Self::LtEq
| Self::Gt
| Self::GtEq
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check that is_comparison() behavior change doesn't break existing code. Old version included logical operators (And/Or/Xor), new version correctly excludes them

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/daft-core/src/operator.rs
Line: 48:59

Comment:
Check that `is_comparison()` behavior change doesn't break existing code. Old version included logical operators (`And`/`Or`/`Xor`), new version correctly excludes them

How can I resolve this? If you propose a fix, please make it concise.

@codecov
Copy link

codecov bot commented Jan 31, 2026

Codecov Report

❌ Patch coverage is 67.60563% with 138 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.26%. Comparing base (e7adf06) to head (005c55c).
⚠️ Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-core/src/array/ops/comparison.rs 77.56% 83 Missing ⚠️
src/daft-core/src/operator.rs 0.00% 38 Missing ⚠️
src/daft-core/src/series/ops/comparison.rs 0.00% 15 Missing ⚠️
src/daft-logical-plan/src/ops/project.rs 0.00% 1 Missing ⚠️
src/daft-recordbatch/src/lib.rs 50.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #6104      +/-   ##
==========================================
- Coverage   43.34%   43.26%   -0.09%     
==========================================
  Files         917      921       +4     
  Lines      112995   114222    +1227     
==========================================
+ Hits        48978    49417     +439     
- Misses      64017    64805     +788     
Files with missing lines Coverage Δ
src/common/scan-info/src/expr_rewriter.rs 22.56% <ø> (ø)
src/daft-algebra/src/boolean.rs 100.00% <ø> (ø)
src/daft-algebra/src/simplify/boolean.rs 96.66% <ø> (ø)
src/daft-algebra/src/simplify/null.rs 50.00% <ø> (ø)
src/daft-algebra/src/simplify/numeric.rs 93.87% <ø> (ø)
src/daft-core/src/series/ops/arithmetic.rs 55.96% <ø> (ø)
src/daft-dsl/src/arithmetic/mod.rs 100.00% <ø> (ø)
src/daft-dsl/src/expr/mod.rs 39.86% <ø> (+0.37%) ⬆️
src/daft-logical-plan/src/ops/join.rs 85.47% <ø> (ø)
...lan/src/optimization/rules/eliminate_cross_join.rs 80.56% <ø> (ø)
... and 10 more

... and 45 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

}

#[test]
fn compare_list_arrays_basic() -> DaftResult<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd be nice to have a few meor test cases here. maybe we could use rstest to create some parametrized tests to cover a larger area. We have other parts of the codebse that use parametrized rstests.

Copy link
Member

@universalmind303 universalmind303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks great! good work @aaron-ang

two things i'd like to see before we merge this in.

  • a few more rust based unit tests
  • python unit tests testing the compare ops using series & dataframes.

@universalmind303 universalmind303 merged commit 6907e71 into Eventual-Inc:main Feb 3, 2026
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Comparison operators for list and struct types

2 participants