
Conversation

@bhanreddy1973

Which issue does this PR close?

Closes #19155

Rationale for this change

Spark's make_interval function needs to properly report output nullability:

  • When no arguments: returns a non-null zero interval
  • When arguments are provided: output is nullable because:
    • NULL input → NULL output
    • Arithmetic overflow → NULL output

Currently, return_type only reports the data type, not nullability information.

What changes are included in this PR?

Added a return_field_from_args implementation to SparkMakeInterval (sketched below):

  • With zero arguments: nullable = false (the function always returns a zero interval)
  • With any arguments: nullable = true (NULL inputs or arithmetic overflow can produce a NULL output)
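A minimal sketch of the change, assuming DataFusion's ScalarUDFImpl::return_field_from_args hook; the output field name and the MonthDayNano interval unit are illustrative assumptions, not necessarily the exact code in this PR:

```rust
use std::sync::Arc;

use arrow::datatypes::{DataType, Field, FieldRef, IntervalUnit};
use datafusion_common::Result;
use datafusion_expr::ReturnFieldArgs;

// Sketch of the hook body; in the PR this lives in
// `impl ScalarUDFImpl for SparkMakeInterval`.
fn return_field_from_args(args: ReturnFieldArgs<'_>) -> Result<FieldRef> {
    // Zero arguments: make_interval() always yields a zero interval, never NULL.
    // Any arguments: a NULL input or arithmetic overflow can yield NULL.
    let nullable = !args.arg_fields.is_empty();
    Ok(Arc::new(Field::new(
        "make_interval", // illustrative name, not necessarily the real one
        DataType::Interval(IntervalUnit::MonthDayNano),
        nullable,
    )))
}
```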

How are these changes tested?

All existing unit tests pass:

  • nulls_propagate_per_row
  • error_months_overflow_should_be_null
  • error_days_overflow_should_be_null
  • error_min_overflow_should_be_null
  • error_sec_overflow_should_be_null
  • happy_path_all_present_single_row
  • negative_components_and_fractional_seconds
  • zero_args_returns_zero_seconds

Are these changes safe?

Yes. This only adds nullability metadata; it doesn't change the function's runtime behavior.

github-actions bot added the spark label on Jan 2, 2026
@Jefffrey (Contributor) commented Jan 2, 2026

Please see this comment: #19155 (comment)

I don't think the changes proposed here follow Spark semantics.

@bhanreddy1973 (Author)

Hi @Jefffrey, thanks for the review!

I read the comment on #19155 but want to make sure I understand correctly.

Looking at other Spark functions in datafusion-spark (sha1, like, length, ilike), they all use:

```rust
let nullable = args.arg_fields.iter().any(|f| f.is_nullable());
```

Should make_interval follow this same pattern?

My concern was that make_interval can return NULL on arithmetic overflow (e.g., year * 12 + month overflows), even when all inputs are non-null; see the sketch after this list. Should overflow:

  1. Make the output always nullable (current implementation)
  2. Only make output nullable if inputs are nullable (standard pattern)
  3. Depend on a failOnError / ANSI mode config?
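
A minimal illustration of that overflow path (hypothetical helper, not the actual datafusion-spark code):

```rust
// Hypothetical helper: checked arithmetic maps overflow to None,
// which make_interval surfaces as a NULL row when failOnError=false.
fn total_months(year: i32, month: i32) -> Option<i32> {
    year.checked_mul(12)?.checked_add(month)
}

fn main() {
    assert_eq!(total_months(1, 2), Some(14));    // normal case
    assert_eq!(total_months(i32::MAX, 0), None); // overflow -> NULL output
}
```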

Happy to update the implementation once I understand the expected behavior. Thanks!

- Implement return_field_from_args for SparkMakeInterval
- Output is nullable if ANY input argument is nullable
- Follows same pattern as other Spark functions (sha1, like, length, ilike)

Closes apache#19155
@Jefffrey (Contributor) commented Jan 2, 2026

> Hi @Jefffrey, thanks for the review!
>
> I read the comment on #19155 but want to make sure I understand correctly.
>
> Looking at other Spark functions in datafusion-spark (sha1, like, length, ilike), they all use: `let nullable = args.arg_fields.iter().any(|f| f.is_nullable());` Should make_interval follow this same pattern?
>
> My concern was that make_interval can return NULL on arithmetic overflow (e.g., year * 12 + month overflows), even when inputs are non-null. Should overflow:
>
> 1. Make the output always nullable (current implementation)
> 2. Only make output nullable if inputs are nullable (standard pattern)
> 3. Depend on a `failOnError` / ANSI mode config?
>
> Happy to update the implementation once I understand the expected behavior. Thanks!

All the questions here are answered very clearly in the comment I linked.

bhanreddy1973 force-pushed the fix-make-interval-nullability branch from 739364b to cc6c12b on January 2, 2026 at 10:39
@bhanreddy1973 (Author)

Hi @Jefffrey,

I've reviewed issue #19155. The bug states that is_nullable "is always true which is not the case."

I've updated the implementation to:

```rust
let nullable = args.arg_fields.iter().any(|f| f.is_nullable());
```

Now the output is:

  • NOT nullable when all inputs are non-nullable
  • Nullable when any input is nullable

This matches the pattern used by other Spark functions (sha1, like, length, ilike).
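
Concretely, the updated hook would look roughly like this (same caveats as the earlier sketch: the field name and interval unit are assumptions, not the exact PR code):

```rust
use std::sync::Arc;

use arrow::datatypes::{DataType, Field, FieldRef, IntervalUnit};
use datafusion_common::Result;
use datafusion_expr::ReturnFieldArgs;

// Sketch of the hook body; in the PR this lives in
// `impl ScalarUDFImpl for SparkMakeInterval`.
fn return_field_from_args(args: ReturnFieldArgs<'_>) -> Result<FieldRef> {
    // Nullable iff any input field is nullable (same as sha1, like, length, ilike).
    let nullable = args.arg_fields.iter().any(|f| f.is_nullable());
    Ok(Arc::new(Field::new(
        "make_interval", // illustrative name
        DataType::Interval(IntervalUnit::MonthDayNano),
        nullable,
    )))
}
```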

Could you let me know if this is the expected behavior, or if there's something specific in the linked comment I'm missing? Thanks!

@Jefffrey (Contributor) commented Jan 2, 2026

> Current Behavior
>
> Without ANSI mode support, make_interval defaults to always nullable (matching Spark's failOnError=false), which is already the default behavior; no code change is needed.

From #19155 (comment)

I don't think we can proceed with this PR.

Jefffrey closed this on Jan 2, 2026