SNOW-1524760: Fix Series.isin behavior #3973

Merged
sfc-gh-joshi merged 5 commits into main from joshi-SNOW-1524760-isin-series-fix
Oct 30, 2025

@sfc-gh-joshi
Contributor

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1524760

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

This PR fixes the behavior of Series.isin(other_series), which now ignores indices instead of joining on row/column labels, matching pandas. It also adds a fast path for Series.isin(dataframe), which should always return False at every index.

Per @sfc-gh-mvashishtha's investigation in the linked ticket:

It seems that pandas behavior is:

  • ignore row and column labels for Series.isin(series)
  • Series.isin(dataframe) always returns False, e.g. s = pandas.Series([1]); s.isin(s.to_frame())
  • DataFrame.isin(dataframe) joins on both row and column labels
  • DataFrame.isin(series) ignores column labels but not row labels, e.g. pandas.DataFrame({'A': [1, 2]}).isin(pandas.Series([1, 2], name='B', index=[0,1])) gives True values because even though the column name is different, the index matches, but pandas.DataFrame({'A': [1, 2]}).isin(pandas.Series([1, 2], name='B', index=[-1, -2])) gives False values.
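
For reference, a minimal sketch of the four cases above in plain pandas (not Snowpark pandas). The `s1.isin(s1.to_frame())` and `name='B'` examples come from the list above. The remaining data and all variable names are illustrative assumptions, not taken from this PR's tests.

```python
import pandas

# Series.isin(series): labels are ignored, only the values of `other` matter.
s = pandas.Series([1, 2, 3], index=["a", "b", "c"])
other = pandas.Series([3, 1], index=["x", "y"], name="other")
series_vs_series = s.isin(other)

# Series.isin(dataframe): the example quoted in the list above.
s1 = pandas.Series([1])
series_vs_frame = s1.isin(s1.to_frame())

# DataFrame.isin(series): column name is ignored, row labels must match.
df = pandas.DataFrame({"A": [1, 2]})
matching_index = df.isin(pandas.Series([1, 2], name="B", index=[0, 1]))
mismatched_index = df.isin(pandas.Series([1, 2], name="B", index=[-1, -2]))

# DataFrame.isin(dataframe): both row and column labels must match.
same_labels = df.isin(pandas.DataFrame({"A": [1, 2]}))
different_column = df.isin(pandas.DataFrame({"B": [1, 2]}))

print(series_vs_series.tolist())       # [True, False, True]
print(series_vs_frame.tolist())        # [False]
print(matching_index["A"].tolist())    # [True, True]
print(mismatched_index["A"].tolist())  # [False, False]
print(same_labels["A"].tolist())       # [True, True]
print(different_column["A"].tolist())  # [False, False]
```

The first case is the one this PR changes: `s` and `other` share no index labels, yet the values 1 and 3 are still reported as present.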

Contributor

@sfc-gh-helmeleegy left a comment

It seems that some query counts need to be fixed. But otherwise looks good to me. Thanks!

@sfc-gh-joshi
Contributor Author

It seems that some query counts need to be fixed.

Done! We might actually be able to eliminate the joins with some changes to aggregation logic, and I'll look into it in a follow-up PR.



- @sql_count_checker(query_count=3)
+ @sql_count_checker(query_count=3, join_count=2)
Contributor

I'm a little surprised this changed, because you didn't need to change the join counts on other isin tests.

Contributor Author

It looks like the query generated in this test got significantly more complicated because of the added ARRAY_AGG operation, but I believe the previous version was incorrect for a lot of cases. The actual query text of the other isin tests has joins as well, but I guess they're not being parsed correctly by the SQL counter.

@sfc-gh-joshi merged commit 367fc26 into main Oct 30, 2025
44 of 49 checks passed
@sfc-gh-joshi deleted the joshi-SNOW-1524760-isin-series-fix branch October 30, 2025 17:55
@github-actions bot locked and limited conversation to collaborators Oct 30, 2025

Labels

NO-PANDAS-CHANGEDOC-UPDATES (This PR does not update Snowpark pandas docs)
snowpark-pandas

4 participants