@ion-elgreco
Contributor

Which issue does this PR close?

Rationale for this change

Join keys are often the same in both DataFrames, so a single on suffices. This PR also deprecates join_keys in favor of left_on and right_on.

What changes are included in this PR?

  • marks join_keys as deprecated
  • introduces on, left_on, right_on parameters
  • adds function overloads

Are there any user-facing changes?

  • new params added, join_keys deprecated.
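
For illustration, here is a minimal sketch of how typing.overload can describe the two calling conventions listed above (a shared on, or separate left_on/right_on). The Frame class and its return value are hypothetical stand-ins rather than the code in this PR, and the deprecated join_keys path is left out for brevity.

```python
from __future__ import annotations

from typing import Sequence, overload


class Frame:
    """Hypothetical stand-in for a DataFrame; only the join signature matters here."""

    @overload
    def join(
        self, right: Frame, on: str | Sequence[str], how: str = "inner"
    ) -> tuple[object, object, str]: ...

    @overload
    def join(
        self,
        right: Frame,
        *,
        left_on: str | Sequence[str],
        right_on: str | Sequence[str],
        how: str = "inner",
    ) -> tuple[object, object, str]: ...

    def join(self, right, on=None, how="inner", *, left_on=None, right_on=None):
        # `on` is shorthand for "same key columns on both sides".
        if on is not None:
            if left_on is not None or right_on is not None:
                raise ValueError("`on` cannot be combined with `left_on`/`right_on`")
            left_on = right_on = on
        elif left_on is None or right_on is None:
            raise ValueError("pass either `on` or both `left_on` and `right_on`")
        # A real implementation would build the join here; this sketch only
        # reports which keys and join type would be used.
        return left_on, right_on, how
```

With overloads like these, a type checker can flag a call that mixes on with left_on or right_on, while the single runtime implementation still validates the arguments.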

@ion-elgreco ion-elgreco changed the title refactor: dataframe join params refactor: dataframe join params Oct 13, 2024
Member

@timsaucer timsaucer left a comment

This looks like a good improvement.

@ion-elgreco ion-elgreco requested a review from timsaucer October 13, 2024 15:03
@ion-elgreco
Contributor Author

> This looks like a good improvement.

Any other changes still needed?

@timsaucer
Member

None that I can see. I'll run the workflow and, as long as it all passes, I'll merge it later today.

@timsaucer
Member

Thank you!

@timsaucer
Member

It looks like we do have some failures in the pytest run.

@ion-elgreco
Contributor Author

@timsaucer Ah my bad, we need to add this at the end of the function, then it should run:

if isinstance(left_on, str):
    left_on = [left_on]
if isinstance(right_on, str):
    right_on = [right_on]

Can't add it now myself to test :(
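
The same normalization can be sketched as a standalone helper. The name as_key_list is hypothetical and not part of the PR. The isinstance check matters because a str is itself a sequence of characters, so a plain list() call would split the column name.

```python
from __future__ import annotations

from typing import Sequence


def as_key_list(keys: str | Sequence[str]) -> list[str]:
    """Hypothetical helper: accept one column name or several, always return a list."""
    if isinstance(keys, str):
        # Wrap a single name; list("id") would give ["i", "d"] instead.
        return [keys]
    return list(keys)


left_on = as_key_list("id")           # single name becomes a one-element list
right_on = as_key_list(("a", "b"))    # any other sequence is copied into a list
```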

@timsaucer
Member

Thanks for the update to fix the CI. We talked on Discord, and that brought up another issue: this is a breaking change for anyone who currently calls join without the on= keyword. So we either need to make a change so that the old form, df1.join(df2, (["col_a"], ["col_b"])), still works, or we have to make it clear that this is a breaking API change and that all users will need to update every call to join in their code.
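
One way the first option could look, as a sketch only: detect a (left_keys, right_keys) tuple arriving in the on position and route it to the legacy join_keys path. The function name split_legacy_on and the choice to emit a DeprecationWarning are assumptions made for illustration, not necessarily what was merged.

```python
from __future__ import annotations

import warnings


def split_legacy_on(on, join_keys=None):
    """Hypothetical shim: return (on, join_keys) with a positional legacy tuple moved over."""
    looks_like_join_keys = (
        isinstance(on, tuple)
        and len(on) == 2
        and isinstance(on[0], list)
        and isinstance(on[1], list)
    )
    if looks_like_join_keys:
        # Old call style: df1.join(df2, (["col_a"], ["col_b"]))
        warnings.warn(
            "passing join keys positionally is deprecated; use on= or left_on=/right_on=",
            DeprecationWarning,
            stacklevel=2,
        )
        return None, on
    # New call style: leave both arguments untouched.
    return on, join_keys
```

Checking for a tuple of two lists keeps the shim narrow: a plain list or string passed as on is still treated as shared key columns.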

@timsaucer
Member

Ok, I made the adjustment. Assuming CI passes I'll merge.

@timsaucer timsaucer merged commit 4a6c4d1 into apache:main Nov 8, 2024
15 checks passed
