Skip to content

Conversation

phofl
Copy link
Member

@phofl phofl commented Mar 24, 2024

I want to add anti joins as a follow up, which can be incorporated into the new class

@phofl phofl requested review from Dr-Irv and WillAyd as code owners March 24, 2024 02:14
@phofl phofl added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Enhancement labels Mar 24, 2024
khiter_t k
Int64Vector locs = Int64Vector()
Int64Vector self_locs = Int64Vector()
Int64VectorData *l
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistic nit but would be better to not use single character variable names, especially those that conflict with debugger aliases. The cython debugger is hard enough to use as is :-)

Int64Vector self_locs = Int64Vector()
Int64VectorData *l
Int64VectorData *sl
# mask not implemented
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we raise NotImplementedError here for now?


for i in range(n):
val = values[i]
hash(val)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the point of this just to raise an error if an object is not hashable?

val = values[i]
hash(val)

k = kh_get_pymap(self.table, <PyObject*>val)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the cast required here? I assume it would know we are already working with PyObjects given the declaration of values

right_keys: list[ArrayLike],
sort: bool = False,
how: JoinHow = "inner",
how: JoinHow + Literal["leftsemi"] = "inner",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason why we don't just add leftsemi to the JoinHow definition?

tm.assert_frame_equal(result, expected.head(1))


def test_leftsemi_invalid():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for raising NotImplementederror when passing a mask?

Copy link
Contributor

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concerned about whether this argument will apply to DataFrame.join().

* inner: use intersection of keys from both frames, similar to a SQL inner
join; preserve the order of the left keys.
* leftsemi: Filter for rows in the left that have a match on the right;
preserve the order of the left keys. Doesn't support `left_index`, `right_index`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
preserve the order of the left keys. Doesn't support `left_index`, `right_index`,
preserve the order of the left keys. Does not support `left_index`, `right_index`,


# merge
MergeHow = Literal["left", "right", "inner", "outer", "cross"]
MergeHow = Literal["left", "right", "inner", "outer", "cross", "leftsemi"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MergeHow is also used as the type for how in DataFrame.join().

So if DataFrame.join() will not support the "leftsemi" operation, then you'll need to split this type into a JoinHow and MergeHow, with the latter dependent on the former.

If DataFrame.join() will support that operation, then you'll have to update the docs for join and add a test for it.

Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Apr 26, 2024
@mroeschke
Copy link
Member

Closing to clear the queue, but feel free to update and reopen anytime

@mroeschke mroeschke closed this Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode Stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ENH: Support Left Semi Joins

4 participants