Closed
Labels
Closing Candidate (may be closeable, needs more eyeballs), Enhancement, Needs Discussion (requires discussion from core team before further action)
Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
Hi! The function drop_duplicates drops duplicate values, but it has no option to drop only consecutive duplicates.
For example, given a pandas Series:
[1,2,2,3,2,4,5,6]
applying drop_duplicates results in:
[1,2,3,4,5,6]
But I would like to get this instead:
[1,2,3,2,4,5,6]
(There is a lengthy discussion about it HERE.)
Do you think the implementation I wrote below is the best way to make this change? (It would of course need many more adjustments.)
If so, I could try to add it via a PR.
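For reference, the difference described above can be reproduced with existing pandas calls. This is only a minimal sketch of the common shift-based workaround, not a proposed API; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3, 2, 4, 5, 6])

# Current behaviour: every repeated value is dropped, wherever it occurs.
deduped = s.drop_duplicates()

# Workaround: keep a value only when it differs from the one just before it.
consecutive_only = s[s != s.shift()]

print(deduped.tolist())           # [1, 2, 3, 4, 5, 6]
print(consecutive_only.tolist())  # [1, 2, 3, 2, 4, 5, 6]
```

Note that the workaround keeps the original index labels, so the result here is indexed 0, 1, 3, 4, 5, 6, 7.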
Feature Description
def drop_duplicates(
    self,
    *,
    keep: DropKeep = "first",
    inplace: bool = False,
    ignore_index: bool = False,
    consecutive: bool = False,  # New parameter for consecutive duplicates
) -> Series | None:
    """
    Return Series with duplicate values removed.

    Parameters
    ----------
    keep : {'first', 'last', False}, default 'first'
        Method to handle dropping duplicates:

        - 'first': Keep the first occurrence of each duplicate.
        - 'last': Keep the last occurrence of each duplicate.
        - False: Remove all duplicates.
    inplace : bool, default False
        If True, perform operation inplace and return None.
    ignore_index : bool, default False
        If True, the resulting Series will be re-indexed from 0 to n-1.
    consecutive : bool, default False
        If True, only remove consecutive duplicates.

    Returns
    -------
    Series or None
        Series with duplicates dropped, or None if inplace=True.
    """
    if consecutive:
        # Keep a value only if it differs from its predecessor.
        # Note: this branch ignores `keep`, and because NaN != NaN,
        # consecutive missing values are not dropped.
        result = self[self != self.shift(1)]
    else:
        # Handle the default drop_duplicates behavior
        result = super().drop_duplicates(keep=keep)

    if ignore_index:
        result.index = pd.RangeIndex(len(result))

    if inplace:
        self._update_inplace(result)
        return None
    else:
        return result
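The same idea can be tried out without touching pandas itself. The sketch below is a standalone function, not part of pandas; the name `drop_consecutive_duplicates` and its parameters are hypothetical. It assumes that `keep` should choose which end of each run survives and that adjacent missing values should count as duplicates, neither of which the proposal above specifies.

```python
import pandas as pd


def drop_consecutive_duplicates(s, keep="first", ignore_index=False):
    """Hypothetical helper: drop repeats only within runs of equal values."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # Compare each element with its predecessor ('first') or successor ('last').
    neighbour = s.shift(1) if keep == "first" else s.shift(-1)
    # Assumption: two adjacent missing values count as equal (plain `!=` would not).
    both_missing = s.isna() & neighbour.isna()
    same = (s == neighbour).fillna(False) | both_missing
    # The boundary element has no real neighbour, so it is always kept.
    if len(s):
        same.iloc[0 if keep == "first" else -1] = False
    result = s[~same]
    return result.reset_index(drop=True) if ignore_index else result


s = pd.Series([1, 2, 2, 3, 2, 4, 5, 6])
print(drop_consecutive_duplicates(s).tolist())  # [1, 2, 3, 2, 4, 5, 6]
```

With `keep="last"` the values are the same for this input, but the surviving 2 of the run is the one at index 2 rather than index 1.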
Alternative Solutions
.
Additional Context
No response