
ENH: Dropping consecutive duplicates #59874

@Yehuda-Bergstein


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Hi! The function drop_duplicates drops duplicates, but it has no option to drop only consecutive duplicates.
For example, given a pandas Series:

[1,2,2,3,2,4,5,6]

the result after drop_duplicates is applied is:

[1,2,3,4,5,6]

But I would want to achieve this:

[1,2,3,2,4,5,6]
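For reference, both behaviours can be reproduced with existing pandas calls. This is a minimal sketch: the shift-based mask is one common idiom for the desired result, not an official pandas API, and the variable names are only illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3, 2, 4, 5, 6])

# Current behaviour: every repeated value is dropped, wherever it occurs.
deduplicated = s.drop_duplicates()

# Desired behaviour: drop a value only when it equals its immediate predecessor.
consecutive_dropped = s[s != s.shift()]

print(deduplicated.tolist())
print(consecutive_dropped.tolist())
```

Note that the second result keeps the original index labels of the surviving rows, just as drop_duplicates does.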

(There is a lengthy discussion about it HERE)

Do you think the implementation I wrote below would be the best way to make this change (with many more adjustments, of course)?
If so, I could try to add it via a PR.

Feature Description

def drop_duplicates(
    self,
    *,
    keep: DropKeep = "first",
    inplace: bool = False,
    ignore_index: bool = False,
    consecutive: bool = False,  # New parameter for consecutive duplicates
) -> Series | None:
    """
    Return Series with duplicate values removed.

    Parameters
    ----------
    keep : {'first', 'last', False}, default 'first'
        Method to handle dropping duplicates:
        - 'first': Keep the first occurrence of each duplicate.
        - 'last': Keep the last occurrence of each duplicate.
        - False: Remove all duplicates.
    inplace : bool, default False
        If True, perform operation inplace and return None.
    ignore_index : bool, default False
        If True, the resulting Series will be re-indexed from 0 to n-1.
    consecutive : bool, default False
        If True, only remove consecutive duplicates.

    Returns
    -------
    Series or None
        Series with duplicates dropped, or None if inplace=True.
    """


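    if consecutive:
        # Keep a value only if it differs from its predecessor. For
        # NumPy-backed dtypes the first element is compared against a
        # missing value and is therefore kept. `keep` is not consulted here.
        result = self[self != self.shift(1)]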
    else:
        # Handle the default drop_duplicates behavior
        result = super().drop_duplicates(keep=keep)

    if ignore_index:
        result.index = pd.RangeIndex(len(result))

    if inplace:
        self._update_inplace(result)
        return None
    else:
        return result
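The same idea can be tried outside the Series class. The sketch below is a standalone function, not part of pandas: the name drop_consecutive_duplicates and its behaviour for each keep value are assumptions made for illustration. It differs from the snippet above in two ways. It honors keep within each run of equal values, and it treats consecutive missing values as duplicates (with self != self.shift(1), NaN != NaN is True, so runs of NaN are kept in full). It assumes NumPy-backed dtypes; nullable extension dtypes are not handled.

```python
import pandas as pd


def drop_consecutive_duplicates(s, keep="first", ignore_index=False):
    """Hypothetical helper: drop values that repeat their neighbour."""
    if keep not in ("first", "last", False):
        raise ValueError('keep must be "first", "last" or False')

    prev = s.shift(1)
    # True where an element equals the one before it; two missing
    # values in a row also count as equal.
    same_as_prev = s.eq(prev) | (s.isna() & prev.isna())
    if len(s):
        # The first element has no predecessor, so it never repeats one.
        same_as_prev.iloc[0] = False
    # True where an element equals the one after it.
    same_as_next = same_as_prev.shift(-1, fill_value=False)

    if keep == "first":
        mask = ~same_as_prev
    elif keep == "last":
        mask = ~same_as_next
    else:
        # keep=False: drop every element that belongs to a run longer than 1.
        mask = ~(same_as_prev | same_as_next)

    result = s.loc[mask.to_numpy()]
    if ignore_index:
        result = result.reset_index(drop=True)
    return result


s = pd.Series([1, 2, 2, 3, 2, 4, 5, 6])
print(drop_consecutive_duplicates(s).tolist())
print(drop_consecutive_duplicates(s, keep=False).tolist())
```

Returning a new Series instead of supporting inplace keeps the sketch short; a real implementation would need to follow the conventions of the existing method.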

Alternative Solutions

.

Additional Context

No response


Labels

  • Closing Candidate (may be closeable, needs more eyeballs)

  • Enhancement

  • Needs Discussion (requires discussion from core team before further action)
