
ENH: Add Option to Include Array Offset as MultiIndex Level in explode() #59163

@chelsea-lin

Description


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Currently, df.explode() and s.explode() flatten lists/arrays within Series/DataFrames. However, information about the original position of each element within its list is lost. This makes it difficult to:

  • Access specific elements by their position after exploding (e.g. the second element of every list).
  • Reconstruct the original nested structure if needed.

Proposed Solution:
Introduce a new parameter, offset, to both df.explode() and s.explode().

Example Usage:

>>> s = pd.Series([[1, 2, 3], 'foo', [], [3, 4]])
>>> s
0    [1, 2, 3]
1          foo
2           []
3       [3, 4]
dtype: object
>>> s.explode() # <- Current behavior:
0         1
0         2
0         3
1       foo
2       NaN
3         3
3         4
dtype: object

>>> s.explode(offset=True) # <- With proposed feature
0  1         1
   2         2
   3         3
1  1       foo
2  1       NaN
3  1         3
   2         4
dtype: object
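With an offset level, both uses listed under Problem Description become ordinary MultiIndex operations. The sketch below assumes the output shape shown in the example above. Because offset=True does not exist in pandas today, it builds that MultiIndex by hand. The variable names are illustrative only.

```python
import pandas as pd

# Hand-built stand-in for the proposed s.explode(offset=True) output:
# level 0 is the original row label, level 1 the (1-based) position of the
# element within its original list, as in the example above.
index = pd.MultiIndex.from_tuples(
    [(0, 1), (0, 2), (0, 3), (1, 1), (2, 1), (3, 1), (3, 2)]
)
exploded = pd.Series([1, 2, 3, "foo", float("nan"), 3, 4], index=index, dtype=object)

# Access a specific element by position: the second element of every list.
second = exploded.xs(2, level=1)

# Reconstruct the nested structure: order by offset within each original row,
# then collect the values back into lists.
rebuilt = exploded.sort_index().groupby(level=0).agg(list)
```

Rows that were scalars or empty lists come back as ['foo'] and [nan], so an exact round trip of those cases would still need extra handling.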

Feature Description

Introduce a new parameter, offset, to both df.explode() and s.explode().

def explode(self, ..., offset: bool = False):  # Defaults to False for backward compatibility
    """
    Parameters
    ----------
    ...
    offset : bool, default False
        If True, include each element's original position within its list-like
        as an additional level in the resulting MultiIndex.
    """

Alternative Solutions

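The offset can currently be inferred after exploding, for example by numbering the rows within each original index label with groupby(level=0).cumcount(). However, this requires an extra step and assumes the original index has no duplicate labels. The offset parameter would provide this information directly.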
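The following sketch illustrates the inference workaround and the assumption it relies on. It uses only existing pandas calls. The offsets are shifted to 1-based to match the example in this issue.

```python
import pandas as pd

s = pd.Series([[1, 2, 3], "foo", [], [3, 4]])

# Rows exploded from the same original row share an index label, so a running
# count per label recovers each element's position.
exploded = s.explode()
naive_offset = exploded.groupby(level=0).cumcount() + 1  # 1-based, as in the example
# naive_offset.tolist() -> [1, 2, 3, 1, 1, 1, 2]

# The assumption breaks when the original index has duplicate labels: both rows
# labelled "a" fall into one group, so the count keeps running across them.
dup = pd.Series([[1, 2], [3]], index=["a", "a"])
dup_offset = dup.explode().groupby(level=0).cumcount() + 1
# dup_offset.tolist() -> [1, 2, 3], although 3 is the first element of its list
```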


Metadata

Labels: Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)
