ENH: Improve Code Quality in pandas/core/reshape Module

## Summary

Refactor the pandas/core/reshape module to improve code quality by reducing duplication, replacing hard-coded values, and simplifying complex conditionals.

## Problem Description

The pandas/core/reshape module implements key reshaping functions (pivot, melt, and unstack) used in data manipulation workflows. A review of pivot.py and melt.py reveals a couple of areas where code quality could be improved:

**Nested Conditionals:**
  * In melt.py, nested conditionals add complexity, making the code harder to read and maintain.
  * Suggestion: Refactor these conditionals into smaller, more modular functions.

**Hard-Coded Values:**
  * In pivot.py, hard-coded strings (e.g., "All" for margins) reduce flexibility.
  * Suggestion: Replace hard-coded values with constants for maintainability.
  
## Relevant File
* **melt.py**
* **pivot.py**

## Proposed Solution
**Refactor Nested Conditionals in melt.py**
* Nested Conditional in `ensure_list_vars()`
  * Before:
  ```python
  def ensure_list_vars(arg_vars, variable: str, columns) -> list:
    if arg_vars is not None:
        if not is_list_like(arg_vars):
            return [arg_vars]
        elif isinstance(columns, MultiIndex) and not isinstance(arg_vars, list):
            raise ValueError(
                f"{variable} must be a list of tuples when columns are a MultiIndex"
            )
        else:
            return list(arg_vars)
    else:
        return []
    ```
    * After:
    ```python
    def ensure_list_vars(arg_vars, variable: str, columns) -> list:
    if arg_vars is None:
        return []

    if not is_list_like(arg_vars):
        return [arg_vars]

    if isinstance(columns, MultiIndex) and not isinstance(arg_vars, list):
        raise ValueError(
            f"{variable} must be a list of tuples when columns are a MultiIndex"
        )

    return list(arg_vars)
    ```
* Nested Conditional in `melt()` for `id_vars`:
  * Before:
  ```python
  if id_vars or value_vars:
    if col_level is not None:
        level = frame.columns.get_level_values(col_level)
    else:
        level = frame.columns
    labels = id_vars + value_vars
    idx = level.get_indexer_for(labels)
    missing = idx == -1
    if missing.any():
        missing_labels = [
            lab for lab, not_found in zip(labels, missing) if not_found
        ]
        raise KeyError(
            "The following id_vars or value_vars are not present in "
            f"the DataFrame: {missing_labels}"
        )
    if value_vars_was_not_none:
      frame = frame.iloc[:, algos.unique(idx)]
    else:
      frame = frame.copy(deep=False)
  else:
    frame = frame.copy(deep=False)
  ```
  * After:
  ```python
  def validate_and_get_level(frame, id_vars, value_vars, col_level):
    level = frame.columns.get_level_values(col_level) if col_level is not None else frame.columns
    labels = id_vars + value_vars
    idx = level.get_indexer_for(labels)
    missing = idx == -1
    if missing.any():
        missing_labels = [lab for lab, not_found in zip(labels, missing) if not_found]
        raise KeyError(
            "The following id_vars or value_vars are not present in "
            f"the DataFrame: {missing_labels}"
        )
    return idx

  if id_vars or value_vars:
      idx = validate_and_get_level(frame, id_vars, value_vars, col_level)
      if value_vars_was_not_none:
          frame = frame.iloc[:, algos.unique(idx)]
  else:
      frame = frame.copy(deep=False)
  ```
* Nested Conditionals for Setting `var_name` in `melt()`:
  * Before:
  ```python
  if var_name is None:
    if isinstance(frame.columns, MultiIndex):
        if len(frame.columns.names) == len(set(frame.columns.names)):
            var_name = frame.columns.names
        else:
            var_name = [f"variable_{i}" for i in range(len(frame.columns.names))]
    else:
        var_name = [
            frame.columns.name if frame.columns.name is not None else "variable"
        ]
  elif is_list_like(var_name):
    if isinstance(frame.columns, MultiIndex):
        if is_iterator(var_name):
            var_name = list(var_name)
        if len(var_name) > len(frame.columns):
            raise ValueError(
                f"{var_name=} has {len(var_name)} items, "
                f"but the dataframe columns only have {len(frame.columns)} levels."
            )
    else:
        raise ValueError(f"{var_name=} must be a scalar.")
  else:
    var_name = [var_name]
  ```
  After:
  ```python
  def determine_var_name(frame, var_name):
    if var_name is None:
        return _default_var_name(frame)
    if is_list_like(var_name):
        _validate_list_var_name(var_name, frame)
        return list(var_name)
    return [var_name]

  def _default_var_name(frame):
    if isinstance(frame.columns, MultiIndex):
        if len(frame.columns.names) == len(set(frame.columns.names)):
            return frame.columns.names
        return [f"variable_{i}" for i in range(len(frame.columns.names))]
    return [frame.columns.name or "variable"]

  def _validate_list_var_name(var_name, frame):
    if isinstance(frame.columns, MultiIndex):
        if is_iterator(var_name):
            var_name = list(var_name)
        if len(var_name) > len(frame.columns):
            raise ValueError(
                f"{var_name=} has {len(var_name)} items, "
                f"but the dataframe columns only have {len(frame.columns)} levels."
            )
    else:
        raise ValueError(f"{var_name=} must be a scalar.")

  var_name = determine_var_name(frame, var_name)
  ```
* Benefits:
  * Improves readability:
  Simplifies the main function, making the logic clearer and easier to follow.
  * Makes the logic easier to test and maintain:
  Enables independent testing of each helper function, ensuring robust behavior.
  * Separation of concerns:
  Each helper function is now responsible for a single, well-defined task, aligning with the principle of single responsibility.

**Replace Hard-Coded Values in pivot.py**
  * Before:
  ```python
  # Hard-coded string for margins
  margins_name: Hashable = "All"
  ```
  * After:
  ```python
  # Define a constant for the hard-coded value
  MARGIN_NAME = "All"
  
  # Use the constant in the code
  margins_name: Hashable = MARGIN_NAME:
  ```
  * Benefits:
    * Makes the code more readable and maintainable.
    * Centralizes the value so it can be reused or modified easily.

## Testing
**Unit Testing Helper Functions:**
Write focused tests for each new helper function to validate their behavior under expected, edge, and erroneous inputs. For example:
* Ensure validate_and_get_level() correctly identifies missing variables and raises KeyError.
* Test determine_var_name() with var_name=None, scalar inputs, and multi-level columns.

**Regression Testing Parent Functions:**
Run all pre-existing tests for the parent functions (e.g., melt()) to confirm they maintain their functionality after the refactor.

**Edge Cases:**
Include additional tests for edge scenarios, such as:
* Empty id_vars or value_vars.
* DataFrames with unusual column configurations like MultiIndex or missing names.

## Labels
* `ENH`
* `Code Quality`

## Compliance with Contributing Guide
* **Focus:** The issue is specific and addresses code quality improvements without scope creep.
* **Clarity:** Includes actionable suggestions and a clear implementation path.

### Please provide feedback and let me know if you would like further refinements!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

ENH: Improve Code Quality in pandas/core/reshape Module #60370

Summary

Problem Description

Relevant File

Proposed Solution

Testing

Labels

Compliance with Contributing Guide

Please provide feedback and let me know if you would like further refinements!

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

ENH: Improve Code Quality in pandas/core/reshape Module #60370

Description

Summary

Problem Description

Relevant File

Proposed Solution

Testing

Labels

Compliance with Contributing Guide

Please provide feedback and let me know if you would like further refinements!

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions