Skip to content

ENH: add option to DataFrame.read_excel so that excel hidden rows are not imported in the dataframe #42562

@laurent-mutricy

Description

@laurent-mutricy

Is your feature request related to a problem?

I wish DataFrame.read_excel could skip automatically the rows that are hidden in excel. I want to read data from an excel sheet but I want to select within excel the data that will go to the dataFrame. I am using excel filter to select data of interest so some lines are hidden. I can use this workaround but it implies reading twice the excel sheet which in my case is quite big.

Describe the solution you'd like

DataFrame.read_excel should get a new parameter skip_hidden that is default Fasle but when set to True will DataFrame.read_excel will skip the hidden rows

API breaking implications

Additionnal parameter to DataFrame.read_excel with default value so I guess no API breaking risk.

Describe alternatives you've considered

Going for a workaround with openpyxl which implies opening twice the excel sheet which is expensive. stackoverflow example

Additional context

looking at pandas/io/excel/_openpyxl.py function get_sheet_data, considering new parameter to read_excel named skip_hidden.
modification looks quite simple:

    def get_sheet_data(self, sheet, convert_float: bool, skip_hidden: bool) -> list[list[Scalar]]:

        if self.book.read_only:
            sheet.reset_dimensions()
        rowDim=sheet.row_dimensions     # added
        data: list[list[Scalar]] = []
        last_row_with_data = -1
        for row_number, row in enumerate(sheet.rows):
            converted_row = [self._convert_cell(cell, convert_float) for cell in row]
            while converted_row and converted_row[-1] == "":
                # trim trailing empty elements
                converted_row.pop()
            if converted_row:
                last_row_with_data = row_number
            if not (skip_hidden and rowDim[row_number].hidden) :    # added
                data.append(converted_row)

        # Trim trailing empty rows
        data = data[: last_row_with_data + 1]

        if len(data) > 0:
            # extend rows to max width
            max_width = max(len(data_row) for data_row in data)
            if min(len(data_row) for data_row in data) < max_width:
                empty_cell: list[Scalar] = [""]
                data = [
                    data_row + (max_width - len(data_row)) * empty_cell
                    for data_row in data
                ]

        return data

Metadata

Metadata

Assignees

No one assigned

    Labels

    EnhancementIO Excelread_excel, to_excelNeeds DiscussionRequires discussion from core team before further action

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions