ENH: Implement skiptrailingspace parameter for the read_table function #42054

@owenlamont

Description

Is your feature request related to a problem?

I had to do several additional manual data cleaning steps to parse certain markdown tables using the read_table function. The most cumbersome was not having an option to strip trailing spaces between values and separators. See mock code example at the end of this description.

Describe the solution you'd like

I would like the read_table function to have a skiptrailingspace boolean parameter that would automatically strip trailing whitespace in the same way the skipinitialspace parameter enables stripping leading whitespace.
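For reference, a minimal sketch of the asymmetry behind this request: skipinitialspace removes the spaces that follow a delimiter, while spaces between a value and the next delimiter (or the end of the line) are kept. The sample data here is made up purely for illustration.

```python
import io

import pandas as pd

# Made-up sample: the second field has spaces on both sides of the value.
raw = "a,b\n1,  y  \n"

df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# Leading spaces are dropped by skipinitialspace; trailing spaces remain.
value = df.loc[0, "b"]
print(repr(value))
```

A skiptrailingspace option, as proposed, would be expected to return the value with both sides stripped.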

API breaking implications

It would involve adding a new named parameter to read_table. For users calling the function with positional arguments, this would be a breaking change unless the new parameter were added as the last argument.
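One way to sidestep the positional-argument concern, sketched here with a hypothetical stand-in function rather than the real pandas signature, is to make the new option keyword-only. Existing positional calls are then unaffected wherever the parameter is declared. The function name and its reduced parameter list are assumptions made for illustration.

```python
def read_table_sketch(filepath_or_buffer, sep="\t", *,
                      skipinitialspace=False, skiptrailingspace=False):
    """Hypothetical stand-in for a reader signature; not the pandas API."""
    # Return the parsed options only, so the call semantics can be inspected.
    return {
        "sep": sep,
        "skipinitialspace": skipinitialspace,
        "skiptrailingspace": skiptrailingspace,
    }


# Positional use of the existing parameters still works, and the new
# option is accepted by keyword.
options = read_table_sketch("data.txt", "|", skiptrailingspace=True)

# Passing the options positionally is rejected, so their order in the
# signature cannot break callers.
try:
    read_table_sketch("data.txt", "|", True, True)
    positional_error = None
except TypeError as exc:
    positional_error = exc
```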

Describe alternatives you've considered

Manual data cleaning is tolerable but more onerous. Another alternative, which would be ideal from my perspective, would be a read_markdown function that could extract indexed tables from Markdown text in the same way the read_html function does for HTML. A related alternative is to render the Markdown with another Python package and then use read_html, but that is also onerous.
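To make the read_markdown idea concrete, here is a rough sketch of what such a helper might do for a single pipe table. The name read_markdown_table is hypothetical (pandas has no such function), and the sketch assumes a simple table with outer pipes, non-empty edge cells, and no escaped pipes inside cells.

```python
import csv
import io
import re

import pandas as pd

# Matches delimiter-row cells such as "---", ":---" or "---:".
_DELIMITER_CELL = re.compile(r":?-+:?")


def read_markdown_table(text):
    """Hypothetical helper (not a pandas function): parse one pipe table."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the outer pipes, then strip whitespace around every cell.
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if all(_DELIMITER_CELL.fullmatch(cell) for cell in cells):
            continue  # skip the delimiter row between header and body
        rows.append(cells)

    # Round-trip through CSV so pandas performs its usual dtype inference.
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    buffer.seek(0)
    return pd.read_csv(buffer)


table = """
| Heading 1 | Heading 2 | Heading 3 |
| --------- | --------- | --------- |
|      3.14 | cat       | 72        |
|        42 | mat       | 87.3      |
|   1234.56 | rat       | 128       |"""

df = read_markdown_table(table)
print(df)
```

Stripping each cell before handing the data to pandas avoids both the leading and the trailing whitespace problem, at the cost of a pre-processing pass over the text.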

Additional context

See code example below for my current manual work-around.

import pandas as pd
import io

annoying_table = """
| Heading 1 | Heading 2 | Heading 3 |
| --------- | --------- | --------- |
|      3.14 | cat       | 72        |
|        42 | mat       | 87.3      |
|   1234.56 | rat       | 128       |"""

df = (
    pd.read_table(
        filepath_or_buffer=io.StringIO(annoying_table),
        sep="|",
        skipinitialspace=True,
        # Would like a skiptrailingspace=True too
        skiprows=[2],
    )
    .convert_dtypes()
    .iloc[:, 1:-1]
)

df.columns = df.columns.str.strip()
print(df)

print(df.loc[0, "Heading 2"])  # output "cat       "

# Hacky work-around to strip trailing spaces
for column in df.select_dtypes("string").columns:
    df[column] = df[column].str.strip()

print(df.loc[0, "Heading 2"])  # output "cat"
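Another possible work-around, sketched below, is to pass a regular expression as the separator so that whitespace on either side of each pipe is consumed as part of the delimiter. This relies on the Python parsing engine, which is generally slower than the default C engine. The table is written without a leading blank line here, so the delimiter row is at index 1.

```python
import io

import pandas as pd

table = (
    "| Heading 1 | Heading 2 | Heading 3 |\n"
    "| --------- | --------- | --------- |\n"
    "|      3.14 | cat       | 72        |\n"
    "|        42 | mat       | 87.3      |\n"
    "|   1234.56 | rat       | 128       |\n"
)

df = pd.read_table(
    io.StringIO(table),
    sep=r"\s*\|\s*",  # the pipe plus any whitespace on either side
    engine="python",  # regex separators require the Python engine
    skiprows=[1],     # drop the delimiter row under the header
).iloc[:, 1:-1]       # drop the empty columns created by the outer pipes

print(df)
```

With this approach neither the column names nor the string values need a separate strip step.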

Metadata

Labels: Enhancement; IO CSV (read_csv, to_csv); Needs Discussion (requires discussion from core team before further action)