Description
Is your feature request related to a problem?
I had to do several additional manual data cleaning steps to parse certain markdown tables using the read_table function. The most cumbersome was not having an option to strip trailing spaces between values and separators. See mock code example at the end of this description.
Describe the solution you'd like
I would like the read_table function to have a skiptrailingspace boolean parameter that would automatically strip trailing whitespace in the same way the skipinitialspace parameter enables stripping leading whitespace.
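To illustrate the asymmetry, here is a minimal sketch of the current behaviour. The two-column input is made up for this example; only read_table, sep and skipinitialspace are existing pandas names.

```python
import io

import pandas as pd

# Made-up input: the value in column "b" has spaces on both sides.
raw = "a|b\nx|  y  \n"

df = pd.read_table(io.StringIO(raw), sep="|", skipinitialspace=True)

# skipinitialspace removes the spaces after the delimiter,
# but the spaces before the end of the field are kept.
value = df.loc[0, "b"]
print(repr(value))
```

A skiptrailingspace option, as proposed here, would be expected to remove the remaining spaces as well.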
API breaking implications
It would involve adding a new named parameter to read_table. For users who call read_table with positional arguments, this would be a breaking change unless the new parameter were added as the last argument to the function.
Describe alternatives you've considered
Manual data cleaning is tolerable but more onerous. Another alternative, which would be ideal from my perspective, would be a read_markdown function that could extract indexed tables from markdown text in the same way the read_html function does for HTML. A related alternative is to use another Python package to render the markdown and then use read_html, but that is also onerous.
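As a rough idea of what the read_markdown alternative could do for a single table, here is a sketch. The helper read_markdown_table is hypothetical and not part of pandas; it assumes one pipe-delimited table whose second row is the separator row, and it does not handle escaped pipes or alignment markers.

```python
import pandas as pd


def read_markdown_table(text: str) -> pd.DataFrame:
    """Hypothetical helper (not a pandas API): parse one simple markdown table."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a table row
        # Drop the outer pipes, split on the inner ones, strip every cell.
        rows.append([cell.strip() for cell in line.strip("|").split("|")])

    header, body = rows[0], rows[2:]  # rows[1] is the | --- | separator row
    df = pd.DataFrame(body, columns=header)

    # Convert columns that are fully numeric; leave the others as text.
    for column in df.columns:
        try:
            df[column] = pd.to_numeric(df[column])
        except (ValueError, TypeError):
            pass
    return df


table = """
| Heading 1 | Heading 2 | Heading 3 |
| --------- | --------- | --------- |
| 3.14 | cat | 72 |
| 42 | mat | 87.3 |
| 1234.56 | rat | 128 |"""

df = read_markdown_table(table)
print(df)
```

Because every cell is stripped before the frame is built, no separate pass over the string columns is needed afterwards.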
Additional context
See code example below for my current manual work-around.
import io

import pandas as pd

annoying_table = """
| Heading 1 | Heading 2 | Heading 3 |
| --------- | --------- | --------- |
| 3.14 | cat | 72 |
| 42 | mat | 87.3 |
| 1234.56 | rat | 128 |"""

df = (
    pd.read_table(
        filepath_or_buffer=io.StringIO(annoying_table),
        sep="|",
        skipinitialspace=True,
        # Would like a skiptrailingspace=True too
        skiprows=[2],
    )
    .convert_dtypes()
    .iloc[:, 1:-1]
)
df.columns = df.columns.str.strip()
print(df)
print(df.loc[0, "Heading 2"])  # output "cat "

# Hacky work-around to strip trailing spaces
for column in df.select_dtypes("string").columns:
    df[column] = df[column].str.strip()
print(df.loc[0, "Heading 2"])  # output "cat"