Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
I wish I could use pandas to easily convert numbers formatted in the Brazilian style (1.234,56) into numeric types.
Currently, pd.to_numeric() does not support this format, so users have to chain string replacements by hand, e.g. .str.replace(".", "", regex=False).str.replace(",", ".", regex=False), which is not intuitive.
This feature would simplify data handling for users in Brazil and other countries that use the same convention (period as thousands separator, comma as decimal mark).
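To make the limitation concrete, here is a small sketch of what happens today when such strings are passed straight to pd.to_numeric(). The sample values are made up for illustration.

```python
import pandas as pd

# Made-up sample values: "." groups thousands, "," marks the decimals.
raw = pd.Series(["1.234,56", "5.600,75", "100,50"])

# With the default errors="raise", parsing fails with a ValueError.
try:
    pd.to_numeric(raw)
    parsed_ok = True
except ValueError:
    parsed_ok = False

# With errors="coerce", the unparseable values become NaN instead.
coerced = pd.to_numeric(raw, errors="coerce")

print(parsed_ok)
print(coerced.isna().all())
```

Neither mode yields the intended numbers, which is why the string replacement step is needed first.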
Feature Description
Add a new function to_numeric_br() to automatically convert strings with the Brazilian numeric format into floats.
Proposed Implementation (Pseudocode)
    def to_numeric_br(series, errors="raise"):
        """
        Convert Brazilian-style numeric strings (e.g. 1.234,56) to floats.

        Parameters
        ----------
        series : pandas.Series
            Data to be converted.
        errors : {'raise', 'coerce', 'ignore'}, default 'raise'
            - 'raise' : raise an error for invalid values.
            - 'coerce' : convert invalid values to NaN.
            - 'ignore' : return the original data if an error occurs.

        Returns
        -------
        pandas.Series
            Series with numeric values.
        """

Expected Behavior
    import pandas as pd

    df = pd.DataFrame({"values": ["1.234,56", "5.600,75", "100,50"]})
    df["converted_values"] = to_numeric_br(df["values"], errors="coerce")
    print(df)

Expected Output:
         values  converted_values
    0  1.234,56           1234.56
    1  5.600,75           5600.75
    2    100,50            100.50

Alternatively, instead of a standalone function, this could be implemented as an enhancement to pd.to_numeric(), adding a locale="br" parameter.
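As a rough sketch of how the proposed function could work (not a final implementation), the body below simply wraps the replacement steps around pd.to_numeric(). It assumes the input is a Series of strings in which "." is only ever a thousands separator and "," is the decimal mark. The 'ignore' mode is left out of this sketch because, as far as I know, recent pandas versions deprecate errors="ignore" in pd.to_numeric().

```python
import pandas as pd


def to_numeric_br(series, errors="raise"):
    """Hypothetical helper: parse strings such as '1.234,56' into floats.

    Assumes '.' is only a thousands separator and ',' is the decimal mark.
    """
    if errors not in ("raise", "coerce"):
        raise ValueError("this sketch supports only errors='raise' or 'coerce'")
    cleaned = (
        series.str.strip()
        .str.replace(".", "", regex=False)   # drop thousands separators
        .str.replace(",", ".", regex=False)  # switch to a decimal point
    )
    # Cast so that inputs without decimals (e.g. "1.234") also come back as floats.
    return pd.to_numeric(cleaned, errors=errors).astype("float64")


df = pd.DataFrame({"values": ["1.234,56", "5.600,75", "100,50"]})
df["converted_values"] = to_numeric_br(df["values"], errors="coerce")
print(df)
```

A more general design might take explicit decimal and thousands arguments, mirroring the parameters of the same names in pd.read_csv(), rather than hard-coding one country's convention.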
Alternative Solutions
Currently, users must manually apply string replacements before calling pd.to_numeric(), like this:

    # regex=False treats "." as a literal period; as a regex it would match every character.
    df["values"] = df["values"].str.replace(".", "", regex=False).str.replace(",", ".", regex=False)
    df["values"] = pd.to_numeric(df["values"], errors="coerce")

While this works, it is not user-friendly, especially for beginners.
Another alternative is a third-party package such as Babel, but that adds a dependency and is not built into pandas.
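For data that is read from a delimited text file, a related workaround already exists at load time: pd.read_csv() accepts decimal and thousands parameters. The sketch below uses made-up data with ";" as the field separator so that "," is free to act as the decimal mark. It does not help when the strings are already in a DataFrame, which is the case this request is about.

```python
import io

import pandas as pd

# Made-up semicolon-separated data with Brazilian-style numbers.
csv_text = "item;price\nA;1.234,56\nB;5.600,75\nC;100,50\n"

# decimal="," and thousands="." tell the parser how to read the numbers.
prices = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",", thousands=".")

print(prices.dtypes)
print(prices["price"].tolist())
```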
Additional Context
- Similar requests have been made by users handling locale-specific number formats.
- Would the maintainers prefer a standalone function (to_numeric_br()) or a locale parameter in pd.to_numeric()?
- Happy to implement this if maintainers approve!