Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
I would like to use pandas to create an IntervalIndex like [0.0, 0.1), [0.1, 0.2), ..., [0.9, 1.0].
Currently, I can successfully cut my DataFrame using the following code:
bin_edges = [i / 10.0 for i in range(0, 11)] # Create bins from 0.0 to 1.0 with a size of 0.1
df['bin'] = pd.cut(df['ratio'], bins=bin_edges, include_lowest=True)
print(df['bin'].unique())
# Output:
# [(-0.001, 0.1], (0.9, 1.0], (0.4, 0.5], (0.8, 0.9], (0.3, 0.4], ..., (0.5, 0.6], (0.2, 0.3], (0.1, 0.2], (0.6, 0.7], NaN]
# Length: 11
# Categories (10, interval[float64, right]): [(-0.001, 0.1] < (0.1, 0.2] < (0.2, 0.3] < (0.3, 0.4] ... (0.6, 0.7] < (0.7, 0.8] < (0.8, 0.9] < (0.9, 1.0]]
Then I want to reindex the counts to add back the missing bins and fill them with zero counts using fillna(0),
with the following code:
bin_counts = df['bin'].value_counts().reindex(pd.IntervalIndex.from_breaks(bin_edges)).fillna(0)
However, I found that the (-0.001, 0.1] bin, which has a non-zero count, is replaced by (0.0, 0.1] with a zero count.
I asked ChatGPT, and it recommended using bin_counts = df['bin'].value_counts().reindex(pd.IntervalIndex.from_edges(bin_edges)).fillna(0),
which leads to AttributeError: type object 'IntervalIndex' has no attribute 'from_edges'
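For reference, the mismatch can be reproduced with a small sketch. The sample `ratio` values below are made up for illustration. It compares the categories that `pd.cut` creates with the intervals from `IntervalIndex.from_breaks`, and shows that, as far as I can tell, `value_counts` on a categorical column already reports unobserved bins with a count of zero, so the reindex step may not be needed at all.

```python
import pandas as pd

bin_edges = [i / 10.0 for i in range(0, 11)]

# Illustrative data (assumed): only the first and last bins are populated.
df = pd.DataFrame({"ratio": [0.0, 0.05, 0.95, 1.0]})
df["bin"] = pd.cut(df["ratio"], bins=bin_edges, include_lowest=True)

# Categories produced by pd.cut; with include_lowest=True the first
# left edge is nudged slightly below 0.0.
cut_index = df["bin"].cat.categories

# Intervals built directly from the raw edges; the first left edge is exactly 0.0.
breaks_index = pd.IntervalIndex.from_breaks(bin_edges)

print(cut_index[0])
print(breaks_index[0])

# value_counts on a categorical column keeps unobserved categories as zeros;
# sort=False keeps the bins in category order instead of sorting by count.
bin_counts = df["bin"].value_counts(sort=False)
print(bin_counts)
```

Because the first interval differs between the two indexes, reindexing against `breaks_index` cannot find the first bin, which would explain the zero count seen above.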
Feature Description
Add a new class method to IntervalIndex, from_edges, that returns an IntervalIndex based on bin edges.
def from_edges(cls, bins) -> IntervalIndex
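A rough sketch of what such a method might do, written as a standalone helper rather than a real pandas API. The name `interval_index_from_edges` and its parameters are hypothetical. It assumes the goal is to get exactly the bins that `pd.cut` would create, so it simply cuts a single edge value and reads back the resulting categories.

```python
import pandas as pd


def interval_index_from_edges(bin_edges, include_lowest=True, right=True):
    """Hypothetical helper: return the IntervalIndex that pd.cut would
    use as categories for the given bin edges."""
    # Cutting a list returns a Categorical; its categories depend only on
    # the bins and options, not on the data being cut.
    cat = pd.cut(
        [bin_edges[0]],
        bins=bin_edges,
        include_lowest=include_lowest,
        right=right,
    )
    return cat.categories


bin_edges = [i / 10.0 for i in range(0, 11)]

# Right-closed bins, matching pd.cut(..., include_lowest=True).
idx = interval_index_from_edges(bin_edges)
print(idx)

# Left-closed bins such as [0.0, 0.1), via right=False.
left_closed = interval_index_from_edges(bin_edges, right=False)
print(left_closed)
```

With a helper like this, `df['bin'].value_counts().reindex(interval_index_from_edges(bin_edges)).fillna(0)` would reindex against the same intervals that `pd.cut` produced. Note that with `right=False` the last bin is [0.9, 1.0), not [0.9, 1.0].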
Alternative Solutions
I haven't found one yet
Additional Context
No response