Open
Description
When finding the frequencies of a dataframe I seem to get incorrect counts when the first column in the grouping has a nil. To demonstrate:
- Working example and no nils
df = Explorer.DataFrame.new(a: ["a", "a", "b"], b: [1, 1, 2])
Explorer.DataFrame.frequencies(df, [:a, :b], stable: true)
As expected:
#Explorer.DataFrame<
Polars[2 x 3]
a string ["a", "b"]
b s64 [1, 2]
counts u32 [2, 1]
>
- Still working: switching the 2 to a nil but keeping the order the same:
df = Explorer.DataFrame.new(a: ["a", "a", "b"], b: [1, 1, nil])
Explorer.DataFrame.frequencies(df, [:a, :b], stable: true)
As expected:
#Explorer.DataFrame<
Polars[2 x 3]
a string ["a", "b"]
b s64 [1, nil]
counts u32 [2, 1]
>
- Different than expected: switching the order of the columns so that the column with nil is first:
df = Explorer.DataFrame.new(a: ["a", "a", "b"], b: [1, 1, nil])
Explorer.DataFrame.frequencies(df, [:b, :a], stable: true)
#Explorer.DataFrame<
Polars[2 x 3]
b s64 [1, nil]
a string ["a", "b"]
counts u32 [2, 0]
>
There are 2 counts of the first combination as before, but now there are none for the second (a: "b", b: nil), even though only the ordering of the columns changed.
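For reference, the expected counts can be cross-checked without Explorer. This is only a minimal sketch in plain Elixir that zips the same two lists by hand and counts each {b, a} pair; it does not go through Explorer or Polars at all, and the variable names are just for illustration:

```elixir
# Plain-Elixir cross-check (no Explorer involved): rebuild the rows by hand
# from the same data and count each {b, a} combination.
a = ["a", "a", "b"]
b = [1, 1, nil]

counts =
  b
  |> Enum.zip(a)
  |> Enum.frequencies()

# Each combination should be counted once per row it appears in,
# including the row where b is nil.
IO.inspect(counts == %{{1, "a"} => 2, {nil, "b"} => 1})
# prints: true
```

Counted this way, the {nil, "b"} row has a count of 1 regardless of which column comes first, which is what I expected frequencies to return.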
First time reporting a bug - extra info if needed:
Explorer 0.11.1
Elixir 1.19.3
OTP release 28