
Incorrect frequencies when nil in first column #1135

@jrrrp


When computing the frequencies of a dataframe, I get incorrect counts when the first column in the grouping contains a nil. To demonstrate:

  1. Working example with no nils:
df = Explorer.DataFrame.new(a: ["a", "a", "b"], b: [1, 1, 2])
Explorer.DataFrame.frequencies(df, [:a, :b], stable: true)

As expected:

#Explorer.DataFrame<
  Polars[2 x 3]
  a string ["a", "b"]
  b s64 [1, 2]
  counts u32 [2, 1]
>
  2. Still working: switching the 2 to a nil but keeping the order the same:
df = Explorer.DataFrame.new(a: ["a", "a", "b"], b: [1, 1, nil])
Explorer.DataFrame.frequencies(df, [:a, :b], stable: true)

As expected:

#Explorer.DataFrame<
  Polars[2 x 3]
  a string ["a", "b"]
  b s64 [1, nil]
  counts u32 [2, 1]
>
  3. Different from expected: switching the order of the columns so that the column with nil is first:
df = Explorer.DataFrame.new(a: ["a", "a", "b"], b: [1, 1, nil])
Explorer.DataFrame.frequencies(df, [:b, :a], stable: true)

Not as expected:

#Explorer.DataFrame<
  Polars[2 x 3]
  b s64 [1, nil]
  a string ["a", "b"]
  counts u32 [2, 0]
>

The first combination still has a count of 2, as before, but the second (b: nil, a: "b") now has a count of 0 instead of 1, even though only the order of the grouping columns changed.
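
For comparison, here is a minimal sketch of the counts I would expect, computed in plain Elixir with `Enum.frequencies/1` instead of Explorer. The `rows` list is only an illustration: it restates the same three rows as `{b, a}` tuples and is not how Explorer computes frequencies internally.

```elixir
# The three rows of the example dataframe as {b, a} tuples,
# i.e. in the same [:b, :a] order used in the failing call.
# (`rows` is a hypothetical stand-in for the dataframe.)
rows = [{1, "a"}, {1, "a"}, {nil, "b"}]

# Enum.frequencies/1 returns a map from each distinct element to its count.
expected = Enum.frequencies(rows)

# The nil row is counted like any other value.
IO.inspect(Map.fetch!(expected, {1, "a"}), label: "count for {1, \"a\"}")
IO.inspect(Map.fetch!(expected, {nil, "b"}), label: "count for {nil, \"b\"}")
```

This gives a count of 2 for `{1, "a"}` and 1 for `{nil, "b"}`, which is what I expected `frequencies/3` to report regardless of column order.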

First time reporting a bug - extra info if needed:

Explorer 0.11.1
Elixir 1.19.3
OTP release 28
