Error calling count() on a null dtype struct field #2689

@synchromic

Description

Example:

import daft
from daft import col
df = daft.from_pydict({"a": [
    {"c": None},
    None,
]})
df.show()  # works
df.count(col("a").struct.get("c")).show()  # breaks: panics with "cannot set validity of a null array"

Expected: a one-row DataFrame holding the number of non-null values of "c", i.e. {"c": [0]}
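To spell out why 0 is the expected count, here is a minimal plain-Python model of the semantics. It is only a sketch, not Daft's implementation: `struct_get` and `count_non_null` are hypothetical helpers, and it assumes that count() counts non-null values and that a null struct row yields a null field.

```python
from typing import Any, Dict, List, Optional


def struct_get(rows: List[Optional[Dict[str, Any]]], field: str) -> List[Any]:
    """Hypothetical helper: extract `field` from each struct row.

    A null struct row propagates to a null field value.
    """
    return [None if row is None else row.get(field) for row in rows]


def count_non_null(values: List[Any]) -> int:
    """Hypothetical helper: count the entries that are not null."""
    return sum(1 for value in values if value is not None)


# Same data as the example above: one struct with a null field, one null struct.
rows = [{"c": None}, None]

field_values = struct_get(rows, "c")  # both entries are null
result = {"c": [count_non_null(field_values)]}
print(result)
```

Every value of "c" is null here, so the column ends up with a null dtype and the count of non-null entries is 0 rather than an error.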

Actual:

thread '<unnamed>' panicked at src/arrow2/src/array/null.rs:91:9:
cannot set validity of a null array
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/conor/Documents/Programming/daft/Daft/daft/api_annotations.py", line 26, in _wrap
    return timed_method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/conor/Documents/Programming/daft/Daft/daft/analytics.py", line 198, in tracked_method
    return method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/conor/Documents/Programming/daft/Daft/daft/dataframe/dataframe.py", line 2382, in show
    dataframe_display = self._construct_show_display(n)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/conor/Documents/Programming/daft/Daft/daft/dataframe/dataframe.py", line 2339, in _construct_show_display
    for table in get_context().runner().run_iter_tables(builder, results_buffer_size=1):
  File "/Users/conor/Documents/Programming/daft/Daft/daft/runners/pyrunner.py", line 216, in run_iter_tables
    for result in self.run_iter(builder, results_buffer_size=results_buffer_size):
  File "/Users/conor/Documents/Programming/daft/Daft/daft/runners/pyrunner.py", line 211, in run_iter
    yield from results_gen
  File "/Users/conor/Documents/Programming/daft/Daft/daft/runners/pyrunner.py", line 318, in _physical_plan_to_partitions
    materialized_results = done_future.result()
                           ^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/conor/Documents/Programming/daft/Daft/daft/runners/pyrunner.py", line 379, in build_partitions
    partitions = instruction.run(partitions)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/conor/Documents/Programming/daft/Daft/daft/execution/execution_step.py", line 683, in run
    return self._aggregate(inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/conor/Documents/Programming/daft/Daft/daft/execution/execution_step.py", line 687, in _aggregate
    return [input.agg(self.to_agg, self.group_by)]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/conor/Documents/Programming/daft/Daft/daft/table/micropartition.py", line 230, in agg
    return MicroPartition._from_pymicropartition(self._micropartition.agg(to_agg_pyexprs, group_by_pyexprs))
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: cannot set validity of a null array

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), p2 (backlog) (Nice to have features)
