"Fatal Python error: Bus error" when using datafusion 49 #1217

@kevinjqliu

EDIT:
This is due to a breaking change between datafusion 48 and 49 that affects the FFI boundary between datafusion-python and pyiceberg_core.

Describe the bug

Found an issue with datafusion 49 when used with pyiceberg and pyiceberg_core (which uses the datafusion Rust crate). The test test_datafusion_register_pyiceberg_table fails with datafusion==49 but succeeds with datafusion 48, 47, and 46.

I'll find a better way to reproduce, but for now here's one way using pyiceberg's test suite:

make install 
poetry run pip install datafusion==49
poetry run pytest tests/table/test_datafusion.py::test_datafusion_register_pyiceberg_table

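For context, there are 3 libraries involved: pyiceberg, pyiceberg_core (which uses the datafusion Rust crate), and datafusion (datafusion-python).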

I wonder if it's due to a difference in the version of the datafusion Rust crate that each library is built against.
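
As a first check on that hypothesis, a minimal sketch like the one below prints which Python distributions are installed in the environment. The distribution names (`datafusion`, `pyiceberg`, `pyiceberg-core`) are assumptions and may need adjusting; the helper `installed_versions` is hypothetical and not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names are assumptions; adjust them to match your environment.
PACKAGES = ("datafusion", "pyiceberg", "pyiceberg-core")


def installed_versions(names=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not present in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Note that this only reports the versions of the installed Python packages. It does not reveal which version of the datafusion Rust crate each wheel was compiled against, so that still has to be checked in each project's build configuration.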

Additional context

Here's the stack trace:

➜  iceberg-python git:(main) ✗ poetry run pytest tests/table/test_datafusion.py::test_datafusion_register_pyiceberg_table
========================================================= test session starts ==========================================================
platform darwin -- Python 3.12.11, pytest-7.4.4, pluggy-1.6.0
rootdir: /Users/kevinliu/repos/iceberg-python
configfile: pyproject.toml
plugins: checkdocs-2.13.0, anyio-4.10.0, mock-3.14.1, lazy-fixture-0.6.3, requests-mock-1.12.1
collected 1 item                                                                                                                       

tests/table/test_datafusion.py Fatal Python error: Bus error

Thread 0x000000016cf3b000 (most recent call first):
  File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/concurrent/futures/thread.py", line 90 in _worker
  File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/threading.py", line 1012 in run
  File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
  File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/threading.py", line 1032 in _bootstrap

Thread 0x000000016bf2f000 (most recent call first):
  File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/concurrent/futures/thread.py", line 90 in _worker
  File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/threading.py", line 1012 in run
  File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
  File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/threading.py", line 1032 in _bootstrap

Current thread 0x00000001ef2620c0 (most recent call first):
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/datafusion/dataframe.py", line 1019 in to_arrow_table
  File "/Users/kevinliu/repos/iceberg-python/tests/table/test_datafusion.py", line 57 in test_datafusion_register_pyiceberg_table
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/python.py", line 1792 in runtest
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/runner.py", line 262 in <lambda>
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/runner.py", line 222 in call_and_report
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/runner.py", line 133 in runtestprotocol
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/main.py", line 350 in pytest_runtestloop
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/main.py", line 325 in _main
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/main.py", line 271 in wrap_session
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/main.py", line 318 in pytest_cmdline_main
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/config/__init__.py", line 169 in main
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/config/__init__.py", line 192 in console_main
  File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/bin/pytest", line 8 in <module>

Extension modules: zstandard.backend_c, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, markupsafe._speedups, mmh3, numpy._core._multiarray_umath, numpy.linalg._umath_linalg, pyarrow.lib, sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, pyarrow._acero, pyarrow._fs, pyarrow._csv, pyarrow._json, pyarrow._substrait, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet, pyarrow._parquet_encryption, pyarrow._dataset_parquet_encryption, pyarrow._dataset_parquet, pyarrow._azurefs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, pyroaring (total: 81)
[1]    95409 bus error  poetry run pytest 
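
The last line of the trace shows the shell reporting that the process died from a bus error instead of exiting normally. When scripting the reproduction (for example, to compare several datafusion versions), a rough sketch like the following can tell a signal-induced crash apart from an ordinary test failure. The helpers `describe_exit` and `run_and_describe` are hypothetical names, and the sketch assumes a POSIX system, where `subprocess` reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed
        # by the signal with that number.
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


def run_and_describe(cmd):
    """Run cmd, capture its output, and describe how it terminated."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # Pass the command to run as arguments, e.g. the pytest invocation above.
    print(run_and_describe(sys.argv[1:] or [sys.executable, "-c", "pass"]))
```

Whether the signal is visible this way depends on how any wrapper command (such as `poetry run`) propagates the child's exit status, so treat the result as a hint, not as proof.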
