Description
EDIT:
This is due to a breaking change between datafusion 48 -> 49 affecting the FFI boundary between datafusion-python and pyiceberg_core.
Describe the bug
I found an issue with datafusion 49 when it is used with pyiceberg and pyiceberg_core (which uses the datafusion Rust crate). The test test_datafusion_register_pyiceberg_table fails with datafusion==49 but succeeds with datafusion 48, 47, and 46.
I'll find a better way to reproduce this, but for now here's one way using pyiceberg's test suite:
make install
poetry run pip install datafusion==49
poetry run pytest tests/table/test_datafusion.py::test_datafusion_register_pyiceberg_table
For context, there are three libraries involved:
- datafusion
- pyiceberg, which currently uses datafusion==47
- pyiceberg_core==0.5.1, which pyiceberg pulls in as a dependency for the datafusion TableProvider. pyiceberg_core==0.5.1 uses datafusion 47
I wonder if it's due to the difference in versions of the datafusion Rust crate.
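To confirm which versions are actually installed in the environment before running the test, a small check along these lines may help. This is only a sketch: the helper name `installed_versions` is made up for illustration, and the distribution names are assumed to match the pip names used in this report. It reports the Python package versions only; it cannot see which version of the datafusion Rust crate a wheel was compiled against.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the current environment.
            found[name] = None
    return found


# Distribution names assumed from the pip commands and pins in this report.
packages = ["datafusion", "pyiceberg", "pyiceberg-core"]
for name, ver in installed_versions(packages).items():
    print(f"{name}: {ver or 'not installed'}")
```

Running it with `poetry run python` inside the iceberg-python checkout should show whether datafusion 49 is installed alongside pyiceberg_core 0.5.1.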
Additional context
Here's the stack trace:
➜ iceberg-python git:(main) ✗ poetry run pytest tests/table/test_datafusion.py::test_datafusion_register_pyiceberg_table
========================================================= test session starts ==========================================================
platform darwin -- Python 3.12.11, pytest-7.4.4, pluggy-1.6.0
rootdir: /Users/kevinliu/repos/iceberg-python
configfile: pyproject.toml
plugins: checkdocs-2.13.0, anyio-4.10.0, mock-3.14.1, lazy-fixture-0.6.3, requests-mock-1.12.1
collected 1 item
tests/table/test_datafusion.py Fatal Python error: Bus error
Thread 0x000000016cf3b000 (most recent call first):
File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/concurrent/futures/thread.py", line 90 in _worker
File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/threading.py", line 1012 in run
File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/threading.py", line 1032 in _bootstrap
Thread 0x000000016bf2f000 (most recent call first):
File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/concurrent/futures/thread.py", line 90 in _worker
File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/threading.py", line 1012 in run
File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
File "/Users/kevinliu/.pyenv/versions/3.12.11/lib/python3.12/threading.py", line 1032 in _bootstrap
Current thread 0x00000001ef2620c0 (most recent call first):
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/datafusion/dataframe.py", line 1019 in to_arrow_table
File "/Users/kevinliu/repos/iceberg-python/tests/table/test_datafusion.py", line 57 in test_datafusion_register_pyiceberg_table
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/python.py", line 1792 in runtest
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/runner.py", line 262 in <lambda>
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/runner.py", line 222 in call_and_report
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/runner.py", line 133 in runtestprotocol
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/main.py", line 350 in pytest_runtestloop
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/main.py", line 325 in _main
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/main.py", line 271 in wrap_session
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/main.py", line 318 in pytest_cmdline_main
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/config/__init__.py", line 169 in main
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/lib/python3.12/site-packages/_pytest/config/__init__.py", line 192 in console_main
File "/Users/kevinliu/Library/Caches/pypoetry/virtualenvs/pyiceberg-Is5Rt7Ah-py3.12/bin/pytest", line 8 in <module>
Extension modules: zstandard.backend_c, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, markupsafe._speedups, mmh3, numpy._core._multiarray_umath, numpy.linalg._umath_linalg, pyarrow.lib, sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, pyarrow._acero, pyarrow._fs, pyarrow._csv, pyarrow._json, pyarrow._substrait, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet, pyarrow._parquet_encryption, pyarrow._dataset_parquet_encryption, pyarrow._dataset_parquet, pyarrow._azurefs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, pyroaring (total: 81)
[1] 95409 bus error poetry run pytest