@raulcd raulcd commented Jan 15, 2026

@WillAyd this is a PR against your branch. I thought it would be better to show you first. 👍 first! 😄

I have tested this locally with:

ARCHERY_DEBUG=1 PYARROW_VERSION=23.0.0.dev362 archery docker run python-sdist

The meson size seems 15% bigger than the current one, we might want to investigate if we are including something unnecessary afterwards. I've done a diffoscope between a sdist from main and the sdist from meson and the differences seem fine.
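As a rough complement to diffoscope, the size and file-list comparison described above could be scripted. This is a minimal sketch using only the standard library. The helper names `sdist_members` and `compare_sdists` are hypothetical (they are not part of archery or pyarrow), and the sketch assumes each sdist unpacks into a single top-level directory.

```python
import tarfile
from pathlib import PurePosixPath


def sdist_members(path):
    """Map each regular file in an sdist tarball to its size in bytes.

    The leading "<name>-<version>/" directory is dropped so that sdists
    built with different version strings can still be compared.
    """
    sizes = {}
    with tarfile.open(path, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            parts = PurePosixPath(member.name).parts
            relative = "/".join(parts[1:]) or member.name
            sizes[relative] = member.size
    return sizes


def compare_sdists(old_path, new_path):
    """Summarise added/removed files and the total uncompressed size."""
    old, new = sdist_members(old_path), sdist_members(new_path)
    return {
        "only_in_old": sorted(old.keys() - new.keys()),
        "only_in_new": sorted(new.keys() - old.keys()),
        "old_bytes": sum(old.values()),
        "new_bytes": sum(new.values()),
    }
```

Calling `compare_sdists("old.tar.gz", "new.tar.gz")` (placeholder file names) returns the files unique to each archive, which is usually the quickest way to see where a size increase comes from.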

@github-actions

❌ GitHub issue apache#36411 could not be retrieved.

Comment on lines +218 to +219
# Avoid building Pyarrow if it is a source distribution.
if not get_option('sdist')
Author

The only change in this file is this if and its matching endif. The rest is just formatting.

Owner

I wonder if we can change arrow_dep = [] in the top level configuration to arrow_dep = disabler() when building an sdist; that should circumvent the need for having to constantly branch like this I think?

Author

I tried, but it fails with the same issue. I could move the changes in this specific file into the existing sdist option check (instead of having two separate checks):

diff --git a/python/meson.build b/python/meson.build
index c65a837868..f4622e09af 100644
--- a/python/meson.build
+++ b/python/meson.build
@@ -18,7 +18,6 @@
 project(
     'pyarrow',
     'cython',
-    'cpp',
     version: run_command(
         'python',
         '-m',
@@ -51,8 +50,10 @@ endif
 
 # https://github.com/mesonbuild/meson-python/issues/647
 if get_option('sdist')
-    arrow_dep = []
+    arrow_dep = disabler()
 else
+    add_languages('cpp', native: false)
+    cc = meson.get_compiler('cpp')
     arrow_dep = dependency(
         'arrow',
         'Arrow',
@@ -86,6 +87,4 @@ print(incdir)
     numpy_dep = declare_dependency(include_directories: incdir_numpy)
 endif
 
-cc = meson.get_compiler('cpp')
-
 subdir('pyarrow')

but the changes to avoid building Cython (python/pyarrow/meson.build) are still necessary as well.

Owner

Ah OK. No strong preference here - I think it's a pretty minor consideration. Happy to keep as is.

@WillAyd
Owner

WillAyd commented Jan 15, 2026

Nice thanks!

> The meson size seems 15% bigger than the current one, we might want to investigate if we are including something unnecessary afterwards. I've done a diffoscope between a sdist from main and the sdist from meson and the differences seem fine.

You are referring to the sdist right? And you are saying that you've diffed the files before/after and they are the same?

@raulcd
Author

raulcd commented Jan 15, 2026

Yes, I've diffed the sdists. There are some differences: for example, we must remove any Python cache directories, plus some other minor things. I'll validate both the sdist and the wheels exhaustively once I fix the wheels too.
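For the cache directories mentioned here, a check along these lines could flag them in an archive listing. It is only a sketch: `find_cache_entries` is a hypothetical helper, and it assumes that "Python cached directories" means `__pycache__` folders and stray `.pyc`/`.pyo` files.

```python
from pathlib import PurePosixPath


def find_cache_entries(names):
    """Return archive member names that look like Python bytecode caches."""
    hits = []
    for name in names:
        path = PurePosixPath(name)
        # Either the file lives under __pycache__ or it is itself bytecode.
        if "__pycache__" in path.parts or path.suffix in {".pyc", ".pyo"}:
            hits.append(name)
    return hits
```

The input can come from `tarfile.open(path).getnames()` for an sdist or `zipfile.ZipFile(path).namelist()` for a wheel.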

@WillAyd
Owner

WillAyd commented Jan 15, 2026

Ah nice catch. Although I have to say I'm surprised meson-python wouldn't take care of excluding the cache

@raulcd raulcd force-pushed the raulcd-use-meson-python branch from f098f4d to 7a410f7 Compare January 20, 2026 09:02
delocate-listdeps ${source_dir}/python/dist/*.whl

echo "=== (${PYTHON_VERSION}) Bundle shared libraries into wheel ==="
delocate-wheel -w ${source_dir}/python/repaired_wheels -v ${source_dir}/python/dist/*.whl
Owner

This is required for macOS but we don't need repairwheel on Linux, huh? That seems a bit odd.

I ran into this a lot with Windows and it was called out during reviews, so worth double checking the need for this before going back

Author

Indeed, there seems to be something wrong with the manylinux wheels too. We should probably test those wheels in a clean environment where Arrow hasn't been built (and can't be found).
I can't seem to find the libarrow shared libraries in the manylinux wheels, but they are included in the existing wheels (I looked at the latest ones uploaded to PyPI).
I started investigating the following conversations:

And from what I can see there it seems like "this shouldn't be meson's concern" but should be tackled with the auditwheel, delocate, delvewheel tools.
With CMake we had the PYARROW_BUNDLE_ARROW_CPP flag and the bundle_arrow_lib function managing that; maybe we should ask on your PR.
BTW, I am triggering CI jobs on a WIP PR, apache#48882, because I don't want to "pollute" your PR.
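Checking whether a wheel actually ships the Arrow shared libraries can be done by listing its contents, since a wheel is a zip archive. This is a sketch under assumptions: `list_shared_libraries` and `bundles_library` are hypothetical helpers, and the file-name heuristics (`.so`, `.dylib`, `.dll`, or a versioned `.so.N` name) will also match compiled extension modules that end in `.so`.

```python
import zipfile

SHARED_LIB_SUFFIXES = (".so", ".dylib", ".dll")


def list_shared_libraries(wheel_path):
    """List wheel entries whose file name looks like a shared library."""
    found = []
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            filename = name.rsplit("/", 1)[-1]
            # Versioned ELF names such as libarrow.so.2400 contain ".so." mid-name.
            if filename.endswith(SHARED_LIB_SUFFIXES) or ".so." in filename:
                found.append(name)
    return sorted(found)


def bundles_library(wheel_path, stem):
    """True if any shared library file name in the wheel starts with `stem`."""
    return any(
        name.rsplit("/", 1)[-1].startswith(stem)
        for name in list_shared_libraries(wheel_path)
    )
```

For example, `bundles_library(path, "libarrow")` would be expected to be true for a repaired manylinux wheel and false for one that relies on a system-wide Arrow.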

Owner

@WillAyd WillAyd Jan 20, 2026

Yea that's correct - for building and distributing to another system we need to use any of those tools.

However, building and running on the same system, I would think the "system" libraries would still be locatable at runtime.

Owner

I think (as you've already found) Meson really doesn't want to include this in its scope, likely because of all the differences across platforms.

> With CMake we had the PYARROW_BUNDLE_ARROW_CPP flag and the bundle_arrow_lib function managing that, maybe we should ask on your PR.

I admittedly struggle to read CMake so I might be overlooking something, but this function seems to just place the existing Arrow library into the wheel without any name mangling or patching of existing libraries, right? If so, I think the pushback we would get is that it's an incomplete solution that can easily lead to conflicts with the system, particularly depending on how runtime libraries have already been loaded. I'm also not sure how well that works on Windows.

In either case, I'm no expert, so may not hurt to ask again :-)

Author

@raulcd raulcd Jan 20, 2026

I will try to have all wheels green on my PR and validate that their contents are as expected. I'll manually check against the published 23.0.0 wheels. Once this is done I'll share the approach and ask for feedback. There's also something I want to investigate: it feels slightly complex/verbose to have to declare every -Csetup-args="-Dvar=${PYARROW_WITH_VAR}" option, and I'm wondering about the developer experience when working locally. I'll see if I can investigate that too after I fix the Windows wheels.
Thanks for all the work you did here!

Owner

@WillAyd WillAyd Jan 20, 2026

> wondering about the developer experience when working locally.

I usually do -Csetup-args="-Dauto_features=enabled" to enable everything, and then explicitly disable things that aren't available on my system (ex: -Csetup-args="-Dcuda=disabled")

I find the whole -Csetup-args=... pattern to be a poor experience, particularly since you have to repeat it for every setting. I believe that limitation is enforced by pip:

https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-C
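One way to soften the repetition locally is to generate the flags from a mapping. This is only an illustrative sketch: `meson_setup_args` and `pip_install_command` are hypothetical helpers, and the option names are the ones quoted in this thread (`auto_features`, `cuda`).

```python
def meson_setup_args(options):
    """Expand {"cuda": "disabled"} into one -Csetup-args flag per option."""
    return [f"-Csetup-args=-D{key}={value}" for key, value in options.items()]


def pip_install_command(options, source_dir="."):
    """Build an argv list for installing `source_dir` with the given Meson options."""
    return ["python", "-m", "pip", "install", source_dir, *meson_setup_args(options)]


if __name__ == "__main__":
    command = pip_install_command({"auto_features": "enabled", "cuda": "disabled"})
    # Print the command rather than running it, so it can be inspected first.
    print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` as is, which avoids shell-quoting issues with the `=` signs.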

Author

I have done more digging on the manylinux side: we are already calling auditwheel repair, which is why it wasn't failing. This is the output of auditwheel -v show on the original wheel, before auditwheel repair:

2026-01-21T11:34:59.1785416Z The following external shared libraries are required by the wheel:
2026-01-21T11:34:59.1785484Z {
2026-01-21T11:34:59.1785654Z     "libarrow.so.2400": "/tmp/arrow-dist/lib/libarrow.so.2400.0.0",
2026-01-21T11:34:59.1785843Z     "libarrow_acero.so.2400": "/tmp/arrow-dist/lib/libarrow_acero.so.2400.0.0",
2026-01-21T11:34:59.1786049Z     "libarrow_compute.so.2400": "/tmp/arrow-dist/lib/libarrow_compute.so.2400.0.0",
2026-01-21T11:34:59.1786238Z     "libarrow_dataset.so.2400": "/tmp/arrow-dist/lib/libarrow_dataset.so.2400.0.0",
2026-01-21T11:34:59.1786427Z     "libarrow_flight.so.2400": "/tmp/arrow-dist/lib/libarrow_flight.so.2400.0.0",
2026-01-21T11:34:59.1786645Z     "libarrow_substrait.so.2400": "/tmp/arrow-dist/lib/libarrow_substrait.so.2400.0.0",
2026-01-21T11:34:59.1786812Z     "libparquet.so.2400": "/tmp/arrow-dist/lib/libparquet.so.2400.0.0"
2026-01-21T11:34:59.1786880Z }

After auditwheel repair using auditwheel -v show on the repaired wheel:

2026-01-21T11:35:23.1142034Z             "libparquet-84e0c9ca.so.2400.0.0": {
2026-01-21T11:35:23.1142407Z                 "soname": "libparquet-84e0c9ca.so.2400.0.0",
2026-01-21T11:35:23.1142633Z                 "path": "/tmp/tmps3l07x0n/pyarrow.libs/libparquet-84e0c9ca.so.2400.0.0",
2026-01-21T11:35:23.1142853Z                 "realpath": "/tmp/tmps3l07x0n/pyarrow.libs/libparquet-84e0c9ca.so.2400.0.0",
2026-01-21T11:35:23.1142929Z                 "platform": {
2026-01-21T11:35:23.1143027Z                     "_elf_osabi": "ELFOSABI_SYSV",
2026-01-21T11:35:23.1143106Z                     "_elf_class": 64,
2026-01-21T11:35:23.1143194Z                     "_elf_little_endian": true,
2026-01-21T11:35:23.1143283Z                     "_elf_machine": "EM_X86_64",
2026-01-21T11:35:23.1143403Z                     "_base_arch": "<Architecture.x86_64: 'x86_64'>",
2026-01-21T11:35:23.1143485Z                     "_ext_arch": null,
2026-01-21T11:35:23.1143562Z                     "_error_msg": null
2026-01-21T11:35:23.1143624Z                 },
2026-01-21T11:35:23.1143697Z                 "needed": [
2026-01-21T11:35:23.1143798Z                     "libarrow-85ba3dab.so.2400.0.0",
2026-01-21T11:35:23.1143877Z                     "libdl.so.2",
2026-01-21T11:35:23.1143956Z                     "libpthread.so.0",
2026-01-21T11:35:23.1144031Z                     "libstdc++.so.6",
2026-01-21T11:35:23.1144108Z                     "libm.so.6",
2026-01-21T11:35:23.1144181Z                     "libgcc_s.so.1",
2026-01-21T11:35:23.1144250Z                     "libc.so.6",
2026-01-21T11:35:23.1144340Z                     "ld-linux-x86-64.so.2"
2026-01-21T11:35:23.1144400Z                 ]
2026-01-21T11:35:23.1144464Z             },

I initially thought we were missing the libraries because they are in a different location: auditwheel places the required shared libraries in pyarrow.libs, whereas bundling with CMake copied them (installing them under the pyarrow subfolder), and then we linked against those copies and packaged them in the wheel.
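To line up the repaired names with the original ones when comparing the two listings above, the hash tag can be stripped. This sketch assumes the pattern visible in the log, `<stem>-<8 hex characters>.so[.version]`, and the helper name `original_library_name` is hypothetical.

```python
import re

# Matches e.g. "libparquet-84e0c9ca.so.2400.0.0" as seen in the repaired wheel.
_HASHED_NAME = re.compile(
    r"^(?P<stem>.+)-(?P<tag>[0-9a-f]{8})(?P<ext>\.so(?:\..*)?)$"
)


def original_library_name(name):
    """Drop the hash tag from a repaired library name, if one is present."""
    match = _HASHED_NAME.match(name)
    if match is None:
        return name  # system libraries such as libc.so.6 are left untouched
    return match.group("stem") + match.group("ext")
```

Names without the tag, such as the system libraries in the "needed" list, pass through unchanged.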

Owner

Ah nice find. I wonder if we need to update get_library_dirs for the auditwheel-fixed wheels then to avoid test failures? At a glance, I don't see that as adding pyarrow.libs to the return value (I had to do a similar fix for Windows with delvewheel)
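The idea of also returning the auditwheel directory could look roughly like the sketch below. It is not pyarrow's actual get_library_dirs implementation: `candidate_library_dirs` is a hypothetical helper, and the sibling `<package>.libs` location is inferred from the paths in the log above.

```python
from pathlib import Path


def candidate_library_dirs(package_dir):
    """Return existing directories that may hold bundled shared libraries.

    Checks the package directory itself and a sibling "<package>.libs"
    directory, which is where the repaired wheel in the log keeps them.
    """
    package_dir = Path(package_dir)
    candidates = [
        package_dir,
        package_dir.parent / f"{package_dir.name}.libs",
    ]
    return [str(path) for path in candidates if path.is_dir()]
```

Filtering on `is_dir()` keeps the result valid for wheels that were never repaired, where the sibling directory does not exist.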

Owner

Alternatively there might be a setting for the 3 tools to control where the libraries are getting bundled relative to the package root

@raulcd raulcd force-pushed the raulcd-use-meson-python branch from 4e69467 to a1a57d8 Compare January 22, 2026 12:50
-DARROW_WITH_ZLIB=%ARROW_WITH_ZLIB% ^
-DARROW_WITH_ZSTD=%ARROW_WITH_ZSTD% ^
-DCMAKE_BUILD_TYPE=%CMAKE_BUILD_TYPE% ^
-DCMAKE_INSTALL_PREFIX=C:\arrow-dist ^
Owner

Can you point me to the logs for this one? I'm surprised you'd have to do anything here - does this not get executed within a conda environment?

Author

We use vcpkg, not conda, for the dependencies of our Windows wheels. This is the previous failure:
https://github.com/ursacomputing/crossbow/actions/runs/21254041726/job/61163356938

Owner

Thanks. That's interesting that CMake installs the ArrowComputeConfig.cmake file but then can't find it on a search with the default path.

Checking the logs, I see that ArrowComputeConfig.cmake gets installed to C:/Program Files/arrow/lib/cmake/ArrowCompute/ArrowComputeConfig.cmake. The CMake docs suggest that a search location of <prefix>/(lib/<arch>|lib*|share)/cmake/<name>*/ is a Unix convention, so I wonder if it's a bug in the Arrow CMake config to install it there in the first place?

Author

Sorry, there has been some trial and error. The initial failure was on:
https://github.com/ursacomputing/crossbow/actions/runs/21252214504/job/61156610086

2026-01-22T15:05:10.1086958Z   -- Installing: C:/arrow-dist/lib/cmake/ArrowCompute/ArrowComputeConfig.cmake

We use C:/arrow-dist nowadays. The log I sent you earlier is from when I tried installing system-wide.

Author

All tests (per each commit) can be found on this PR and the archery jobs triggered:
apache#48882

Owner

From their docs, I think the correct cross-platform path for it should be:

<prefix>/ArrowCompute/lib/cmake/ArrowCompute/ArrowComputeConfig.cmake

Author

Oh! I see what you mean now; sorry, I wasn't following before. At first glance, yes, that seems like a possible bug in our installation, but then I am not sure why Arrow is found, as it is installed following the same convention: Installing: C:/arrow-dist/lib/cmake/Arrow/ArrowConfig.cmake

Author

I think it's Friday and I'm just looking at too many CI failures, as Arrow was found when I installed on system C:/Program Files/arrow/lib/cmake/Arrow/ArrowConfig.cmake but not on C:/arrow-dist/lib/cmake/Arrow/ArrowConfig.cmake.

Owner

In the latest log you sent it looks like it fails to find Arrow:

2026-01-22T15:05:27.9376450Z Run-time dependency arrow found: NO (tried pkgconfig and cmake)
2026-01-22T15:05:27.9376779Z Run-time dependency arrow found: NO (tried cmake)
2026-01-22T15:05:27.9376982Z 
2026-01-22T15:05:27.9377373Z ..\meson.build:60:16: ERROR: Dependency lookup for Arrow with method 'pkgconfig' failed: Pkg-config for machine host machine not found. Giving up.

But I do see it as found in the first. Strange...

Owner

Ah...actually in the first CI logs where Arrow is found it does follow the cross-platform installation pattern:

Installing: C:/Program Files/arrow/lib/cmake/Arrow/ArrowConfig.cmake

I'm assuming the prefix is C:/Program Files and Windows probably doesn't care about the case sensitivity of arrow versus Arrow.

The installation path in the second log does not include the project name in the path, hence why it doesn't find Arrow at all there, I think:

Installing: C:/arrow-dist/lib/cmake/Arrow/ArrowConfig.cmake

CMake is fun :-)
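The two install layouts compared above can be told apart mechanically. The sketch below only classifies a path against the two pattern shapes quoted in this thread, `<prefix>/(lib*|share)/cmake/<name>*/` and `<prefix>/<name>*/(lib*|share)/cmake/<name>*/`; it does not reproduce CMake's real search procedure, the `lib/<arch>` variant is left out, and `classify_config_location` is a hypothetical helper. Matching is case-insensitive, following the assumption made above about Windows.

```python
from fnmatch import fnmatchcase


def _matches(parts, shape):
    """True if every path segment matches one of the globs at its position."""
    return len(parts) == len(shape) and all(
        any(fnmatchcase(part, glob) for glob in globs)
        for part, globs in zip(parts, shape)
    )


def classify_config_location(prefix, name, config_file):
    """Classify where a <Name>Config.cmake file sits relative to `prefix`."""
    prefix = prefix.rstrip("/").lower() + "/"
    path = config_file.lower()
    name = name.lower()
    if not path.startswith(prefix):
        return "outside prefix"
    parts = path[len(prefix):].split("/")[:-1]  # directories only, file dropped
    unix_shape = [["lib*", "share"], ["cmake"], [name + "*"]]
    cross_shape = [[name + "*"]] + unix_shape
    if _matches(parts, cross_shape):
        return "cross-platform"
    if _matches(parts, unix_shape):
        return "unix"
    return "other"


if __name__ == "__main__":
    for prefix, config in [
        ("C:/Program Files", "C:/Program Files/arrow/lib/cmake/Arrow/ArrowConfig.cmake"),
        ("C:/arrow-dist", "C:/arrow-dist/lib/cmake/Arrow/ArrowConfig.cmake"),
    ]:
        print(prefix, "->", classify_config_location(prefix, "Arrow", config))
```

Under these assumptions the first path falls into the cross-platform shape and the second into the Unix-only one, which is consistent with the explanation given above.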
