
Commit f5ac05c

GH-41105: [Python][Docs] Update PyArrow installation docs for conda package split (#41135)
Do not merge until some discussion is had about how to time this relative to conda-forge/arrow-cpp-feedstock#1376. Additionally, consider hot-patching this into arrow-site if appropriate.

### What changes are included in this PR?

Updates to the [Python installation docs](https://arrow.apache.org/docs/python/install.html) to reflect the in-progress change splitting PyArrow on conda-forge into three separate packages. Specifically:

1. Add a note in the conda section highlighting that there are three packages and linking to a new section (2) in order to provide more information
2. Add a new section, linked from (1), providing a comparison of each package as a table

### Are these changes tested?

These are just docs changes. I have built them locally and they look fine.

### Are there any user-facing changes?

Just docs.

* GitHub Issue: #41105

Lead-authored-by: Bryce Mecum <petridish@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Bryce Mecum <petridish@gmail.com>
1 parent 74f7578 commit f5ac05c

2 files changed: 90 additions, 0 deletions


docs/source/python/flight.rst

Lines changed: 1 addition & 0 deletions

@@ -17,6 +17,7 @@
 
 .. currentmodule:: pyarrow.flight
 .. highlight:: python
+.. _flight:
 
 ================
 Arrow Flight RPC

docs/source/python/install.rst

Lines changed: 89 additions & 0 deletions

@@ -39,6 +39,13 @@ Install the latest version of PyArrow from
 
     conda install -c conda-forge pyarrow
 
+.. note::
+
+    While the ``pyarrow`` `conda-forge <https://conda-forge.org/>`_ package is
+    the right choice for most users, both a minimal and a maximal variant of
+    the package exist, either of which may be better for your use case. See
+    :ref:`python-conda-differences`.
+
 Using Pip
 ---------

@@ -93,3 +100,85 @@ a custom path to the database from Python:
 
     >>> import pyarrow as pa
     >>> pa.set_timezone_db_path("custom_path")
+
+
+.. _python-conda-differences:
+
+Differences between conda-forge packages
+----------------------------------------
+
+On `conda-forge <https://conda-forge.org/>`_, PyArrow is published as three
+separate packages, each providing varying levels of functionality. This is in
+contrast to PyPI, where only a single PyArrow package is provided.
+
+The split keeps the installed package small for most users (``pyarrow``),
+offers an even smaller, minimal package for specialized use cases
+(``pyarrow-core``), and still provides a complete package for users who require
+it (``pyarrow-all``). What was historically ``pyarrow`` on
+`conda-forge <https://conda-forge.org/>`_ is now ``pyarrow-all``, though most
+users can continue using ``pyarrow``.
+
+The ``pyarrow-core`` package includes the following functionality:
+
+- :ref:`data`
+- :ref:`compute` (i.e., ``pyarrow.compute``)
+- :ref:`io`
+- :ref:`ipc` (i.e., ``pyarrow.ipc``)
+- :ref:`filesystem` (i.e., ``pyarrow.fs``). Note: cloud filesystems (e.g., :ref:`S3<filesystem-s3>`, :ref:`GCS<filesystem-gcs>`) are planned to move into ``pyarrow`` in a future release, though :ref:`filesystem-localfs` will remain in ``pyarrow-core``.
+- File formats: :ref:`Arrow/Feather<feather>`, :ref:`JSON<json>`, :ref:`CSV<py-csv>`, :ref:`ORC<orc>` (but not Parquet)
+
+The ``pyarrow`` package adds the following:
+
+- Acero (i.e., ``pyarrow.acero``)
+- :ref:`dataset` (i.e., ``pyarrow.dataset``)
+- :ref:`Parquet<parquet>` (i.e., ``pyarrow.parquet``)
+- Substrait (i.e., ``pyarrow.substrait``)
+
+Finally, ``pyarrow-all`` adds:
+
+- :ref:`flight` and Flight SQL (i.e., ``pyarrow.flight``)
+- Gandiva (i.e., ``pyarrow.gandiva``)
+
+The following table lists the functionality provided by each package and may be
+useful when deciding to use one package over another or when
+:ref:`python-conda-custom-selection`.
+
++------------+---------------------+--------------+---------+-------------+
+| Component  | Package             | pyarrow-core | pyarrow | pyarrow-all |
++------------+---------------------+--------------+---------+-------------+
+| Core       | pyarrow-core        |      ✓       |    ✓    |      ✓      |
++------------+---------------------+--------------+---------+-------------+
+| Parquet    | libparquet          |              |    ✓    |      ✓      |
++------------+---------------------+--------------+---------+-------------+
+| Dataset    | libarrow-dataset    |              |    ✓    |      ✓      |
++------------+---------------------+--------------+---------+-------------+
+| Acero      | libarrow-acero      |              |    ✓    |      ✓      |
++------------+---------------------+--------------+---------+-------------+
+| Substrait  | libarrow-substrait  |              |    ✓    |      ✓      |
++------------+---------------------+--------------+---------+-------------+
+| Flight     | libarrow-flight     |              |         |      ✓      |
++------------+---------------------+--------------+---------+-------------+
+| Flight SQL | libarrow-flight-sql |              |         |      ✓      |
++------------+---------------------+--------------+---------+-------------+
+| Gandiva    | libarrow-gandiva    |              |         |      ✓      |
++------------+---------------------+--------------+---------+-------------+
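
To see which of the optional components listed above are usable in a given environment, one option is to try importing the corresponding Python modules. The following is a minimal sketch and is not part of this commit: the helper name `probe_modules` is hypothetical, and it assumes that a component which is not installed surfaces as an `ImportError` when its module is imported.

```python
import importlib

# Optional modules named in the package lists above.
OPTIONAL_MODULES = [
    "pyarrow.parquet",
    "pyarrow.dataset",
    "pyarrow.acero",
    "pyarrow.substrait",
    "pyarrow.flight",
    "pyarrow.gandiva",
]


def probe_modules(names):
    """Return a dict mapping each module name to True if it imports."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Assumption: a missing component raises ImportError on import.
            status[name] = False
        else:
            status[name] = True
    return status


if __name__ == "__main__":
    for name, ok in probe_modules(OPTIONAL_MODULES).items():
        print(f"{name}: {'available' if ok else 'missing'}")
```

Importing the module, rather than only looking it up, is a deliberate choice here: it also catches the case where the Python file is present but a shared library it depends on is not.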
+
+.. _python-conda-custom-selection:
+
+Creating A Custom Selection
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you know which components you need and want to control what's installed, you
+can create a custom selection of packages to include only the extra features you
+need. For example, to install ``pyarrow-core`` and add support for reading and
+writing Parquet, install ``libparquet`` alongside ``pyarrow-core``:
+
+.. code-block:: shell
+
+    conda install -c conda-forge pyarrow-core libparquet
+
+Or, if you wish to use ``pyarrow`` but also need support for Flight RPC:
+
+.. code-block:: shell
+
+    conda install -c conda-forge pyarrow libarrow-flight
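
The two commands above follow the same pattern: a base package plus the conda package for each extra component. As an illustrative sketch only, the mapping below copies the component and package names from the table in this section; the dictionary `COMPONENT_PACKAGES` and the function `conda_install_command` are hypothetical helpers, not part of conda or PyArrow.

```python
# Component -> conda-forge package, copied from the table in this section.
COMPONENT_PACKAGES = {
    "Parquet": "libparquet",
    "Dataset": "libarrow-dataset",
    "Acero": "libarrow-acero",
    "Substrait": "libarrow-substrait",
    "Flight": "libarrow-flight",
    "Flight SQL": "libarrow-flight-sql",
    "Gandiva": "libarrow-gandiva",
}


def conda_install_command(components, base="pyarrow-core"):
    """Build a conda install command for a base package plus extras."""
    unknown = [c for c in components if c not in COMPONENT_PACKAGES]
    if unknown:
        raise ValueError(f"unknown component(s): {', '.join(unknown)}")
    packages = [base] + [COMPONENT_PACKAGES[c] for c in components]
    return "conda install -c conda-forge " + " ".join(packages)


# Reproduces the two commands shown above.
print(conda_install_command(["Parquet"]))
print(conda_install_command(["Flight"], base="pyarrow"))
```

Rejecting unknown component names up front keeps a typo from silently producing a command that installs less than intended.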
