Do not merge until we have discussed how to time this relative to conda-forge/arrow-cpp-feedstock#1376. Additionally, consider hot-patching this into arrow-site if appropriate.
### What changes are included in this PR?
Updates to the [Python installation docs](https://arrow.apache.org/docs/python/install.html) to reflect the in-progress change splitting PyArrow on conda-forge into three separate packages. Specifically:
1. Add a note in the conda section that highlights the three packages and links to the new section described in (2) for more information
2. Add a new section, linked from (1), providing a comparison of each package as a table
### Are these changes tested?
These are just docs changes. I have built them locally and they look fine.
### Are there any user-facing changes?
Just docs.
* GitHub Issue: #41105
Lead-authored-by: Bryce Mecum <petridish@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Bryce Mecum <petridish@gmail.com>
Changed file: docs/source/python/install.rst (89 additions, 0 deletions)
@@ -39,6 +39,13 @@ Install the latest version of PyArrow from

     conda install -c conda-forge pyarrow

+.. note::
+
+    While the ``pyarrow`` `conda-forge <https://conda-forge.org/>`_ package is
+    the right choice for most users, both a minimal and maximal variant of the
+    package exist, either of which may be better for your use case. See
+    :ref:`python-conda-differences`.
+
 Using Pip
 ---------

@@ -93,3 +100,85 @@ a custom path to the database from Python:

     >>> import pyarrow as pa
     >>> pa.set_timezone_db_path("custom_path")
+
+
+.. _python-conda-differences:
+
+Differences between conda-forge packages
+----------------------------------------
+
+On `conda-forge <https://conda-forge.org/>`_, PyArrow is published as three
+separate packages, each providing varying levels of functionality. This is in
+contrast to PyPI, where only a single PyArrow package is provided.
+
+The purpose of this split is to minimize the size of the installed package for
+most users (``pyarrow``), provide a smaller, minimal package for specialized use
+cases (``pyarrow-core``), while still providing a complete package for users who
+require it (``pyarrow-all``). What was historically ``pyarrow`` on
+`conda-forge <https://conda-forge.org/>`_ is now ``pyarrow-all``, though most
+users can continue using ``pyarrow``.
+
+The ``pyarrow-core`` package includes the following functionality:
+
+- :ref:`data`
+- :ref:`compute` (i.e., ``pyarrow.compute``)
+- :ref:`io`
+- :ref:`ipc` (i.e., ``pyarrow.ipc``)
+- :ref:`filesystem` (i.e., ``pyarrow.fs``. Note: It's planned to move cloud filesystems (e.g., :ref:`S3<filesystem-s3>`, :ref:`GCS<filesystem-gcs>`) into ``pyarrow`` in a future release though :ref:`filesystem-localfs` will remain in ``pyarrow-core``.)
+- File formats: :ref:`Arrow/Feather<feather>`, :ref:`JSON<json>`, :ref:`CSV<py-csv>`, :ref:`ORC<orc>` (but not Parquet)
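
As a rough illustration of how the split listed above might show up at runtime, the sketch below tries to import a handful of PyArrow submodules and reports which ones load. The helper name `available_components` and the two module lists are illustrative assumptions, not part of this PR or of PyArrow. The sketch also assumes that a component missing from the installed conda-forge variant surfaces as an `ImportError`, which is worth verifying against the actual packages.

```python
import importlib


def available_components(modules):
    """Return a dict mapping each module name to True if it imports, else False.

    Hypothetical helper for illustration only; it is not a PyArrow API.
    """
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            # Covers ModuleNotFoundError too (e.g. PyArrow not installed at all).
            status[name] = False
    return status


# Modules corresponding to the functionality listed for ``pyarrow-core``.
CORE_MODULES = [
    "pyarrow.compute",
    "pyarrow.ipc",
    "pyarrow.fs",
    "pyarrow.feather",
    "pyarrow.json",
    "pyarrow.csv",
    "pyarrow.orc",
]

# Parquet is called out as not being part of ``pyarrow-core``.
NON_CORE_MODULES = ["pyarrow.parquet"]

for module, ok in available_components(CORE_MODULES + NON_CORE_MODULES).items():
    print(f"{module}: {'available' if ok else 'missing'}")
```

Under these assumptions, an environment with only `pyarrow-core` would be expected to report `pyarrow.parquet` as missing, while `pyarrow` or `pyarrow-all` would report it as available.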