diff --git a/README.md b/README.md
index 120504b882e..05b76ecf091 100644
--- a/README.md
+++ b/README.md
@@ -48,7 +48,7 @@ In the GIFs below, Modin (left) and pandas (right) perform *the same pandas oper
-The charts below show the speedup you get by replacing pandas with Modin based on the examples above. The example notebooks can be found [here](examples/jupyter). To learn more about the speedups you could get with Modin and try out some examples on your own, check out our [10-minute quickstart guide](https://modin.readthedocs.io/en/latest/getting_started/quickstart.html) to try out some examples on your own!
+The charts below show the speedup you get by replacing pandas with Modin based on the examples above. The example notebooks can be found [here](examples/jupyter). To learn more about the speedups you could get with Modin and try out some examples on your own, check out our [10-minute quickstart guide](https://modin.readthedocs.io/en/latest/getting_started/quickstart.html).
@@ -56,7 +56,7 @@ The charts below show the speedup you get by replacing pandas with Modin based o
 #### From PyPI
-Modin can be installed with `pip` on Linux, Windows and MacOS:
+Modin can be installed with `pip` on Linux, Windows and macOS:
 ```bash
 pip install "modin[all]" # (Recommended) Install Modin with Ray and Dask engines.
@@ -84,7 +84,7 @@ Modin automatically detects which engine(s) you have installed and uses that for
 #### From conda-forge
-Installing from [conda forge](https://github.com/conda-forge/modin-feedstock) using `modin-all`
+Installing from [conda-forge](https://github.com/conda-forge/modin-feedstock) using `modin-all`
 will install Modin and three engines: [Ray](https://github.com/ray-project/ray), [Dask](https://github.com/dask/dask) and [MPI through unidist](https://github.com/modin-project/unidist).
@@ -114,7 +114,7 @@ To speed up conda installation we recommend using libmamba solver. To do this in
 conda install -n base conda-libmamba-solver
 ```
-and then use it during istallation either like:
+and then use it during installation either like:
 ```bash
 conda install -c conda-forge modin-ray --experimental-solver=libmamba
@@ -161,7 +161,7 @@ _Note: You should not change the engine after your first operation with Modin as
 #### Which engine should I use?
-On Linux, MacOS, and Windows you can install and use either Ray, Dask or MPI through unidist. There is no knowledge required
+On Linux, macOS, and Windows you can install and use either Ray, Dask or MPI through unidist. There is no knowledge required
 to use either of these engines as Modin abstracts away all of the complexity, so feel free to pick either!
diff --git a/docs/flow/modin/core/dataframe/pandas/partitioning/partition_manager.rst b/docs/flow/modin/core/dataframe/pandas/partitioning/partition_manager.rst
index 09bccecb189..19237ed6afa 100644
--- a/docs/flow/modin/core/dataframe/pandas/partitioning/partition_manager.rst
+++ b/docs/flow/modin/core/dataframe/pandas/partitioning/partition_manager.rst
@@ -23,11 +23,14 @@ Partition manager can apply user-passed (arbitrary) function in different modes:
 * along a full axis (apply a function to an entire column or row made up of block partitions when user function needs information about the whole axis)
-It can also broadcast partitions from `right` to `left` when executing certain operations making
-`right` partitions available for functions executed where `left` live.
-
-..
- TODO: insert more text explaining "broadcast" term
+It can also broadcast partitions from `right` to `left` when executing certain operations,
+making `right` partitions available for functions executed where the `left` partitions live.
+
+In this context, "broadcast" means replicating and aligning the partitions of the `right`
+operand across the appropriate axis of the `left` operand so that a partition-wise function
+can be applied. If the two operands have different partitioning along the index/columns,
+Modin will align them first, which may require repartitioning and introduce extra cost.
+See also the discussion of the Binary operator in :doc:`Core Dataframe Algebra `.
 Partition manager also is used to create "logical" partitions, or :doc:`axis partitions ` by joining existing partitions along specified axis (either rows or labels),
diff --git a/docs/flow/modin/core/execution/dispatching.rst b/docs/flow/modin/core/execution/dispatching.rst
index 507bb21d0e3..3e0c5f2f6e0 100644
--- a/docs/flow/modin/core/execution/dispatching.rst
+++ b/docs/flow/modin/core/execution/dispatching.rst
@@ -8,7 +8,7 @@ Factories Module Description
 Brief description
 '''''''''''''''''
-Modin has several execution engines and storage formats, combining them together forms certain executions.
+Modin has several execution engines and storage formats; each combination of an engine and a storage format forms an execution.
 Calling any :py:class:`~modin.pandas.dataframe.DataFrame` API function will end up in some execution-specific method. The responsibility of dispatching high-level API calls to execution-specific function belongs to the :ref:`QueryCompiler `, which is determined at the time of the dataframe's creation by the factory of the corresponding execution. The mission of this module is to route IO function calls from
diff --git a/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/index.rst b/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/index.rst
index 17249090cb4..b13a22ce1d7 100644
--- a/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/index.rst
+++ b/docs/flow/modin/core/execution/ray/implementations/pandas_on_ray/index.rst
@@ -27,7 +27,7 @@ the :py:class:`~modin.core.execution.ray.implementations.pandas_on_ray.dataframe
 generic functionality from the ``GenericRayDataframe`` and the :py:class:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe`.
 ..
- TODO: insert a link to ``GenericRayDataframe`` once we add an implementatiton of the class
+ TODO: insert a link to ``GenericRayDataframe`` once we add an implementation of the class
 PandasOnRay Dataframe implementation
 ------------------------------------
@@ -79,4 +79,4 @@ and a new query compiler with the data read is returned.
 When writing data to a CSV file, for example, the :py:class:`~modin.core.execution.ray.implementations.pandas_on_ray.io.PandasOnRayIO` processes the user query to execute it on Ray workers. Then, the :py:class:`~modin.core.execution.ray.implementations.pandas_on_ray.io.PandasOnRayIO` asks the :py:class:`~modin.core.execution.ray.implementations.pandas_on_ray.dataframe.PandasOnRayDataframe` to decompose the data into row-wise partitions
-that will be written into the file in parallel in Ray workers.
\ No newline at end of file
+that will be written into the file in parallel in Ray workers.
diff --git a/docs/flow/modin/core/execution/unidist/implementations/pandas_on_unidist/index.rst b/docs/flow/modin/core/execution/unidist/implementations/pandas_on_unidist/index.rst
index 0a80865f376..e5624c94f38 100644
--- a/docs/flow/modin/core/execution/unidist/implementations/pandas_on_unidist/index.rst
+++ b/docs/flow/modin/core/execution/unidist/implementations/pandas_on_unidist/index.rst
@@ -28,7 +28,7 @@ the :py:class:`~modin.core.execution.unidist.implementations.pandas_on_unidist.d
 generic functionality from the ``GenericUnidistDataframe`` and the :py:class:`~modin.core.dataframe.pandas.dataframe.dataframe.PandasDataframe`.
 ..
- TODO: insert a link to ``GenericUnidistDataframe`` once we add an implementatiton of the class
+ TODO: insert a link to ``GenericUnidistDataframe`` once we add an implementation of the class
 PandasOnUnidist Dataframe implementation
 ----------------------------------------
@@ -80,4 +80,4 @@ and a new query compiler with the data read is returned.
 When writing data to a CSV file, for example, the :py:class:`~modin.core.execution.unidist.implementations.pandas_on_unidist.io.PandasOnUnidistIO` processes the user query to execute it on Unidist workers. Then, the :py:class:`~modin.core.execution.unidist.implementations.pandas_on_unidist.io.PandasOnUnidistIO` asks the :py:class:`~modin.core.execution.unidist.implementations.pandas_on_unidist.dataframe.PandasOnUnidistDataframe` to decompose the data into row-wise partitions
-that will be written into the file in parallel in Unidist workers.
\ No newline at end of file
+that will be written into the file in parallel in Unidist workers.
diff --git a/docs/flow/modin/pandas/dataframe.rst b/docs/flow/modin/pandas/dataframe.rst
index 3b74b54caf7..130290b4af7 100644
--- a/docs/flow/modin/pandas/dataframe.rst
+++ b/docs/flow/modin/pandas/dataframe.rst
@@ -12,9 +12,12 @@ rewritten into a representation that can be processed in parallel by the partiti
 results can be e.g., reduced to single output, identical to the single threaded pandas ``DataFrame`` method output.
-..
- TODO: add link to the docs with detailed description of queries compilation
- and execution ater DOCS-#2996 is merged.
+.. note::
+ For a detailed description of how Modin compiles and executes queries, see:
+
+ - :doc:`Core Dataframe Algebra ` (operator patterns like Map/Reduce/Binary)
+ - :doc:`BaseQueryCompiler ` (common compiler interface)
+ - :doc:`PandasQueryCompiler ` (pandas storage format specifics)
 Public API
 ----------
diff --git a/docs/flow/modin/pandas/series.rst b/docs/flow/modin/pandas/series.rst
index 7a0bd9a6094..6e6edbbadd7 100644
--- a/docs/flow/modin/pandas/series.rst
+++ b/docs/flow/modin/pandas/series.rst
@@ -12,9 +12,12 @@ into a representation that can be processed in parallel by the partitions. These
 results can be e.g., reduced to single output, identical to the single threaded pandas ``Series`` method output.
-..
- TODO: add link to the docs with detailed description of queries compilation
- and execution ater DOCS-#2996 is merged.
+.. note::
+ For a detailed description of how Modin compiles and executes queries, see:
+
+ - :doc:`Core Dataframe Algebra ` (operator patterns like Map/Reduce/Binary)
+ - :doc:`BaseQueryCompiler ` (common compiler interface)
+ - :doc:`PandasQueryCompiler ` (pandas storage format specifics)
 Public API
 ----------
diff --git a/docs/getting_started/installation.rst b/docs/getting_started/installation.rst
index 100e8b120dc..9ff1efc40d8 100644
--- a/docs/getting_started/installation.rst
+++ b/docs/getting_started/installation.rst
@@ -17,7 +17,7 @@ Installing with pip
 Stable version
 """"""""""""""
-Modin can be installed with ``pip`` on Linux, Windows and MacOS.
+Modin can be installed with ``pip`` on Linux, Windows and macOS.
 To install the most recent stable release run the following:
 .. code-block:: bash
@@ -96,7 +96,7 @@ Modin can be used with Google Colab_ via the ``pip`` command, by running the fol
 !pip install "modin[all]"
-Since Colab preloads several of Modin's dependencies by default, we need to restart the Colab environment once Modin is installed by either clicking on the :code:`"RESTART RUNTIME"` button in the installation output or by run the following code:
+Since Colab preloads several of Modin's dependencies by default, we need to restart the Colab environment once Modin is installed by either clicking on the :code:`"RESTART RUNTIME"` button in the installation output or by running the following code:
 .. code-block:: python
@@ -120,13 +120,13 @@ it is possible to install modin with chosen engine(s) alongside. Current options
 +---------------------------------+---------------------------+-----------------------------+
 | **Package name in conda-forge** | **Engine(s)** | **Supported OSs** |
 +---------------------------------+---------------------------+-----------------------------+
-| modin | Dask_ | Linux, Windows, MacOS |
+| modin | Dask_ | Linux, Windows, macOS |
 +---------------------------------+---------------------------+-----------------------------+
-| modin-dask | Dask | Linux, Windows, MacOS |
+| modin-dask | Dask | Linux, Windows, macOS |
 +---------------------------------+---------------------------+-----------------------------+
 | modin-ray | Ray_ | Linux, Windows |
 +---------------------------------+---------------------------+-----------------------------+
-| modin-mpi | MPI_ through unidist_ | Linux, Windows, MacOS |
+| modin-mpi | MPI_ through unidist_ | Linux, Windows, macOS |
 +---------------------------------+---------------------------+-----------------------------+
 | modin-all | Dask, Ray, Unidist | Linux |
 +---------------------------------+---------------------------+-----------------------------+
@@ -156,7 +156,7 @@ or explicitly:
 Refer to `Installing with conda`_ section of the unidist documentation for more details on how to install a specific MPI implementation to run on.
-``conda`` may be slow installing ``modin-all`` or combitations of execution engines so we currently recommend using libmamba solver for the installation process.
+``conda`` may be slow installing ``modin-all`` or combinations of execution engines, so we currently recommend using the libmamba solver for the installation process.
 To do this install it in a base environment:
 .. code-block:: bash
@@ -167,7 +167,7 @@ Then it can be used during installation either like
 .. code-block:: bash
- conda install -c conda-forge modin-ray modin- --experimental-solver=libmamba
+ conda install -c conda-forge modin-ray modin-dask modin-mpi --experimental-solver=libmamba
 or starting from conda 22.11 and libmamba solver 22.12 versions
diff --git a/docs/release_notes/release_notes-0.16.0.rst b/docs/release_notes/release_notes-0.16.0.rst
index 308c506b7bf..45aaecbd620 100644
--- a/docs/release_notes/release_notes-0.16.0.rst
+++ b/docs/release_notes/release_notes-0.16.0.rst
@@ -72,7 +72,7 @@ Key Features and Updates
  * PERF-#4773: Compute `lengths` and `widths` in `put` method of Dask partition like Ray do (#4780)
  * PERF-#4732: Avoid overwriting already-evaluated `PandasOnRayDataframePartition._length_cache` and `PandasOnRayDataframePartition._width_cache` (#4754)
  * PERF-#4862: Don't call `compute_sliced_len.remote` when `row_labels/col_labels == slice(None)` (#4863)
- * PERF-#4713: Stop overriding the ray MacOS object store size limit (#4792)
+ * PERF-#4713: Stop overriding the Ray macOS object store size limit (#4792)
  * PERF-#4851: Compute `dtypes` for binary operations that can only return bool type and the right operand is not a Modin object (#4852)
  * PERF-#4842: `copy` should not trigger any previous computations (#4843)
  * PERF-#4849: Compute `dtypes` in `concat` also for ROW_WISE case when possible (#4850)
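A minimal, self-contained Python sketch of the "broadcast" behaviour described in the `partition_manager.rst` change. It does not use Modin: the grid-of-blocks data model and the helper names `align_to_lengths`, `broadcast_apply_rows` and `add_per_row` are hypothetical and only illustrate the idea. The `right` operand is first re-split to match the row partitioning of `left` (the alignment step that may require repartitioning), and its i-th piece is then replicated to every block in row-partition i before the partition-wise function runs.

```python
from typing import Callable, List, Sequence

# Hypothetical data model for this sketch (not Modin's real classes):
# a "block" is a list of rows (each row a list of numbers), and the left
# operand is a 2D grid of blocks: grid[i][j] is row-partition i, column-partition j.
Block = List[List[float]]
Grid = List[List[Block]]


def align_to_lengths(chunks: Sequence[Sequence[float]], lengths: Sequence[int]) -> List[List[float]]:
    """Re-split a 1D operand so that piece k holds exactly lengths[k] values."""
    flat = [value for chunk in chunks for value in chunk]
    if len(flat) != sum(lengths):
        raise ValueError("operands have different numbers of rows")
    aligned: List[List[float]] = []
    start = 0
    for n in lengths:
        aligned.append(flat[start:start + n])
        start += n
    return aligned


def broadcast_apply_rows(
    left: Grid,
    right_chunks: Sequence[Sequence[float]],
    func: Callable[[Block, List[float]], Block],
) -> Grid:
    """Apply func(block, piece) to every left block, sending right piece i to
    every block of row-partition i (i.e. replicating it across column partitions)."""
    # Row-partition lengths of `left`, taken from its first column partition.
    row_lengths = [len(row_of_blocks[0]) for row_of_blocks in left]
    # Align `right` to `left`'s row partitioning (the potentially costly step).
    right_aligned = align_to_lengths(right_chunks, row_lengths)
    return [
        [func(block, right_aligned[i]) for block in row_of_blocks]
        for i, row_of_blocks in enumerate(left)
    ]


def add_per_row(block: Block, piece: List[float]) -> Block:
    """Example partition-wise function: add piece[r] to every cell of row r."""
    return [[cell + piece[r] for cell in row] for r, row in enumerate(block)]


# Left: 3 rows x 4 columns, split into row partitions of 2 and 1 rows
# and two column partitions of 2 columns each.
left = [
    [[[1, 2], [5, 6]], [[3, 4], [7, 8]]],
    [[[9, 10]], [[11, 12]]],
]
# Right: one value per row, but partitioned differently (1 row, then 2 rows).
right = [[100], [200, 300]]
result = broadcast_apply_rows(left, right, add_per_row)
# result[1] == [[[309, 310]], [[311, 312]]]
```

Modin's partition manager works on remote partition objects and supports both axes, so its actual implementation differs from this in-memory sketch; the example only shows why misaligned partitioning forces an extra re-split before a partition-wise function can run.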