Revert "revert branch UNPICK"

kosiew · kosiew · commit e8048c7e22be · 2025-08-31T15:51:12.000+08:00
This reverts commit c6d2da1.
diff --git a/README.md b/README.md
@@ -42,6 +42,10 @@ DataFusion's Python bindings can be used as a foundation for building new data s
 - Serialize and deserialize query plans in Substrait format.
 - Experimental support for transpiling SQL queries to DataFrame calls with Polars, Pandas, and cuDF.
 
+For tips on tuning parallelism, see
+[Maximizing CPU Usage](docs/source/user-guide/configuration.rst#maximizing-cpu-usage)
+in the configuration guide.
+
 ## Example Usage
 
 The following example demonstrates running a SQL query against a Parquet file using DataFusion, storing the results
diff --git a/benchmarks/max_cpu_usage.py b/benchmarks/max_cpu_usage.py
@@ -0,0 +1,76 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Benchmark script showing how to maximize CPU usage."""
+
+from __future__ import annotations
+
+import argparse
+import multiprocessing
+import time
+
+import pyarrow as pa
+from datafusion import SessionConfig, SessionContext, col
+from datafusion import functions as f
+
+
+def main(num_rows: int, partitions: int) -> None:
+    """Run a simple aggregation after repartitioning."""
+    # Create some example data
+    array = pa.array(range(num_rows))
+    batch = pa.record_batch([array], names=["a"])
+
+    # Configure the session to use a higher target partition count and
+    # enable automatic repartitioning.
+    config = (
+        SessionConfig()
+        .with_target_partitions(partitions)
+        .with_repartition_joins(enabled=True)
+        .with_repartition_aggregations(enabled=True)
+        .with_repartition_windows(enabled=True)
+    )
+    ctx = SessionContext(config)
+
+    # Register the input data and repartition manually to ensure that all
+    # partitions are used.
+    df = ctx.create_dataframe([[batch]]).repartition(partitions)
+
+    start = time.perf_counter()
+    df = df.aggregate([], [f.sum(col("a"))])
+    df.collect()
+    end = time.perf_counter()
+
+    print(
+        f"Processed {num_rows} rows using {partitions} partitions in {end - start:.3f}s"
+    )
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--rows",
+        type=int,
+        default=1_000_000,
+        help="Number of rows in the generated dataset",
+    )
+    parser.add_argument(
+        "--partitions",
+        type=int,
+        default=multiprocessing.cpu_count(),
+        help="Target number of partitions to use",
+    )
+    args = parser.parse_args()
+    main(args.rows, args.partitions)
diff --git a/docs/source/user-guide/configuration.rst b/docs/source/user-guide/configuration.rst
@@ -46,6 +46,101 @@ a :py:class:`~datafusion.context.SessionConfig` and :py:class:`~datafusion.conte
     ctx = SessionContext(config, runtime)
     print(ctx)
 
+Maximizing CPU Usage
+--------------------
 
-You can read more about available :py:class:`~datafusion.context.SessionConfig` options in the `rust DataFusion Configuration guide <https://arrow.apache.org/datafusion/user-guide/configs.html>`_,
+DataFusion uses partitions to parallelize work. For small queries the
+default target partition count (the number of CPU cores) is often
+sufficient, but to fully utilize available hardware you can tune how many
+partitions are created and when DataFusion repartitions data automatically.
+
+Configure a ``SessionContext`` with a higher partition count:
+
+.. code-block:: python
+
+    from datafusion import SessionConfig, SessionContext
+
+    # allow up to 16 concurrent partitions
+    config = SessionConfig().with_target_partitions(16)
+    ctx = SessionContext(config)
+
+Automatic repartitioning for joins, aggregations, window functions and
+other operations can be enabled to increase parallelism:
+
+.. code-block:: python
+
+    config = (
+        SessionConfig()
+        .with_target_partitions(16)
+        .with_repartition_joins(True)
+        .with_repartition_aggregations(True)
+        .with_repartition_windows(True)
+    )
+
+Manual repartitioning is available on DataFrames when you need precise
+control:
+
+.. code-block:: python
+
+    from datafusion import col
+
+    df = ctx.read_parquet("data.parquet")
+
+    # Evenly divide into 16 partitions
+    df = df.repartition(16)
+
+    # Or partition by the hash of a column
+    df = df.repartition_by_hash(col("a"), num=16)
+
+    result = df.collect()
+
+
+Benchmark Example
+^^^^^^^^^^^^^^^^^
+
+The repository includes a benchmark script, :code:`benchmarks/max_cpu_usage.py`,
+that serves as a practical example of configuring DataFusion for higher
+parallelism.
+
+You can run the benchmark script to see the impact of different configuration settings:
+
+.. code-block:: bash
+
+    # Run with default settings (uses all CPU cores)
+    python benchmarks/max_cpu_usage.py
+
+    # Run with specific number of rows and partitions
+    python benchmarks/max_cpu_usage.py --rows 5000000 --partitions 16
+
+    # See all available options
+    python benchmarks/max_cpu_usage.py --help
+
+Here's an example showing the performance difference between single and multiple partitions:
+
+.. code-block:: bash
+
+    # Single partition - slower processing
+    $ python benchmarks/max_cpu_usage.py --rows=10000000 --partitions 1
+    Processed 10000000 rows using 1 partitions in 0.107s
+
+    # Multiple partitions - faster processing
+    $ python benchmarks/max_cpu_usage.py --rows=10000000 --partitions 10
+    Processed 10000000 rows using 10 partitions in 0.038s
+
+This run shows a roughly 2.8x speedup (0.107s vs 0.038s) when using
+10 partitions instead of 1, illustrating how partitioning can improve
+CPU utilization and query performance.
+
+The script demonstrates several key optimization techniques:
+
+1. **Higher target partition count**: Uses :code:`with_target_partitions()` to set the number of concurrent partitions.
+2. **Automatic repartitioning**: Enables repartitioning for joins, aggregations, and window functions.
+3. **Manual repartitioning**: Uses :code:`repartition()` to ensure all partitions are utilized.
+4. **CPU-intensive operations**: Performs a sum aggregation that can benefit from parallelization.
+
+The benchmark creates synthetic data and measures the time taken to perform a sum aggregation
+across the specified number of partitions. This helps you understand how partition configuration
+affects performance on your specific hardware.
+
+For more information about available :py:class:`~datafusion.context.SessionConfig` options, see the `rust DataFusion Configuration guide <https://arrow.apache.org/datafusion/user-guide/configs.html>`_,
 and about :code:`RuntimeEnvBuilder` options in the rust `online API documentation <https://docs.rs/datafusion/latest/datafusion/execution/runtime_env/struct.RuntimeEnvBuilder.html>`_.
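
Note on the benchmark figures: the speedup quoted in the documentation above can be checked directly from the script's summary lines. The following is a minimal sketch, not part of this commit; the helper names `parse_timings` and `speedup` are hypothetical and introduced only for illustration. It assumes the output format produced by the `print` call in `benchmarks/max_cpu_usage.py` and uses only the standard library.

```python
from __future__ import annotations

import re

# Matches the summary line printed by benchmarks/max_cpu_usage.py.
LINE_RE = re.compile(
    r"Processed (?P<rows>\d+) rows using (?P<partitions>\d+) partitions "
    r"in (?P<seconds>\d+(?:\.\d+)?)s"
)


def parse_timings(output: str) -> dict[int, float]:
    """Map partition count to elapsed seconds for each summary line found."""
    return {
        int(m["partitions"]): float(m["seconds"])
        for m in LINE_RE.finditer(output)
    }


def speedup(timings: dict[int, float], baseline: int, candidate: int) -> float:
    """Return how many times faster the candidate run was than the baseline."""
    return timings[baseline] / timings[candidate]


# Sample output copied from the documentation added in this commit.
SAMPLE = """\
Processed 10000000 rows using 1 partitions in 0.107s
Processed 10000000 rows using 10 partitions in 0.038s
"""

timings = parse_timings(SAMPLE)
print(f"{speedup(timings, 1, 10):.2f}x")  # prints "2.82x"
```

To compare runs on your own hardware, the same helpers could be fed the captured output of several invocations with different `--partitions` values.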