Processed 10000000 rows using 10 partitions in 0.038s
-
-This example demonstrates nearly 3x performance improvement (0.107s vs 0.038s) when using
-10 partitions instead of 1, showcasing how proper partitioning can significantly improve
-CPU utilization and query performance.
-
-The script demonstrates several key optimization techniques:
-
-1. **Higher target partition count**: Uses :code:`with_target_partitions()` to set the number of concurrent partitions
-2. **Automatic repartitioning**: Enables repartitioning for joins, aggregations, and window functions
-3. **Manual repartitioning**: Uses :code:`repartition()` to ensure all partitions are utilized
-4. **CPU-intensive operations**: Performs aggregations that can benefit from parallelization
-
-The benchmark creates synthetic data and measures the time taken to perform a sum aggregation
-across the specified number of partitions. This helps you understand how partition configuration
-affects performance on your specific hardware.
-
-For more information about available :py:class:`~datafusion.context.SessionConfig` options, see the `rust DataFusion Configuration guide <https://arrow.apache.org/datafusion/user-guide/configs.html>`_,
+You can read more about available :py:class:`~datafusion.context.SessionConfig` options in the `rust DataFusion Configuration guide <https://arrow.apache.org/datafusion/user-guide/configs.html>`_,
 and about :code:`RuntimeEnvBuilder` options in the rust `online API documentation <https://docs.rs/datafusion/latest/datafusion/execution/runtime_env/struct.RuntimeEnvBuilder.html>`_.