Releases: its-spark-dev/pydre-parallelism-benchmark
v1.0.0 — Initial Public Benchmark Release
This is the initial public release of the pydre parallelism benchmark, a workload-aware study of parallel execution strategies in real-world analytics pipelines.
Highlights
- Systematic benchmark of sequential, threading, and multiprocessing execution modes
- Four workload profiles: light, medium, heavy, ROI-heavy
- Empirical evaluation of worker scaling behavior and CPU utilization
- Demonstrates diminishing returns of thread-based parallelism beyond moderate worker counts
- Introduces a dynamic worker allocation strategy (~75% of logical CPUs) as a stable, near-optimal default
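The ~75% rule from the last highlight can be sketched in a few lines of Python. This is only an illustration: the helper name `default_worker_count`, its parameters, and the choice to round down are assumptions, not code taken from the benchmark or from pydre.

```python
import math
import os
from typing import Optional


def default_worker_count(fraction: float = 0.75,
                         logical_cpus: Optional[int] = None) -> int:
    """Return a worker count equal to roughly `fraction` of the logical CPUs.

    Hypothetical helper for illustration; not part of pydre or the benchmark.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    if logical_cpus is None:
        # os.cpu_count() can return None when the count cannot be determined.
        logical_cpus = os.cpu_count() or 1
    # Rounding down is an assumption; always keep at least one worker.
    return max(1, math.floor(logical_cpus * fraction))


if __name__ == "__main__":
    print(f"Suggested workers on this machine: {default_worker_count()}")
```

On a machine with 8 logical CPUs this yields 6 workers, which leaves some headroom for the operating system and the coordinating process.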
Key Takeaway
Parallelism is not a free performance boost. Effective parallel execution depends on aligning the execution model with workload characteristics and overhead behavior.
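To make this concrete, here is a minimal timing harness for the three execution modes, built on Python's standard `concurrent.futures` module. It is a sketch under stated assumptions, not the benchmark runner shipped with this release: `cpu_bound_task` is a stand-in workload and `run_mode` is a hypothetical helper.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def cpu_bound_task(n: int) -> int:
    """Stand-in workload (assumption): sum of squares below n, pure Python."""
    return sum(i * i for i in range(n))


def run_mode(mode: str,
             task: Callable[[int], int],
             items: Sequence[int],
             workers: int = 4) -> Tuple[List[int], float]:
    """Run `task` over `items` in one mode; return (results, elapsed seconds)."""
    start = time.perf_counter()
    if mode == "sequential":
        results = [task(x) for x in items]
    elif mode == "threading":
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(task, items))
    elif mode == "multiprocessing":
        # The task must be picklable, i.e. defined at module top level.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(task, items))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return results, time.perf_counter() - start


if __name__ == "__main__":
    items = [200_000] * 16
    for mode in ("sequential", "threading", "multiprocessing"):
        _, elapsed = run_mode(mode, cpu_bound_task, items)
        print(f"{mode:>16}: {elapsed:.3f} s")
```

Timings depend on the machine and the workload, so none are shown here. The point of the harness is that the same task can be measured under each execution model before one is chosen.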
Included
- Benchmark runner and analysis scripts
- Workload configuration profiles (.toml)
- Aggregated results and visualizations
- Full technical report (PDF) detailing methodology, results, and conclusions
Data Policy
Large-scale experimental input data is intentionally excluded.
The benchmark pipeline is fully reusable with user-generated or domain-specific pydre-compatible datasets.
This release establishes a reproducible baseline for evaluating parallel execution strategies in pydre-based analytics workflows.