Releases · its-spark-dev/pydre-parallelism-benchmark

This is the initial public release of the pydre parallelism benchmark, a workload-aware study of parallel execution strategies in real-world analytics pipelines.

Highlights

Systematic benchmark of sequential, threading, and multiprocessing execution modes
Four workload profiles: light, medium, heavy, and ROI-heavy (region-of-interest-intensive)
Empirical evaluation of worker scaling behavior and CPU utilization
Demonstrates diminishing returns of thread-based parallelism beyond moderate worker counts
Introduces a dynamic worker allocation strategy (~75% of logical CPUs) as a stable, near-optimal default
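The ~75% rule above can be written as a one-line heuristic. The sketch below is a minimal illustration under stated assumptions. It applies the fraction to `os.cpu_count()`, which reports logical CPUs, rounds down, and never returns fewer than one worker. The helper name `default_worker_count` is hypothetical and is not taken from the benchmark's code.

```python
import math
import os
from typing import Optional


def default_worker_count(fraction: float = 0.75, logical_cpus: Optional[int] = None) -> int:
    """Hypothetical helper: size a worker pool at ~75% of logical CPUs.

    Rounding down and clamping to at least one worker are assumptions of
    this sketch, not documented behavior of the benchmark.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if logical_cpus is None:
        # os.cpu_count() counts logical CPUs and may return None.
        logical_cpus = os.cpu_count() or 1
    return max(1, math.floor(logical_cpus * fraction))
```

On a machine with 8 logical CPUs this yields 6 workers, and on a single-CPU machine it still yields 1. `os.cpu_count()` reports the CPUs in the machine, not the CPUs available to the current process. On Linux, `len(os.sched_getaffinity(0))` accounts for affinity restrictions and may be a better input in containers.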

Key Takeaway

Parallelism is not a free performance boost. Effective parallel execution depends on aligning the execution model with workload characteristics and overhead behavior.
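One way to see why the execution model matters is to run the same task list under each mode and time it. The following is a minimal sketch built on the standard library's `concurrent.futures`. The function `cpu_bound_task` is a hypothetical stand-in for per-file pydre processing. The mode names mirror the three modes listed above and are not the benchmark runner's actual interface.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import List, Tuple


def cpu_bound_task(n: int) -> int:
    """Hypothetical stand-in for per-file work: pure-Python arithmetic."""
    return sum(i * i for i in range(n))


def run_mode(mode: str, tasks: List[int], workers: int = 1) -> Tuple[List[int], float]:
    """Run every task under one execution mode; return results and wall time."""
    start = time.perf_counter()
    if mode == "sequential":
        results = [cpu_bound_task(n) for n in tasks]
    elif mode in ("threading", "multiprocessing"):
        pool_cls = ThreadPoolExecutor if mode == "threading" else ProcessPoolExecutor
        # map() preserves input order, so results stay comparable across modes.
        with pool_cls(max_workers=workers) as pool:
            results = list(pool.map(cpu_bound_task, tasks))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return results, time.perf_counter() - start
```

To compare the modes, call `run_mode` once per mode on the same task list. Make the call from inside an `if __name__ == "__main__":` block, which multiprocessing requires on platforms that spawn worker processes. With the default CPython build, threads share the global interpreter lock, so pure-Python CPU-bound work like this toy task usually gains little from threading. Process pools avoid that limit but pay process start-up and pickling costs instead. This is one plausible reading of the overhead trade-off described above, not a reproduction of the benchmark's measurements.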

Included

Benchmark runner and analysis scripts
Workload configuration profiles (.toml)
Aggregated results and visualizations
Full technical report (PDF) detailing methodology, results, and conclusions
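Worker-scaling results are commonly summarized as speedup S(p) = T(1) / T(p) and parallel efficiency E(p) = S(p) / p. Here T(p) is the wall-clock time with p workers. The sketch below computes both from a mapping of worker counts to timings. The input format and the name `scaling_table` are assumptions made for illustration. They do not describe the layout of the bundled analysis scripts or results files.

```python
from typing import Dict, List, Tuple


def scaling_table(timings: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (workers, speedup, efficiency) rows sorted by worker count.

    `timings` maps worker count -> wall-clock seconds and must include a
    single-worker baseline under key 1 (an assumption of this sketch).
    """
    if 1 not in timings:
        raise ValueError("a single-worker baseline (key 1) is required")
    baseline = timings[1]
    rows = []
    for workers in sorted(timings):
        speedup = baseline / timings[workers]
        # Efficiency is speedup per worker; 1.0 means perfectly linear scaling.
        rows.append((workers, speedup, speedup / workers))
    return rows
```

As a made-up example, timings of 10 s with one worker and 4 s with four workers give a speedup of 2.5 and an efficiency of 0.625. Efficiency that falls as p grows is how diminishing returns show up in this form.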

Data Policy

Large-scale experimental input data is intentionally excluded.
The benchmark pipeline is fully reusable with user-generated or domain-specific pydre-compatible datasets.

This release establishes a reproducible baseline for evaluating parallel execution strategies in pydre-based analytics workflows.

v1.0.0 — Initial Public Benchmark Release