
Conversation

@nascheme (Member) commented Feb 28, 2025

As described in the issue, using the unit tests as the PGO task has some problems. This PR creates a new PGO task by taking benchmarks from the "pyperformance" suite and adjusting them slightly to work better as a PGO task. I had a couple of objectives when moving the benchmark code over:

  • use only benchmarks that don't have external dependencies
  • prefer benchmarks that exercise the most heavily used parts of Python
  • adjust the code so that execution time is mostly spent doing "real work" rather than on timing-related overhead
  • adjust the loops or iteration counts so that each task takes roughly 0.1 to 1 seconds (see the sketch after this list)
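To make the adaptation concrete, here is a minimal sketch of what an adapted benchmark might look like: the pyperf timing harness is dropped and the workload runs a fixed number of iterations, so nearly all execution time is "real work". The workload function, its name, and the iteration count below are illustrative assumptions, not code taken from the actual PR.

```python
import json


def bench_json_roundtrip(loops):
    # A small, dependency-free workload exercising heavily used code
    # paths (dicts, lists, string building, the json module).
    data = {"key": [list(range(100)), {"nested": "value"}] * 10}
    for _ in range(loops):
        encoded = json.dumps(data)
        decoded = json.loads(encoded)
        assert decoded == data


if __name__ == "__main__":
    # Iteration count tuned (hypothetically) so the whole task takes
    # roughly 0.1 to 1 seconds on a typical machine.
    bench_json_roundtrip(2000)
```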

There are some potential issues with adding this new PGO task:

  • adding the additional code to the cpython repo is not ideal: it adds approximately 500 kB of data to the Lib/test/pgo_task folder. We could put the code in a separate repo or PyPI package, but I think we prefer that Python can be compiled without additional external dependencies.
  • the code duplication between pyperformance and Lib/test/pgo_task is not ideal. OTOH, the pyperformance benchmarks don't see many code changes (benchmark results need to stay stable), so keeping the copies in sync should not be too much of a problem.
  • the current task set doesn't cover as much code as the unit-test-based PGO task did. I think this can be resolved over time by adding more modules to the pgo_task folder to cover those code paths (a runner sketch follows this list).
  • it's possible that using the new PGO task will result in worse optimization of the Python executable for real workloads. However, if that happens, I think we should adjust both the pyperformance benchmarks and the PGO tasks so those execution patterns are better represented. We are focusing a lot of optimization effort based on what pyperformance results say, so those benchmarks should also be a good representation of real-world programs; if they are not, we should improve them.
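On the coverage point above, a runner along the following lines would let coverage grow simply by dropping new modules into the folder. The package path, the bench_* naming convention, and the run() entry point are assumptions for illustration; they are not the PR's actual layout.

```python
import importlib
import pkgutil

import test.pgo_task  # assumed package location


def run_all():
    # Discover every bench_* module in the (hypothetical) pgo_task
    # package and execute its workload in sequence.
    for info in pkgutil.iter_modules(test.pgo_task.__path__):
        if not info.name.startswith("bench_"):
            continue
        module = importlib.import_module(f"test.pgo_task.{info.name}")
        # Each task module is assumed to expose a run() callable that
        # performs its workload for a fixed number of iterations.
        module.run()


if __name__ == "__main__":
    run_all()
```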

Benchmark results from pyperformance.

@brandtbucher (Member) commented:
I agree with the root issue, but I'm skeptical of using our benchmarks as a profiling task (it feels too much like "gaming" them).

As a data point, I tried using pyperformance as a PGO task a couple of years ago, and it made things about 4% faster (according to pyperformance, of course): faster-cpython/ideas#99 (comment)
