ci: For python wheel generation, use ccache #4924
Conversation
This is remarkable -- nice work, I know this has been kind of an uphill battle! What you're doing makes a lot of sense. LGTM! |
I would also like to switch the approach to coalesce into one job per platform (doing all the wheels in sequence and amortizing the fixed costs), instead of a separate job from scratch for every individual wheel. I think that's possible, but I haven't looked into it and it's definitely beyond my current knowledge frontier about the wheel builder. |
Building the python wheels takes a long time! I really hate waiting for it in CI -- there must be a way to speed it up.

* For the wheel workflow, add cache actions to save and restore the CCACHE_DIR. Hey, it's no easy feat to figure out what the path to that directory should be, especially on Linux where the wheels are built in a container, so the paths inside the container (where the wheel is built and the C++ compilation happens) don't match the paths outside the container (where the cache restore and save actions execute). (A sketch of the workflow wiring appears after this description.)
* I had a heck of a time on Linux trying to get a pre-built ccache installed and had to resort to writing a bash script to build ccache itself from scratch.
* Changed our "auto-build" utility build_dependency_with_cmake to print the amount of time it takes to build each dependency.
* When auto-building, pass along CMAKE_CXX_COMPILER_LAUNCHER so that the dependencies are sure to use ccache as well.
* Use CMAKE_BUILD_PARALLEL_LEVEL on the wheel run to use all the cores and compile in parallel. (We did that on the regular CI, but I think not for the wheel building.)
* Fixes to the logic in our compiler.cmake where it tries to use ccache even if the magic CMAKE_CXX_COMPILER_LAUNCHER isn't set -- I now believe we were doing it wrong, it was having no effect, and all along we only got ccache working on CI because we *also* set the env variable.
* For CI, set CCACHE_COMPRESSION=1 to make the caches take less space against the precious limit of how much total cache we can use on GHA.

So, the result of all this:

**Previous times (typical)**

| platform    | total (min:sec) | compile OIIO + deps |
| ----------- | --------------- | ------------------- |
| Linux Intel | 10:35           | 500s                |
| Linux ARM   | 7:20            | 294s                |
| Mac Intel   | 20:18           | 1146s               |
| Mac ARM     | 7:19            | 388s                |
| Windows     | 14:00           | 759s                |

**With ccache active**

| platform    | total (min:sec) | compile OIIO + deps |
| ----------- | --------------- | ------------------- |
| Linux Intel | 3:33            | 98s                 |
| Linux ARM   | 3:34            | 83s                 |
| Mac Intel   | 5:01            | 212s                |
| Mac ARM     | 2:30            | 95s                 |
| Windows     | N/A             | not using ccache    |

The "compile OIIO + deps" column is the isolated time to build OIIO plus any auto-building of dependencies from source that we do. It does not include any other setup, such as 40-60s of container setup on Linux, 20-40s setting up Python on Mac, or -- ick! -- 45-60 seconds to build cmake itself from scratch.

So this is considerably better, speeding up the full wheel workflow by 2-4x on all platforms but Windows.

Remaining room for improvement:

* Is there a way to use ccache on Windows? I'm not sure if there is when using MSVS.
* Find a way to install pre-built binaries on Linux rather than building ccache from scratch, which would save almost a whole minute per job.
* Organize each platform to build all of its wheels (i.e., all of the python versions we are building for on that platform) in a single job rather than as completely independent jobs, allowing (a) the fixed per-job overhead -- container initialization, installing python, and building certain dependencies -- to happen ONCE per platform instead of separately for each wheel, and (b) the magic of ccache to speed up the builds *across* those wheels, since only a tiny amount of OIIO source code depends on python at all.

Signed-off-by: Larry Gritz <[email protected]>
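For reference, the caching and environment wiring looks roughly like this -- a sketch only, with illustrative action versions, paths, and cache key names rather than the exact ones in the workflow:

```yaml
# Sketch of the relevant pieces of one wheel-building job (illustrative only).
env:
  CCACHE_DIR: /home/runner/.ccache     # must correspond to the directory ccache actually uses;
                                       # on Linux the compile runs inside a container, so this
                                       # host path has to be mapped to the path the container sees
  CCACHE_COMPRESSION: "1"              # smaller caches against the GHA total cache quota
  CMAKE_BUILD_PARALLEL_LEVEL: "4"      # compile in parallel with all available cores
  CMAKE_CXX_COMPILER_LAUNCHER: ccache  # route OIIO and auto-built dependencies through ccache

steps:
  - uses: actions/checkout@v4

  - name: Restore ccache
    uses: actions/cache@v4             # restores here, saves automatically in its post step
    with:
      path: /home/runner/.ccache
      key: wheel-ccache-${{ runner.os }}-${{ runner.arch }}-${{ github.sha }}
      restore-keys: wheel-ccache-${{ runner.os }}-${{ runner.arch }}-

  - name: Build wheel
    run: |
      python -m pip install cibuildwheel
      python -m cibuildwheel --output-dir wheelhouse
```

On Linux, the compilation happens inside the build container, so the CCACHE_DIR that ccache sees there must point at a location that corresponds to the host path the cache action saves and restores; getting that mapping right is the fiddly part described above.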
b469aac to e3821ea
Yep, noted! In theory, it should be super easy to do, now that ccache is working its magic... |
Thanks! If that "LGTM" is an actual review, can you please click the approval button? |
6bc8627 into AcademySoftwareFoundation:main
…n#4924)

Building the python wheels takes a long time! I really hate waiting for that when testing PRs; there must be a way to speed it up.

* For the wheel workflow, add cache actions to save and restore the CCACHE_DIR. Hey, it's no easy feat to figure out what the path to that directory should be, especially on Linux where the wheels are built in a container, so the paths inside the container (where the wheel is built and the C++ compilation happens) don't match the paths outside the container (where the cache restore and save actions execute).
* I had a heck of a time on Linux trying to get a pre-built ccache installed and had to resort to writing a bash script to build ccache itself from scratch, which is much more expensive than I'd like, but we'll have to come back to fix that separately.
* Changed our "auto-build" utility build_dependency_with_cmake to print the amount of time it takes to build each dependency.
* When auto-building, pass along CMAKE_CXX_COMPILER_LAUNCHER so that the dependencies are sure to use ccache as well.
* Use CMAKE_BUILD_PARALLEL_LEVEL on the wheel run to use all the cores and compile in parallel. (We did that on the regular CI, but I think not for the wheel building.)
* Fixes to the logic in our compiler.cmake where it tries to use ccache even if the magic CMAKE_CXX_COMPILER_LAUNCHER isn't set -- I have come to believe we were doing it wrong before, it was having no effect, and all along we only got ccache working on CI because we *also* set the env variable. (A sketch of the corrected logic follows below.)
* For CI, set CCACHE_COMPRESSION=1 to make the caches take less space against the precious limit of how much total cache we can use on GHA.

So, the result of all this:

**Previous times (typical), and first wheel run for any git branch**

| platform    | total (min:sec) | compile OIIO + deps |
| ----------- | --------------- | ------------------- |
| Linux Intel | 10:35           | 500s                |
| Linux ARM   | 7:20            | 294s                |
| Mac Intel   | 20:18           | 1146s               |
| Mac ARM     | 7:19            | 388s                |
| Windows     | 14:00           | 759s                |

**With ccache active, 2nd or later wheel run for a git branch**

| platform    | total (min:sec) | compile OIIO + deps |
| ----------- | --------------- | ------------------- |
| Linux Intel | 3:33            | 98s                 |
| Linux ARM   | 3:34            | 83s                 |
| Mac Intel   | 5:01            | 212s                |
| Mac ARM     | 2:30            | 95s                 |
| Windows     | N/A             | not using ccache    |

The "compile OIIO + deps" column is the isolated time to build OIIO plus any auto-building of dependencies from source that we do. It does not include any other setup, such as 40-60s of container setup on Linux, 20-40s setting up Python on Mac, or -- ick! -- 45-60 seconds to build cmake itself from scratch.

So this is considerably better, speeding up the full wheel workflow by 2-4x on all platforms but Windows.

Remaining room for improvement (possibly in subsequent PRs, hopefully not necessarily by me):

* Is there a way to use ccache on Windows? I'm not sure if there is when using MSVS.
* Find a way to install pre-built binaries on Linux rather than building ccache from scratch, which would save almost a whole minute per job.
* Organize each platform to build all of its wheels (i.e., all of the python versions we are building for on that platform) in a single job rather than as completely independent jobs, allowing (a) the fixed per-job overhead -- container initialization, installing python, and building certain dependencies -- to happen ONCE per platform instead of separately for each wheel, and (b) the magic of ccache to speed up the builds *across* those wheels, since only a tiny amount of OIIO source code depends on python at all.

Signed-off-by: Larry Gritz <[email protected]>
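For the compiler.cmake point above, the pattern that actually takes effect is roughly the following -- a minimal sketch, assuming an option named USE_CCACHE and placement before any targets are defined; the real file may differ in the details:

```cmake
# Sketch: enable ccache if the user hasn't already chosen a compiler launcher.
# (Option and variable names here are illustrative, not necessarily the real ones.)
option (USE_CCACHE "Use ccache to speed up recompilation, if found" ON)

if (USE_CCACHE AND NOT CMAKE_CXX_COMPILER_LAUNCHER)
    find_program (CCACHE_FOUND ccache)
    if (CCACHE_FOUND)
        # Setting CMAKE_<LANG>_COMPILER_LAUNCHER is what actually prefixes each
        # compile command with ccache (for the Makefile and Ninja generators).
        set (CMAKE_C_COMPILER_LAUNCHER   "${CCACHE_FOUND}")
        set (CMAKE_CXX_COMPILER_LAUNCHER "${CCACHE_FOUND}")
        message (STATUS "Compiling with ccache: ${CCACHE_FOUND}")
    endif ()
endif ()
```

The key constraint is that these launcher variables only initialize each target's *_COMPILER_LAUNCHER property at the moment the target is created, so they must be set before any add_library/add_executable calls; CMake (3.17+) will also pick the value up from a CMAKE_CXX_COMPILER_LAUNCHER environment variable at configure time, which is presumably why setting the env var on CI worked all along.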