ci: improve CI caching scheme #1994
Merged
I recently discovered that the caching of our ccache was either not
working at all, or was much less efficient/helpful than I thought. So
there are a number of improvements here:
- Split caching into separate restore/save steps, and rearrange the
  step order so we save the cache before tests run. This allows us to
  save the ccache even for jobs with failed tests (where the all-in-one
  unsplit approach would skip the cache save step for a failed job), or
  for jobs that we purposely end early (maybe because we see failing
  tests), not only when all tests finish and succeed. This makes
  subsequent builds on the same branch, done while attempting to fix
  failures, go faster by using the cache.
- Make subsequent pushes to the same PR or ref cancel any previous
  jobs, so if we push updates before the last CI run of the same
  branch completes, the old run doesn't keep going as useless zombie
  jobs.
- Change the cache key names we use to remove the GitHub ref, to
  encourage cache reuse across different commits or branches (I
  think), and to fix a couple of errors where we had misnamed things
  and were sharing caches between jobs when we shouldn't, or vice
  versa.
- Be more careful about the location of the cache, making sure that
  the directory we tell the GHA cache action to save/restore is the
  same directory we tell ccache to use as its cache.
- Use the CCACHE_COMPRESSION env var to be extra sure we're using
  compressed caches, so we get the best bang for the buck out of the
  limited cache storage we're allowed.
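
As a rough illustration of how the items above could fit together in one workflow, here is a minimal sketch. It is not the project's actual workflow file: the job name, the `.ccache` directory, the key layout, and the `./build.sh` and `./run-tests.sh` scripts are all hypothetical, and the action versions are assumed to be `v4`.

```yaml
# Hypothetical sketch only; names, paths, and key layout are assumptions.
name: CI

on: [push, pull_request]

# A newer push to the same PR (or the same ref, for plain pushes)
# cancels any run still in progress for it.
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    env:
      # Single source of truth for the cache location: ccache reads
      # CCACHE_DIR, and the cache steps below reference the same value.
      CCACHE_DIR: ${{ github.workspace }}/.ccache
      # Ask ccache to compress what it stores.
      CCACHE_COMPRESSION: "1"
    steps:
      - uses: actions/checkout@v4

      - name: Restore ccache
        id: ccache-restore
        uses: actions/cache/restore@v4
        with:
          path: ${{ env.CCACHE_DIR }}
          # The key names the job but not the ref, so a cache from
          # another commit can match through the prefix below.
          key: ccache-${{ github.job }}-${{ github.sha }}
          restore-keys: |
            ccache-${{ github.job }}-

      - name: Build
        run: ./build.sh        # hypothetical script that compiles via ccache

      # Saving happens before the tests, so a failing or cancelled test
      # step cannot prevent the cache from being stored.
      - name: Save ccache
        uses: actions/cache/save@v4
        with:
          path: ${{ env.CCACHE_DIR }}
          key: ${{ steps.ccache-restore.outputs.cache-primary-key }}

      - name: Test
        run: ./run-tests.sh    # hypothetical test script
```

In this sketch the save step reuses the `cache-primary-key` output of the restore step, so the two steps cannot drift apart on the key, and both take their `path` from `CCACHE_DIR` so the directory ccache writes to is the one that gets cached.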
In all, there were a number of subtle things broken, and this gives
the setup a good scrubbing. I'm not really sure exactly which one of
these, or which combination, was most responsible for getting things
unstuck, but it definitely works! Some results from two test pushes in
a row I did after these changes:
Job / cache state          build deps      build osl
VFXP 2024 / cold           4:12            6:11
VFXP 2024 / cached         1:42 (2.5x)     0:48 (7x)
ABI check / cold           4:11            6:14 + 5:38 (abi ref)
ABI check / cached         0:57 (4.4x)     0:46 + 0:16 (11.5x)
bleeding edge / cold       8:12            9:57
bleeding edge / cached     1:39 (5x)       0:44 (13.6x)
It varies across jobs and from run to run, but let's call it something
in the neighbourhood of a 4x speedup building dependencies and 10x
building OSL itself. The speedup isn't larger because ccache doesn't
speed up linking, and not all of the dependencies' build systems do
things in a way that lets ccache insert itself into the process.
Still, this should be a big help in reducing the time of our CI runs.
Signed-off-by: Larry Gritz <[email protected]>
use ccache. Signed-off-by: Larry Gritz <[email protected]>
Seem ok? I would love to get this merged because once it's in, all other CI runs should be significantly faster.
aconty approved these changes on Jun 10, 2025
lgritz added a commit to lgritz/OpenShadingLanguage that referenced this pull request on Jul 2, 2025
Labels: build / testing / port / CI (affecting the build system, tests, platform support, porting, or continuous integration)