
@lgritz lgritz (Collaborator) commented Jun 6, 2025

I recently discovered that our CI caching of the ccache directory was either not working at all or was much less efficient/helpful than I thought, so there are a number of improvements here:

  • Split caching into separate restore and save steps, and rearrange the step order so that we save the cache before the tests run. This lets us save the ccache even for jobs with failed tests (where the all-in-one approach would skip the cache save step because the job failed) or for jobs that we deliberately end early (perhaps because we see tests failing), not only when all tests finish and succeed. Subsequent builds on the same branch, made while we try to fix those failures, then go faster because they can use the cache.

  • Make a new push to the same PR or ref cancel any previous jobs still in progress, so that if we push updates before the last CI run on that branch completes, those older runs don't keep going as useless zombie jobs.

  • Change the cache key names to remove the GitHub ref, which (I think) encourages cache reuse across different commits and branches, and fix a couple of errors where we had misnamed things and were sharing caches between jobs that shouldn't share them, or vice versa.

  • Be more careful about the cache location, making sure that the GHA cache action saves and restores exactly the same directory that we told ccache to use as its cache directory.

  • Use the CCACHE_COMPRESSION env var to be extra sure we're using compressed caches, so we get the best bang for the buck from the limited cache storage we're allowed.
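To see why moving the save ahead of the tests matters, here is a small Python model of how a job's steps run. It is a sketch under simplifying assumptions, not GitHub's actual runner logic: once a step fails, every later step is skipped, and the all-in-one cache action's save is treated as a post-job step at the very end. (GitHub's cache action does offer separate `actions/cache/restore` and `actions/cache/save` sub-actions for the split layout.) The step names are hypothetical and not taken from the OSL workflows.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    run: Callable[[], bool]  # returns True if the step succeeds


def run_job(steps: List[Step]) -> List[str]:
    """Run steps in order and return the names of the steps that ran.

    Simplified model: once a step fails, every later step is skipped.
    """
    executed: List[str] = []
    for step in steps:
        executed.append(step.name)
        if not step.run():
            break  # a failure skips all remaining steps
    return executed


def succeeds() -> bool:
    return True


def tests_fail() -> bool:
    return False


# All-in-one cache action: the save happens in a post-job step at the end,
# which (in this model) is skipped once the tests fail.
combined = [
    Step("restore cache", succeeds),
    Step("build", succeeds),
    Step("test", tests_fail),
    Step("post: save cache", succeeds),
]

# Split restore/save: the explicit save step sits before the tests.
split = [
    Step("restore cache", succeeds),
    Step("build", succeeds),
    Step("save cache", succeeds),
    Step("test", tests_fail),
]

print(run_job(combined))  # ['restore cache', 'build', 'test']
print(run_job(split))     # ['restore cache', 'build', 'save cache', 'test']
```

In the combined layout a test failure means the cache is never saved, so the next push starts cold again; in the split layout the ccache contents from the build are already saved by the time the tests run.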

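The cache-key change can be illustrated with a simplified Python model of the lookup, roughly as the GitHub cache action documents it: an exact match on `key` wins, otherwise each `restore-keys` entry is tried in order as a prefix and the most recently created match is used. This sketch ignores GitHub's branch scoping (in practice a run can only restore caches it is allowed to access, such as those from its own branch or the default branch), and all key strings below are hypothetical, not the ones in the OSL workflows.

```python
from typing import Dict, List, Optional


def restore(caches: Dict[str, int], key: str,
            restore_keys: List[str]) -> Optional[str]:
    """Return the cache key that would be restored, or None.

    `caches` maps saved keys to a creation time. An exact match on `key`
    wins; otherwise each restore key is tried in order as a prefix and the
    newest matching entry is chosen. Branch scoping is ignored here.
    """
    if key in caches:
        return key
    for prefix in restore_keys:
        matches = [k for k in caches if k.startswith(prefix)]
        if matches:
            return max(matches, key=lambda k: caches[k])
    return None


# Hypothetical keys saved by earlier runs on main (key -> creation time).
saved_with_ref = {
    "ccache-vfxp2024-refs/heads/main-abc123": 1,
    "ccache-vfxp2024-refs/heads/main-def456": 2,
}
saved_without_ref = {
    "ccache-vfxp2024-abc123": 1,
    "ccache-vfxp2024-def456": 2,
}

# First run on a new branch, with the ref baked into the key and prefix:
print(restore(saved_with_ref,
              "ccache-vfxp2024-refs/heads/my-fix-987fed",
              ["ccache-vfxp2024-refs/heads/my-fix-"]))  # None

# Same run, with the ref left out of the key:
print(restore(saved_without_ref,
              "ccache-vfxp2024-987fed",
              ["ccache-vfxp2024-"]))  # ccache-vfxp2024-def456
```

The job-specific part of the key (the hypothetical `vfxp2024` here) is what keeps different jobs from sharing, and overwriting, each other's caches, so getting those names right matters as much as dropping the ref.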
In all, a number of subtle things were broken, so this gives it a good scrubbing. I'm not sure exactly which of these changes, or which combination, was most responsible for getting things unstuck, but it definitely works! Here are some results from two consecutive test pushes I made after these changes, the first starting from a cold cache and the second reusing the cache saved by the first:

CI job                      build deps (m:ss)   build osl (m:ss)
-----------------------     -----------------   ----------------------
VFXP 2024 / cold            4:12                6:11
VFXP 2024 / cached          1:42 (2.5x)         0:48 (7.7x)

ABI check / cold            4:11                6:14 + 5:38 (abi ref)
ABI check / cached          0:57 (4.4x)         0:46 + 0:16 (11.5x)

bleeding edge / cold        8:12                9:57
bleeding edge / cached      1:39 (5x)           0:44 (13.6x)
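The speedup factors in the table are ratios of cold to cached wall-clock time; for the ABI check, the OSL build and the ABI reference build are summed before dividing. A small Python sketch of that arithmetic, using only the timings from the table (the helper names are illustrative):

```python
from typing import List


def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' time from the table into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(cold: List[str], cached: List[str]) -> float:
    """Ratio of total cold time to total cached time.

    Times in each list are summed first, as for the ABI check, whose OSL
    column adds the ABI reference build to the main build.
    """
    return sum(map(to_seconds, cold)) / sum(map(to_seconds, cached))


# job: (cold deps, cached deps, cold osl, cached osl), copied from the table
ROWS = {
    "VFXP 2024": (["4:12"], ["1:42"], ["6:11"], ["0:48"]),
    "ABI check": (["4:11"], ["0:57"], ["6:14", "5:38"], ["0:46", "0:16"]),
    "bleeding edge": (["8:12"], ["1:39"], ["9:57"], ["0:44"]),
}

for job, (cold_d, cached_d, cold_o, cached_o) in ROWS.items():
    print(f"{job:<14} deps {speedup(cold_d, cached_d):4.1f}x"
          f"   osl {speedup(cold_o, cached_o):4.1f}x")
```

For the three jobs this gives about 2.5x, 4.4x and 5.0x for the dependencies, and 7.7x, 11.5x and 13.6x for OSL.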

It varies across jobs and from run to run, but let's call it something in the neighbourhood of a 4x speedup for building the dependencies and 10x for building OSL itself. It isn't better than that because ccache doesn't speed up linking, and not all of the dependencies' build systems compile in a way that lets ccache insert itself into the process. Still, this should be a big help in reducing the time of our CI runs.

lgritz added 2 commits June 8, 2025 17:22
@lgritz lgritz (Collaborator, Author) commented Jun 9, 2025

Seem ok? I would love to get this merged because once it's in, all other CI runs should be significantly faster.

@lgritz lgritz merged commit c71d804 into AcademySoftwareFoundation:main Jun 10, 2025
25 checks passed
@lgritz lgritz deleted the lg-ci-caching branch June 10, 2025 18:40
lgritz added a commit to lgritz/OpenShadingLanguage that referenced this pull request Jul 2, 2025
@lgritz lgritz added the build / testing / port / CI Affecting the build system, tests, platform support, porting, or continuous integration. label Oct 15, 2025