Skip to content

Compare tests#1289

Merged
dguittet merged 31 commits intodevelopfrom
compare_tests
Apr 14, 2025
Merged

Compare tests#1289
dguittet merged 31 commits intodevelopfrom
compare_tests

Conversation

@dguittet
Copy link
Copy Markdown
Collaborator

@dguittet dguittet commented Mar 3, 2025

Follow up to #1286

test_times.yml

test_times.yml is a new workflow that is triggered by the completion of ci.yml: GitHub docs here

Once ci.yml completes on any branch (call this the feature branch even if it's patch/dev), it uploads a CSV of elapsed test times.

test_times.yml will be triggered to run on the default branch (patch or dev). It will download the CSV of the triggering workflow, which measures the elapsed test times of the feature branch. It compares that downloaded CSV to the most recent CSV produced by a workflow on the default branch.

There are two parameters in test_times.yml:
HEAD_DIFF_TOL: For comparison against the default branch head, relative value in time elapsed above which the check will fail. Currently set at 0.05
REL_DIFF_TOL: For comparison against the release, CSVs stored in test/elapsed_time_release. Set to 0.10
DIFF_THRESHOLD_MS: in milliseconds, the length of time under which comparisons don't need to be run. Currently set at 3 (don't compare tests that take <3 ms)

Note:: this test_times.yml needs to be on the default branch to run as a triggered workflow. So the github actions workflows for this PR won't run test_times.yml

Examples on my fork

Other references:

release.yml

This is triggered by a new release and will update the CSVs in test/elapsed_time_release by downloading the workflow artifacts produced by the workflow associated with the SHA tagged in the release

@dguittet dguittet requested review from brtietz and sjanzou March 3, 2025 21:32
@brtietz
Copy link
Copy Markdown
Collaborator

brtietz commented Mar 3, 2025

The code and artifacts look great! I have one concern with the threshold. It looks like this would catch a 10% increase in a single PR getting merged into patch or develop. Do we need to be concerned about a slow increase (e.g. 3-5% in each PR adding up to more than 10% over a larger number of PRs)? If so, how difficult is it to tag the reference times to the most recent released version?

Copy link
Copy Markdown
Collaborator

@sjanzou sjanzou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

In building on what @brtietz said about tying the comparison to a release to see how the times change over time, how about doing a comparison of total test time to capture any below DIFF_THRESHOLD_MS that may add up...

There are 369 tests (of the 896 listed) in the artifact with elapsed time below 3ms (DIFF_THRESHOLD_MS) that would not be considered in the comparison.

The artifact is great:
image

Also, the total time is in the artifact
image
which seems like a good thing to track and compare as we add/remove/modify tests.

@dguittet
Copy link
Copy Markdown
Collaborator Author

dguittet commented Mar 4, 2025

The code and artifacts look great! I have one concern with the threshold. It looks like this would catch a 10% increase in a single PR getting merged into patch or develop. Do we need to be concerned about a slow increase (e.g. 3-5% in each PR adding up to more than 10% over a larger number of PRs)? If so, how difficult is it to tag the reference times to the most recent released version?

We could compare it to references times in the latest release, but we'd have to save a copy of a releases' elapsed test times since Github Actions Artifacts expire after 90 days. And so with each release, that CSV would have to be updated, which I can do via a Github action that is triggered by a release. Would this be in addition to the stepwise comparison, or a replacement?

@dguittet
Copy link
Copy Markdown
Collaborator Author

dguittet commented Mar 4, 2025

In building on what @brtietz said about tying the comparison to a release to see how the times change over time, how about doing a comparison of total test time to capture any below DIFF_THRESHOLD_MS that may add up...

The Total time row is included in the comparisons so if the Total time is increased by more than 10%, it'll be printed in the csv uploaded by test_times.yml (was named compare_times.csv, but I'm going to rename it as failing_test_times.csv to be more explicit).

@dguittet
Copy link
Copy Markdown
Collaborator Author

@brtietz @sjanzou

Added comparison to release times
Added release.yml to update the CSVs with a new release

Copy link
Copy Markdown
Collaborator

@brtietz brtietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Love the automated PR for releases.

@brtietz brtietz changed the base branch from patch to develop March 14, 2025 17:28
Copy link
Copy Markdown
Collaborator

@sjanzou sjanzou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extremely useful for the "Operation Speedy Gonzales"!

@dguittet
Copy link
Copy Markdown
Collaborator Author

Actually, there's still an issue here where when there's a new Release, the Release-triggered action that sets up the PR is run at the same time as the action that creates the gtest time elapsed CSV. Since the update-via-PR action depends on the CSV, it'll fail. I have to think of how to get the dependency right

@sjanzou
Copy link
Copy Markdown
Collaborator

sjanzou commented Mar 28, 2025

Actually, there's still an issue here where when there's a new Release, the Release-triggered action that sets up the PR is run at the same time as the action that creates the gtest time elapsed CSV. Since the update-via-PR action depends on the CSV, it'll fail. I have to think of how to get the dependency right

Ahh... good point - not that we have more than one ci running (ci.yml and coverage.yml) and multiple runners in ci.yml, i think we all need to become more aware of the interdependencies.

We should be testing the new Release paradigm within the next couple of weeks!

Thanks for the critical review of your own pull request!

@dguittet
Copy link
Copy Markdown
Collaborator Author

dguittet commented Apr 14, 2025

Added a retry loop to the release.yml which downloads the updated test CSVs produced by the CI of the release. This worked on my fork: https://github.com/dguittet/ssc/pulls

Every 20 minutes, check if the gtest elapsed times csv is available. Timeout after 3 hours

@dguittet dguittet merged commit 8fafd75 into develop Apr 14, 2025
9 checks passed
@dguittet dguittet deleted the compare_tests branch April 14, 2025 19:25
@coveralls
Copy link
Copy Markdown

Pull Request Test Coverage Report for Build 14451664057

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 2430 unchanged lines in 31 files lost coverage.
  • Overall coverage increased (+4.3%) to 54.706%

Files with Coverage Reduction New Missed Lines %
ssc/shared/lib_battery_dispatch.h 1 73.68%
ssc/shared/lib_util.h 2 69.92%
ssc/ssc/cmod_mhk_tidal.cpp 2 90.6%
ssc/tcs/csp_solver_pc_Rankine_indirect_224.cpp 2 60.36%
ssc/tcs/csp_solver_util.cpp 2 84.04%
ssc/tcs/numeric_solvers.h 3 85.71%
ssc/ssc/common_financial.cpp 7 72.64%
ssc/shared/lib_battery_dispatch_manual.cpp 8 87.14%
ssc/ssc/cmod_wavefile.cpp 8 73.26%
ssc/ssc/tckernel.cpp 11 54.55%
Totals Coverage Status
Change from base Build 13861376566: 4.3%
Covered Lines: 64318
Relevant Lines: 117571

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants