How to handle performance tests #7970
Replies: 34 comments
-
Depending on the answers to (1) and (2), it may not be possible for a test to be both a baseline and performance test. If that's the case, then it raises the question of whether the new test-suite flag
-
At least in CESM, and at least last time I checked, the current TPUTCOMP is just based on the overall run times given in the model log files. This often isn't very useful for understanding differences in timings between runs. What I was referring to in ESMCI/cime#2918 was saving the full contents of the timing directory (i.e., the detailed timing results that give min/max/etc. times spent in different blocks of code surrounded by timers). Is this possibly related to what you already do with SAVE_TIMING?
-
|
Thanks @billsacks , that does sound more useful than the TPUTCOMP check and possibly addresses point (1) above, since it presumably would allow us to compare times that don't include I/O.
-
|
Yes, if we examine the timing output more closely, we can look at timers that don't include I/O. Note that turning off I/O would also break any compareTwo test.

Currently, the fail condition is "time taken is outside tolerance from the last saved run". But if repeated PRs are each just under the threshold, the model will gradually slow down and never fail a test. The threshold can't be too small, because machine variability can lead to false positives. So we need something outside of CIME that tracks performance trends over a few days or a week.

Does the new capability in (6) require separate blesses for both performance fails AND baseline fails? When v3 went in, it slowed down the model and changed answers from baselines. The bless for the baselines also blessed the throughput change. In that case, we knew about the slowdown and were going to bless it, but other baseline-change blesses could sneak in a performance degradation unless the slowdown is caught and requires its own bless.
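The gradual-slowdown concern above can be made concrete. Here is an illustrative sketch (not CIME code; all names and numbers are made up): every run passes a per-run tolerance check against its predecessor, yet a least-squares slope over a window of recent runs exposes the drift.

```python
# Illustrative sketch of why per-run tolerance checks miss gradual slowdowns.
# Throughput is in simulated-years-per-day (higher is better).

def within_tolerance(current, baseline, tolerance=0.10):
    """Per-run check: fail only if throughput drops more than
    `tolerance` below the last saved baseline."""
    return current >= baseline * (1.0 - tolerance)

def trend_slope(samples):
    """Least-squares slope of throughput vs. run index over a window."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var if var else 0.0

# Each run is within 10% of its predecessor, so no single run fails...
history = [10.0, 9.5, 9.1, 8.7, 8.3, 8.0]
assert all(within_tolerance(b, a) for a, b in zip(history, history[1:]))
# ...yet the model has slowed ~20% overall; a negative trend flags it.
assert trend_slope(history) < 0
```

A longitudinal monitor (outside CIME, e.g. in PACE) could run exactly this kind of windowed check nightly.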
-
@rljacob , that's a good point regarding (6). The clear intent from my meetings with @sarats is that we want to keep hist and performance baselines conceptually separate.
-
|
@jasonb5 , see above. This may require a bit more work on bless_test_results than we initially thought. We may also want to take a look at system_tests_common.py and the timing output to see if there are more useful comparisons that can be implemented.
-
|
I still think we need a nightly test to see if there's a slowdown above the threshold, and then a separate bless for that. But we'll need another monitoring system watching for a gradual slowdown over days or weeks. Maybe that was part of the original PACE plan. @sarats ?
-
|
@rljacob Yes, long-term longitudinal tracking was previously done using the E3SM standardized performance benchmarks; now, proper PFS_* tests along with PACE would facilitate a better process. I've been distracted by a conference and am trying to catch up on what the remaining questions are to get the initial system in place.
-
|
I have the CIME baseline update started (ESMCI/cime#4491). Right now it still parses and stores throughput and memory usage from the coupler log. The code for generating the performance baselines lives in CIME's repo right now; I can also update the design so that the code lives with the coupler. This way each coupler/model can define its own methods of generating/comparing performance metrics.
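For context, parsing throughput and memory out of a coupler log might look something like the sketch below. The line formats assumed in the regexes are stand-ins, not the actual cpl.log patterns:

```python
# Hedged sketch: extract throughput and peak memory from a coupler log.
# The two line formats below are illustrative assumptions, not the real
# cpl.log layout.
import re

THROUGHPUT_RE = re.compile(r"# simulated years / cmp-day =\s*([\d.]+)")
MEMORY_RE = re.compile(r"max memory highwater \(MB\) =\s*([\d.]+)")

def parse_perf_metrics(log_text):
    metrics = {}
    m = THROUGHPUT_RE.search(log_text)
    if m:
        metrics["throughput"] = float(m.group(1))  # simulated years per day
    m = MEMORY_RE.search(log_text)
    if m:
        metrics["memory_mb"] = float(m.group(1))   # peak resident memory
    return metrics

sample = """\
(seq_mct_drv): === done ===
# simulated years / cmp-day = 19.82
max memory highwater (MB) = 3124.5
"""
print(parse_perf_metrics(sample))
```

Moving this kind of parser out of CIME and into each coupler would let every model define its own patterns and metrics, as described above.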
-
@jasonb5 , nice!
-
|
The component high-level timers that are used to compute throughput would include a certain amount of I/O as well if I/O isn't turned off. Regarding fine-grain timers: we parse and capture these within PACE. However, there isn't sufficient consistency in patterns across components to make capturing relevant "performance" from a baseline run straightforward. As it stands, we have to use PFS and baseline tests separately to start with, until we have a clear definition of every component's run-loop timer without I/O.
-
|
There must be something in model_timing_stats we could look at.
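As a rough illustration of what inspecting model_timing_stats could involve, here is a sketch that assumes a simplified whitespace-separated layout (timer name, call count, max/min wall time); the real GPTL stats files have more columns, and the timer names below are placeholders.

```python
# Hedged sketch: look up one named timer in a model_timing_stats-style
# file. Assumed columns: name, ncalls, wallmax, wallmin (the real GPTL
# format has additional fields such as process/thread indices).
import os
import tempfile

def read_timer(stats_path, timer_name):
    """Return call count and max/min wall time for one timer, or None."""
    with open(stats_path) as f:
        for line in f:
            parts = line.split()
            if parts and parts[0] == timer_name:
                return {"ncalls": int(parts[1]),
                        "wallmax": float(parts[2]),
                        "wallmin": float(parts[3])}
    return None

# Demo against a made-up stats file (timer names are illustrative).
demo = "CPL:RUN_LOOP 120 845.2 812.9\nCPL:ATM_RUN 120 402.7 398.1\n"
with tempfile.NamedTemporaryFile("w", suffix=".stats", delete=False) as tf:
    tf.write(demo)
    path = tf.name
print(read_timer(path, "CPL:ATM_RUN"))
os.remove(path)
```

Comparing a run-loop timer like this (rather than total wall time) is one way to get a throughput metric that excludes I/O, per the discussion above.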
-
|
@jasonb5 Curious about the status of your CIME updates.
-
|
@sarats Just finishing up testing on the CIME side; I want to ensure we have good unit test coverage before it's merged and we do a CIME update. Should be merged by the end of this week.
-
|
Thanks @jasonb5 for getting ESMCI/cime#4491 done. Looks like we should be set to start performance tests now.
-
|
@sarats That PR doesn't impact creation or comparison of perf baselines.
-
|
@jasonb5 If there are no Also, please add the commit-hash to
-
|
@amametjanov and @jasonb5 Can you two get together to iron out any other missing bits and minimize the iterations with the CIME PR, merge, E3SM upstream pull, etc.? Perhaps we can test some of this locally/manually before issuing the final PRs.
-
|
@sarats @amametjanov I've fixed generating the performance baselines by
-
|
@amametjanov I've added the commit hash and date to the baseline files; you can use the branch from ESMCI/cime#4508 to test.
-
|
@jgfouca I guess we need to do a final CIME update including ESMCI/cime#4508 to close this, correct?
-
|
We need a couple more updates:
-
@jasonb5 Will you be able to look into the above two tasks?
-
|
@amametjanov , @sarats , (1) is easy. I'm confused about (2). Is
-
|
@jgfouca timing is getting archived into
-
|
@amametjanov , I did some debugging, and it looks to me like it is working as intended. I'm wondering if maybe you are getting tripped up by the timing_dir gatekeeping code that checks the project: ... so, only if PROJECT matches SAVE_TIMING_DIR_PROJECTS, which is a comma-separated list of regular expressions, will it copy the timing data to SAVE_TIMING_DIR.
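That gatekeeping check can be sketched in isolation. This mirrors the behavior described above (PROJECT matched against a comma-separated list of regexes), not CIME's actual code:

```python
# Standalone sketch of the SAVE_TIMING_DIR gatekeeping described above:
# PROJECT must match one of the comma-separated regular expressions in
# SAVE_TIMING_DIR_PROJECTS before timing data is copied to SAVE_TIMING_DIR.
import re

def project_allows_save_timing(project, save_timing_dir_projects):
    patterns = [p.strip() for p in save_timing_dir_projects.split(",") if p.strip()]
    return any(re.match(pat, project) for pat in patterns)

# Example values are made up; substitute your machine's actual settings.
allowed = r"e3sm.*,cli\d+"
assert project_allows_save_timing("e3sm", allowed)
assert project_allows_save_timing("cli115", allowed)
assert not project_allows_save_timing("myproj", allowed)
```

If PROJECT doesn't match any pattern, the copy to SAVE_TIMING_DIR is silently skipped, which would explain the symptom.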
-
|
@sarats @amametjanov @jgfouca Would it be better to bypass both history and namelists when blessing performance, keeping the two blessing paths completely separate?
-
|
I'm fine with a separate and distinct pathway for performance blesses, bypassing history and namelists.
-
|
@amametjanov CIME's master branch now has the fix to separate hist/nml and performance blessing.
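A minimal sketch of what such a separation could look like, using illustrative flag names (--bless-perf etc.) rather than CIME's actual bless_test_results CLI: a perf bless skips hist/nml comparison entirely, so a throughput regression always needs its own explicit bless.

```python
# Hypothetical sketch of separate bless pathways; flag names are
# illustrative, not CIME's real bless_test_results options.
import argparse

def build_parser():
    p = argparse.ArgumentParser(prog="bless_test_results")
    p.add_argument("--hist-only", action="store_true")
    p.add_argument("--namelists-only", action="store_true")
    p.add_argument("--bless-perf", action="store_true",
                   help="bless throughput/memory baselines only")
    return p

def bless_kinds(args):
    """Decide which baseline families this invocation may update."""
    if args.bless_perf:
        return {"perf"}                 # never touches hist or namelists
    kinds = set()
    if args.hist_only or not args.namelists_only:
        kinds.add("hist")
    if args.namelists_only or not args.hist_only:
        kinds.add("nml")
    return kinds

assert bless_kinds(build_parser().parse_args(["--bless-perf"])) == {"perf"}
assert bless_kinds(build_parser().parse_args([])) == {"hist", "nml"}
```

Keeping the perf pathway disjoint addresses the earlier concern that a baseline-change bless could silently sneak in a performance degradation.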
-
(1): pending |
-
Based on the discussion during the performance team meeting, notes here: https://acme-climate.atlassian.net/wiki/spaces/EP/pages/3922133143/2023-09-18+Performance+Infrastructure+Meeting+Notes
I am still not 100% sure what our approach should be, so I thought it might be helpful to list the features/behaviors that we want for performance tests:

Passing --check-throughput --check-memory to the test script (in the E3SM_test_scripts repo) works, but this is very easy to forget, and it looks like almost no jobs are currently using these flags. Again, in the interest of making life easy for users, I think both these checks should be on by default for tests that have SAVE_TIMING on.

Feel free to edit to add features.
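The "on by default" idea could be sketched like this; the helper and its arguments are hypothetical, not existing CIME options:

```python
# Hypothetical sketch: derive defaults for the throughput/memory checks
# from SAVE_TIMING, while letting explicit command-line choices win.
def resolve_perf_checks(save_timing, check_throughput=None, check_memory=None):
    """None means the flag was not specified on the command line."""
    default = bool(save_timing)
    return (default if check_throughput is None else check_throughput,
            default if check_memory is None else check_memory)

# SAVE_TIMING on, nothing specified -> both checks enabled
assert resolve_perf_checks(True) == (True, True)
# Explicit opt-out still wins
assert resolve_perf_checks(True, check_throughput=False) == (False, True)
# SAVE_TIMING off -> checks stay off unless requested
assert resolve_perf_checks(False) == (False, False)
```

Tying the defaults to SAVE_TIMING means users who already save timing data get the checks for free, without remembering extra flags.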
Related issues:
#5885