
Conversation

@Malmahrouqi3
Collaborator

@Malmahrouqi3 Malmahrouqi3 commented Jun 11, 2025

Description

Enhancement for the CI workflow. Raises an MFCException if the grind-time speedup falls below an acceptable threshold; for exec time, it only prints a warning.

Resolves/closes #750
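A minimal sketch of the described behavior, for illustration only: the names `GRIND_TOL`, `EXEC_TOL`, `check_speedups`, and the stand-in `MFCException` are assumptions, not MFC's exact code or thresholds.

```python
# Hypothetical sketch of the PR's behavior: a grind-time regression fails the
# job hard, while an exec-time regression only prints a warning.
# GRIND_TOL/EXEC_TOL values are illustrative, not MFC's exact settings.

GRIND_TOL = 0.95  # grind-time speedup below this raises
EXEC_TOL = 0.90   # exec-time speedup below this only warns


class MFCException(Exception):
    """Stand-in for MFC's exception type."""


def check_speedups(case: str, exec_speedup: float, grind_speedup: float) -> None:
    # Speedups > 1 represent performance increases, matching bench_diff output.
    if exec_speedup < EXEC_TOL:
        print(f"Warning: Exec time speedup is less than {EXEC_TOL} - Case: {case}")
    if grind_speedup < GRIND_TOL:
        raise MFCException(
            f"Grind time speedup {grind_speedup:.2f} is below {GRIND_TOL} - Case: {case}"
        )
```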

@Malmahrouqi3 Malmahrouqi3 requested a review from a team as a code owner June 11, 2025 00:05
@codecov

codecov bot commented Jun 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 45.64%. Comparing base (db44da1) to head (4e1eacc).
Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #876      +/-   ##
==========================================
+ Coverage   42.95%   45.64%   +2.69%     
==========================================
  Files          69       68       -1     
  Lines       19504    18646     -858     
  Branches     2366     2249     -117     
==========================================
+ Hits         8377     8511     +134     
+ Misses       9704     8775     -929     
+ Partials     1423     1360      -63     


@wilfonba
Collaborator

I picked the 0.98 tolerance for this. I looked through some old CI runs, and the grind times reported were basically all 0.99 or 1.00, so something worse than 0.98 seemed like it was worth looking at. @sbryngelson can comment on whether he thinks this is a reasonable tolerance or not.

@sbryngelson
Member

@mohdsaid497566 @wilfonba
Right now, the thing that I hate the most is that the benchmark CI fails about 30% of the time due to some slurm/filesystem error. I have to re-run benchmarking a few times in some cases to get it to finish without error. This used to have an 80% failure rate, but I made it somewhat more robust. Nothing you guys can fix, but one day I'll get with PACE to fix it up. Until then, having this isn't super useful since benchmarking fails spuriously already (so those would become false positives basically).

I really like the idea of this, but I would much rather have continuous benchmarking, which is more robust/repeatable, and insightful (fixing this PR is just a bit of a convenience for me).

@Malmahrouqi3
Collaborator Author

Sample benchmark output. It needs a minor fix.

 Comparing Benchmarks: Speedups from ../master/bench-cpu.yaml to bench-cpu.yaml are displayed below. Thus, numbers > 1 represent increases in performance.
 Warning: Exec time speedup for pre_process is less than 0.9 - Case: ibm
                                                                                
  Case                      Pre Process          Simulation       Post Process  
 ────────────────────────────────────────────────────────────────────────────── 
  5eq_rk3_weno3_hllc   Exec: Exec: 1.08    Exec: Exec: 1.00   Exec: Exec: 1.15  
                                             & Grind: N/A &                     
                                                Grind: 1.00                     
  ibm                  Exec: Exec: 0.84    Exec: Exec: 1.08   Exec: Exec: 1.08  
                                             & Grind: N/A &                     
                                                Grind: 1.09                     
  viscous_weno5_sgb…   Exec: Exec: 1.04    Exec: Exec: 1.00   Exec: Exec: 1.01  
                                             & Grind: N/A &                     
                                                Grind: 1.01                     
  hypo_hll             Exec: Exec: 1.02    Exec: Exec: 0.99   Exec: Exec: 0.99  
                                             & Grind: N/A &                     
                                                Grind: 1.02                     
                                                                                

mfc: (venv) Exiting the Python virtual environment.

@sbryngelson
Member

We only really care about the simulation grind time and exec. time; the pre/post-process times aren't reliable anyway. We could probably even just get rid of them.
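This suggestion could be sketched as restricting the hard failure to the simulation target only. The target name, summary layout, and tolerance below are assumptions based on the snippets quoted later in the thread, not MFC's exact internals.

```python
# Hypothetical sketch: only the simulation target's grind-time speedup gates
# the CI job; pre/post-process targets always pass. The "simulation" name and
# the {"grind": ...} summary layout are assumptions.

def grind_speedup_ok(target_name: str, lhs: dict, rhs: dict, tol: float = 0.95) -> bool:
    if target_name != "simulation":
        return True  # pre/post process times are too noisy to gate on
    return lhs[target_name]["grind"] / rhs[target_name]["grind"] >= tol
```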

@sbryngelson
Member

Can you make the PR source code slow so we can see it fail benchmarking?

      ["--output-summary", summary_filepath] +
      case.args +
-     ["--", "--gbpp", ARG('mem')],
+     ["--", "--gbpp", 0.5],
Collaborator Author

I removed case optimization and capped gbpp to only 0.5.
This should slow all cases down drastically. I'm not sure, but it might exceed the allocated time. If that happens, I will hop onto Phoenix and figure out a proper time limit for the bench job until I reach the grind-time failure mode.

Collaborator Author

Currently testing the failure mode on Delta; I will post the outcomes here.
I guess I will cap just the memory per process and keep case optimization, so it runs slower but does not take forever.

Collaborator Author

I overthought it needlessly. I can just induce an artificial failure by modifying the grind times in the pr/master benchmark yaml files locally and then running bench_diff.

Collaborator Author

It worked out, luckily.

@Malmahrouqi3
Collaborator Author
Collaborator Author

Screenshot 2025-06-15 165001

@Malmahrouqi3
Collaborator Author
Collaborator Author

It failed as intended. To replicate the artificial failure, download the benchmark yaml artifacts from any PR's results, then tweak the numbers to create unreasonable discrepancies. Finally, run ./mfc.sh bench_diff master/bench-cpu.yaml pr/bench-cpu.yaml
Now I will undo the last commit's changes.
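The tweaking step could be scripted along these lines; this is a hedged sketch, and the flat `grind: <number>` key layout in the artifact yaml is an assumption, not MFC's exact schema.

```python
# Sketch of the replication trick: scale the grind times in a downloaded
# bench-*.yaml artifact so that bench_diff sees an artificial regression.
# The "grind: <number>" key layout is assumed, not MFC's exact schema.
import re


def inflate_grind_times(yaml_text: str, factor: float) -> str:
    """Multiply every 'grind: <number>' value in the YAML text by `factor`."""
    def repl(m: re.Match) -> str:
        return f"{m.group(1)}{float(m.group(2)) * factor}"

    return re.sub(r"(grind:\s*)([0-9]+(?:\.[0-9]+)?)", repl, yaml_text)
```

Running ./mfc.sh bench_diff against the doctored file should then trip the grind-time check.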

@sbryngelson
Member

Very good, thanks!

@sbryngelson
Member

It looks like it failed the CPU benchmark because the exec. time was below 0.9 on pre_process? We should only run the test for the difference in speed on simulation.

@Malmahrouqi3
Collaborator Author

Malmahrouqi3 commented Jun 16, 2025

@sbryngelson no, the grind time is actually below the threshold - 0.97, and ours is set to require >= 0.98. Exec time is meant to throw only warnings and can't terminate the job.


grind_time_value = lhs_summary[target.name]["grind"] / rhs_summary[target.name]["grind"]
speedups[i] += f" & Grind: {grind_time_value:.2f}"
if grind_time_value < 0.98:
Collaborator Author

Threshold for grind time.

Member

Got it. Well, I think 0.98 is too strict, contrary to @wilfonba's suggestion. I would use 0.95 for now; we can adjust later.

Collaborator Author

It is kind of stringent, I agree. I will change it now.

@sbryngelson
Member

Cool, it will probably pass and I will merge. Thanks!

@sbryngelson sbryngelson merged commit b605d41 into MFlowCode:master Jun 16, 2025
30 checks passed
prathi-wind pushed a commit to prathi-wind/MFC-prathi that referenced this pull request Jul 13, 2025


Successfully merging this pull request may close these issues.

Benchmark CI failure/improvement

3 participants