
Conversation

@Malmahrouqi3
Collaborator

@Malmahrouqi3 Malmahrouqi3 commented Jun 11, 2025

Description

Enhancement for the CI workflow. Raises an MFCException if the grind-time speedup falls below an acceptable threshold; for exec time, it only prints a warning.

Resolves/closes #750
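A minimal sketch of the described behavior, for illustration only: the names `GRIND_TOL`, `EXEC_TOL`, `check_speedups`, and the stand-in `MFCException` are assumptions, not MFC's exact code or thresholds.

```python
# Hypothetical sketch of the PR's behavior: a grind-time regression fails the
# job hard, while an exec-time regression only prints a warning.
# GRIND_TOL/EXEC_TOL values are illustrative, not MFC's exact settings.

GRIND_TOL = 0.95  # grind-time speedup below this raises
EXEC_TOL = 0.90   # exec-time speedup below this only warns


class MFCException(Exception):
    """Stand-in for MFC's exception type."""


def check_speedups(case: str, exec_speedup: float, grind_speedup: float) -> None:
    # Speedups > 1 represent performance increases, matching bench_diff output.
    if exec_speedup < EXEC_TOL:
        print(f"Warning: Exec time speedup is less than {EXEC_TOL} - Case: {case}")
    if grind_speedup < GRIND_TOL:
        raise MFCException(
            f"Grind time speedup {grind_speedup:.2f} is below {GRIND_TOL} - Case: {case}"
        )
```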

@Malmahrouqi3 Malmahrouqi3 requested a review from a team as a code owner June 11, 2025 00:05
@codecov

codecov bot commented Jun 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 45.64%. Comparing base (db44da1) to head (4e1eacc).
Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #876      +/-   ##
==========================================
+ Coverage   42.95%   45.64%   +2.69%     
==========================================
  Files          69       68       -1     
  Lines       19504    18646     -858     
  Branches     2366     2249     -117     
==========================================
+ Hits         8377     8511     +134     
+ Misses       9704     8775     -929     
+ Partials     1423     1360      -63     


@wilfonba
Collaborator

I picked the 0.98 tolerance for this. I looked through some old CI runs, and the grind times reported were basically all 0.99 or 1.00, so something worse than 0.98 seemed like it was worth looking at. @sbryngelson can comment on whether he thinks this is a reasonable tolerance or not.

@sbryngelson
Member

@mohdsaid497566 @wilfonba
Right now, the thing that I hate the most is that the benchmark CI fails about 30% of the time due to some slurm/filesystem error. I have to re-run benchmarking a few times in some cases to get it to finish without error. This used to have an 80% failure rate, but I made it somewhat more robust. Nothing you guys can fix, but one day I'll get with PACE to fix it up. Until then, having this isn't super useful since benchmarking fails spuriously already (so those would become false positives basically).

I really like the idea of this, but I would much rather have continuous benchmarking, which is more robust/repeatable, and insightful (fixing this PR is just a bit of a convenience for me).

@Malmahrouqi3
Collaborator Author

Sample benchmark output. It needs a minor fix.

 Comparing Benchmarks: Speedups from ../master/bench-cpu.yaml to bench-cpu.yaml are displayed below. Thus, numbers > 1 represent increases in performance.
 Warning: Exec time speedup for pre_process is less than 0.9 - Case: ibm
                                                                                
  Case                      Pre Process          Simulation       Post Process  
 ────────────────────────────────────────────────────────────────────────────── 
  5eq_rk3_weno3_hllc   Exec: Exec: 1.08    Exec: Exec: 1.00   Exec: Exec: 1.15  
                                             & Grind: N/A &                     
                                                Grind: 1.00                     
  ibm                  Exec: Exec: 0.84    Exec: Exec: 1.08   Exec: Exec: 1.08  
                                             & Grind: N/A &                     
                                                Grind: 1.09                     
  viscous_weno5_sgb…   Exec: Exec: 1.04    Exec: Exec: 1.00   Exec: Exec: 1.01  
                                             & Grind: N/A &                     
                                                Grind: 1.01                     
  hypo_hll             Exec: Exec: 1.02    Exec: Exec: 0.99   Exec: Exec: 0.99  
                                             & Grind: N/A &                     
                                                Grind: 1.02                     
                                                                                

mfc: (venv) Exiting the Python virtual environment.

@sbryngelson
Member

We only really care about the simulation grind time and exec. time; the pre/post-process times aren't reliable anyway. We could probably even just get rid of them.
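This suggestion could be sketched as restricting the hard failure to the simulation target only. The target name, summary layout, and tolerance below are assumptions based on the snippets quoted later in the thread, not MFC's exact internals.

```python
# Hypothetical sketch: only the simulation target's grind-time speedup gates
# the CI job; pre/post-process targets always pass. The "simulation" name and
# the {"grind": ...} summary layout are assumptions.

def grind_speedup_ok(target_name: str, lhs: dict, rhs: dict, tol: float = 0.95) -> bool:
    if target_name != "simulation":
        return True  # pre/post process times are too noisy to gate on
    return lhs[target_name]["grind"] / rhs[target_name]["grind"] >= tol
```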

@sbryngelson
Member

Can you make the PR source code slow so we can see it fail benchmarking?

      ["--output-summary", summary_filepath] +
      case.args +
-     ["--", "--gbpp", ARG('mem')],
+     ["--", "--gbpp", 0.5],
Collaborator Author

I removed case optimization and capped gbpp to only 0.5.
This should slow all cases down drastically. I'm not sure, but it might exceed the allocated time. If that happens, I will hop onto Phoenix and figure out a proper time limit for the bench job until I reach the grind-time failure mode.

Collaborator Author

Currently testing the failure mode on Delta; I will post the outcomes here.
I guess I will cap just the memory per process and keep case optimization, so it runs slower but does not take forever.

Collaborator Author

I overthought it needlessly. I can just induce an artificial failure by modifying the grind times in the pr/master benchmark yaml files locally and then running bench_diff.

Collaborator Author

It worked out, luckily.

@Malmahrouqi3
Collaborator Author
Collaborator Author

Screenshot 2025-06-15 165001

@Malmahrouqi3
Collaborator Author
Collaborator Author

It failed as intended. To replicate the artificial failure, download the benchmark yaml artifacts from any PR's results, then tweak the numbers to create unreasonable discrepancies. Finally, run ./mfc.sh bench_diff master/bench-cpu.yaml pr/bench-cpu.yaml
Now I will undo the last commit's changes.
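The tweaking step could be scripted along these lines; this is a hedged sketch, and the flat `grind: <number>` key layout in the artifact yaml is an assumption, not MFC's exact schema.

```python
# Sketch of the replication trick: scale the grind times in a downloaded
# bench-*.yaml artifact so that bench_diff sees an artificial regression.
# The "grind: <number>" key layout is assumed, not MFC's exact schema.
import re


def inflate_grind_times(yaml_text: str, factor: float) -> str:
    """Multiply every 'grind: <number>' value in the YAML text by `factor`."""
    def repl(m: re.Match) -> str:
        return f"{m.group(1)}{float(m.group(2)) * factor}"

    return re.sub(r"(grind:\s*)([0-9]+(?:\.[0-9]+)?)", repl, yaml_text)
```

Running ./mfc.sh bench_diff against the doctored file should then trip the grind-time check.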

@sbryngelson
Member

Very good, thanks!

@sbryngelson
Member

It looks like it failed the CPU benchmark because the exec. time was below 0.9 on pre_process? We should only run the test for the difference in speed on simulation.

@Malmahrouqi3
Collaborator Author

Malmahrouqi3 commented Jun 16, 2025

@sbryngelson no, the grind time is actually below the threshold - 0.97, and ours is set to require >= 0.98. Exec time is meant to throw only warnings and can't terminate the job.


grind_time_value = lhs_summary[target.name]["grind"] / rhs_summary[target.name]["grind"]
speedups[i] += f" & Grind: {grind_time_value:.2f}"
if grind_time_value < 0.98:
Collaborator Author

Threshold for grind time.

Member

Got it. Well, I think 0.98 is too strict, contrary to @wilfonba's suggestion. I would use 0.95 for now; we can adjust later.

Collaborator Author

It is kind of stringent, I agree. I will change it now.

@sbryngelson
Member

Cool, it will probably pass and I will merge. Thanks!

@sbryngelson sbryngelson merged commit b605d41 into MFlowCode:master Jun 16, 2025
30 checks passed
prathi-wind pushed a commit to prathi-wind/MFC-prathi that referenced this pull request Jul 13, 2025


Successfully merging this pull request may close these issues.

Benchmark CI failure/improvement

3 participants