automate generating cuda/cpu metrics from DPP by JRosenkranz · Pull Request #156 · foundation-model-stack/aiu-fms-testing-utils

JRosenkranz · 2025-10-23T14:10:23Z

This PR adds a way to generate metrics from DPP that fully matches the testing workflow. It also adds parameters that take in the generated metrics file and produce the proper assertions per test run.

Running this with generate_metrics requires the following PR: foundation-model-stack/foundation-model-stack#481
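As a rough illustration of the kind of per-run assertion described above, the sketch below compares per-token cross-entropy values against a threshold and derives a per-sequence failure rate. The function names, the input layout, and the threshold values are hypothetical and are not taken from this PR; the actual metrics file format may differ.

```python
from typing import List


def sequence_failure_rate(cross_entropies: List[float], threshold: float) -> float:
    """Fraction of tokens in one sequence whose cross-entropy exceeds the threshold."""
    if not cross_entropies:
        return 0.0
    failures = sum(1 for ce in cross_entropies if ce > threshold)
    return failures / len(cross_entropies)


def failing_sequences(
    batch: List[List[float]], ce_threshold: float, max_failure_rate: float
) -> List[int]:
    """Indices of sequences whose failure rate is above the allowed maximum."""
    return [
        i
        for i, seq in enumerate(batch)
        if sequence_failure_rate(seq, ce_threshold) > max_failure_rate
    ]


# Illustrative values only (assumed, not real measurements).
example_batch = [[1.0, 3.0, 2.0, 4.0], [1.0, 1.5, 2.0, 2.4]]
bad = failing_sequences(example_batch, ce_threshold=2.5, max_failure_rate=0.25)
```

A test could then assert that `bad` is empty, with the two thresholds read from the generated metrics file instead of being hard-coded.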

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>

JRosenkranz · 2025-10-23T14:14:41Z

bot:test
TEST_FILE=test_scripts.py

JRosenkranz · 2025-10-23T15:20:04Z

bot:test

Abhishek-TAMU · 2025-10-23T19:40:09Z

bot:test
TEST_FILE=test_scripts.py

Abhishek-TAMU · 2025-10-23T19:40:13Z

bot:test

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>

JRosenkranz · 2025-10-24T17:52:04Z

bot:test

JRosenkranz · 2025-10-24T17:55:27Z

bot:test
TEST_FILE=test_scripts.py

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

Ssukriti · 2026-01-07T23:35:13Z

aiu_fms_testing_utils/scripts/drive_paged_programs.py


 parser.add_argument(
-    "--cross_entropy_threshold",
+    "--default_cross_entropy_threshold",


Let's not change the argument name: this is a breaking change, and too many teams have already adopted DPP. We don't have documentation yet. I am reverting it, as it's not a necessary change.
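If a new name is wanted later, one way to add it without breaking existing callers is to register both option strings on the same argument. This is only a sketch of a standard `argparse` feature, not what this PR does: the alias and the default value below are assumptions for illustration.

```python
import argparse

parser = argparse.ArgumentParser()
# Both spellings write to the same destination, so the original flag keeps working.
parser.add_argument(
    "--cross_entropy_threshold",
    "--default_cross_entropy_threshold",  # hypothetical alias, not part of DPP
    dest="cross_entropy_threshold",
    type=float,
    default=2.5,  # placeholder default, assumed for illustration
)

old_style = parser.parse_args(["--cross_entropy_threshold", "3.0"])
new_style = parser.parse_args(["--default_cross_entropy_threshold", "3.0"])
no_flag = parser.parse_args([])
```

Both `old_style` and `new_style` expose the value as `cross_entropy_threshold`, so downstream code does not need to know which spelling was used.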

Ssukriti · 2026-01-09T23:18:30Z

aiu_fms_testing_utils/scripts/drive_paged_programs.py

 distributed_kwargs = {}
 if USE_DISTRIBUTED:
+    if generate_metrics:
+        torch.cuda.set_device(local_rank)


I ran into a blocker testing this PR: the machines in our cluster do not have GPUs. Generating the metrics file for every new model will be a challenge from a CI/CD perspective, since we do not have access to GPU machines. We need to think about this a bit.
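One possible direction, sketched below, is to pick the metrics device at runtime and fall back to CPU when no GPU is present. The helper names are hypothetical, and whether CPU-generated metrics are acceptable for this workflow is an open question rather than something this PR settles.

```python
import importlib.util
from typing import Optional


def cuda_available() -> bool:
    """Return True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


def metrics_device(
    generate_metrics: bool, local_rank: int, has_cuda: bool
) -> Optional[str]:
    """Choose a device string for metric generation (hypothetical helper).

    Returns None when metrics are not being generated, "cuda:<rank>" when a
    GPU is available, and "cpu" otherwise.
    """
    if not generate_metrics:
        return None
    if has_cuda:
        return f"cuda:{local_rank}"
    return "cpu"
```

A caller could pass `cuda_available()` as `has_cuda` and call `torch.cuda.set_device(local_rank)` only when the returned string starts with `"cuda"`, so GPU-less CI machines skip that call.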

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

JRosenkranz added 3 commits October 17, 2025 13:56

updated with a method to generate metrics in dpp

eb2af75

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>

added a check against a threshold file

986cb0e

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>

added a per-sequence failure rate

bce63ae

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>

JRosenkranz requested a review from tharapalanivel October 23, 2025 14:13

JRosenkranz requested a review from ani300 October 23, 2025 14:23

JRosenkranz marked this pull request as ready for review October 23, 2025 14:44

fixed merge conflicts

b453a56

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>

kcirred mentioned this pull request Oct 27, 2025

[dpp] add option to only save cpu results without running aiu #154

Closed

JRosenkranz and others added 2 commits November 5, 2025 01:54

fixed validation info loading bug when batch size > 9

85f7b53

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>

merge main

70e82a1

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

Ssukriti reviewed Jan 7, 2026


Ssukriti reviewed Jan 9, 2026


set default arg

8303606

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

JRosenkranz wants to merge 7 commits into main from generate_metrics_dpp


3 participants
