automate generating cuda/cpu metrics from DPP #156

Open
JRosenkranz wants to merge 7 commits into main from generate_metrics_dpp
Conversation

@JRosenkranz
Contributor

@JRosenkranz JRosenkranz commented Oct 23, 2025

This PR adds a method to generate metrics from DPP which will fully match the testing workflow. It also includes parameters to take in the generated metrics file and produce the proper assertions per test run.

Running this with generate_metrics requires the following PR: foundation-model-stack/foundation-model-stack#481
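As a rough illustration of the second part of the description (taking in a generated metrics file and producing per-run assertions), here is a minimal sketch. The JSON layout, the `load_thresholds` helper, and the `model-a`/`model-b` keys are all hypothetical and are not taken from this PR; the real metrics file format may differ.

```python
import json
import tempfile
from pathlib import Path


def load_thresholds(metrics_path, default_threshold):
    """Return a lookup function mapping a test key to its threshold.

    Hypothetical format: a JSON object of {key: threshold}. Keys missing
    from the file (or a missing file) fall back to default_threshold.
    """
    path = Path(metrics_path) if metrics_path else None
    table = json.loads(path.read_text()) if path and path.exists() else {}

    def lookup(key):
        return float(table.get(key, default_threshold))

    return lookup


# Write a tiny example metrics file, then resolve thresholds from it.
with tempfile.TemporaryDirectory() as tmp:
    metrics_file = Path(tmp) / "metrics.json"
    metrics_file.write_text(json.dumps({"model-a": 2.1}))
    lookup = load_thresholds(metrics_file, default_threshold=2.5)
    threshold_a = lookup("model-a")  # present in the file
    threshold_b = lookup("model-b")  # absent, so the default applies
```

A test run could then assert its measured value against `lookup(key)` rather than a single global constant.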

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
@JRosenkranz
Contributor Author

bot:test
TEST_FILE=test_scripts.py

@JRosenkranz JRosenkranz requested a review from ani300 October 23, 2025 14:23
@JRosenkranz JRosenkranz marked this pull request as ready for review October 23, 2025 14:44
@JRosenkranz
Contributor Author

bot:test

@Abhishek-TAMU
Collaborator

bot:test
TEST_FILE=test_scripts.py

@Abhishek-TAMU
Collaborator

bot:test

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
@JRosenkranz
Contributor Author

bot:test

@JRosenkranz
Contributor Author

bot:test
TEST_FILE=test_scripts.py

JRosenkranz and others added 2 commits November 5, 2025 01:54
Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

  parser.add_argument(
-     "--cross_entropy_threshold",
+     "--default_cross_entropy_threshold",
Contributor

Let's not change the argument name: this is a breaking change, and too many teams have already adopted DPP. We don't have documentation yet. I am reverting it, as it's not a necessary change.
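For reference, if a rename were ever wanted without breaking existing callers, `argparse` accepts several option strings for one argument, so the old flag can stay valid alongside a new one. This is only a sketch of that general technique, not what this PR does; the `build_parser` helper and the `2.5` default are placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--cross_entropy_threshold",          # existing name, kept working
        "--default_cross_entropy_threshold",  # additional alias (illustrative)
        dest="cross_entropy_threshold",       # both flags store to one attribute
        type=float,
        default=2.5,                          # placeholder default, not from the PR
    )
    return parser


# Both spellings parse to the same attribute.
old_style = build_parser().parse_args(["--cross_entropy_threshold", "3.0"])
new_style = build_parser().parse_args(["--default_cross_entropy_threshold", "3.0"])
```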

distributed_kwargs = {}
if USE_DISTRIBUTED:
    if generate_metrics:
        torch.cuda.set_device(local_rank)
Contributor

I ran into a blocker testing this PR: the machines in our cluster do not have GPUs. Generating the metrics file for every new model will be a challenge from a CI/CD perspective, since we do not have access to GPU machines. We need to think about this a bit.
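One possible guard, sketched under assumptions: choose the device only when CUDA is actually present, so a metrics-generation run on a GPU-less machine falls back to CPU instead of failing at `torch.cuda.set_device`. The `pick_device` helper is hypothetical; in real code the `cuda_available` flag would come from `torch.cuda.is_available()`. This only avoids the crash and does not by itself address where GPU reference metrics would be generated for CI.

```python
def pick_device(generate_metrics: bool, local_rank: int, cuda_available: bool) -> str:
    """Return the device string a run would use.

    cuda_available is passed in explicitly so the sketch has no torch
    dependency; in practice it would be torch.cuda.is_available().
    """
    if generate_metrics and cuda_available:
        return f"cuda:{local_rank}"
    # No GPU present, or not generating metrics: stay on CPU.
    return "cpu"


device_without_gpu = pick_device(generate_metrics=True, local_rank=0, cuda_available=False)
device_with_gpu = pick_device(generate_metrics=True, local_rank=1, cuda_available=True)
```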

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

3 participants