Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
```python
from abc import ABC, abstractmethod


class BaseCheck(ABC):
```
We can utilize the `__init_subclass__` feature of Python to handle this registry-like functionality:
https://peps.python.org/pep-0487/#subclass-registration

I believe a better paradigm would be something like:

- Implement a Checker subclass that inherits BaseChecker.
- Each Checker class will implement some number of methods that are prefixed with `check_`.
- BaseChecker's `__call__` and `execute()` methods will check all attributes on the class and run (in sequence) all attributes that are callable and start with the string `check_`.
- If there is a dependency, we can implement an `@BaseCheck.mark_dependency(<str>, ...)` decorator where you can pass in a list of strings that are the function names which need to be executed before the current check.

From what I could tell, all the implemented Check classes have an `__init__` method with the arguments `log, path, config, submission_logs`. The BaseCheck class should probably do the same and just store the values in `self` to be used by the subclasses later.
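The pattern above can be sketched roughly as follows. This is an illustrative draft only, assuming the constructor signature mentioned in the comment; the names `BaseCheck`, `mark_dependency`, and `ExampleCheck` follow the review discussion, not the PR's actual code.

```python
from abc import ABC


class BaseCheck(ABC):
    """Sketch: subclass registry (PEP 487) plus check_* discovery."""

    registry = []

    def __init_subclass__(cls, **kwargs):
        # PEP 487: every subclass registers itself automatically.
        super().__init_subclass__(**kwargs)
        BaseCheck.registry.append(cls)

    def __init__(self, log, path, config, submission_logs):
        # Store the common constructor arguments for subclasses.
        self.log = log
        self.path = path
        self.config = config
        self.submission_logs = submission_logs

    @staticmethod
    def mark_dependency(*names):
        # Record the checks that must run before the decorated one.
        def decorator(fn):
            fn._depends_on = names
            return fn
        return decorator

    def __call__(self):
        # Run every callable attribute whose name starts with "check_".
        results = {}
        for name in dir(self):
            if name.startswith("check_"):
                fn = getattr(self, name)
                if callable(fn):
                    # Naive dependency handling: run dependencies first.
                    for dep in getattr(fn, "_depends_on", ()):
                        if dep not in results:
                            results[dep] = getattr(self, dep)()
                    if name not in results:
                        results[name] = fn()
        return all(results.values())


class ExampleCheck(BaseCheck):
    def check_path_exists(self):
        return True

    @BaseCheck.mark_dependency("check_path_exists")
    def check_contents(self):
        return True
```

With this shape, defining a subclass is enough to register it, and `ExampleCheck(log, path, config, logs)()` runs all of its `check_*` methods in dependency order.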
```python
v = self.execute(check)
valid &= v
if not valid:
    return False
```
I'm wondering if it makes more sense to run every check here and return the success value of each, keyed by the checker's class? Something like:

```python
{
    AccuracyCheck: True,
    ComplianceCheck: False,
    PerformanceCheck: True,
    ...
}
```

I can see this being clunky if many tests depend on each other, which means that check failures will cascade. In that case my question is: should there be a system to determine the dependencies of each test? Something like MeasurementsCheck depends on DirectoryStructureCheck, etc.
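A minimal sketch of the suggestion, running every check and collecting the result per class instead of returning early. The check classes and `run_all_checks` helper here are hypothetical, not taken from the PR:

```python
def run_all_checks(checks):
    """Run each check object and collect its result, keyed by class.

    `checks` is an iterable of callables returning a bool; no early
    return, so every check's outcome is visible in the report.
    """
    results = {}
    for check in checks:
        results[type(check)] = check()
    return results


class AccuracyCheck:
    def __call__(self):
        return True


class ComplianceCheck:
    def __call__(self):
        return False


results = run_all_checks([AccuracyCheck(), ComplianceCheck()])
```

The overall pass/fail is then just `all(results.values())`, while the per-class dict can be printed or exported for the submitter.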
```python
if model in self.config.base["models_TEST04"]:
    test_list.append("TEST04")
if model in self.config.base["models_TEST06"]:
    test_list.append("TEST06")
```
Does it make more sense to have a ComplianceCheck class (which inherits BaseCheck), then have individual TEST0XCheck subclasses?
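One possible shape for that suggestion, sketched with hypothetical class and model names (the per-test model lists here are placeholders, not the PR's real config):

```python
class ComplianceCheck:
    """Hypothetical base class for per-test compliance checks."""

    models = ()  # models this compliance test applies to

    @classmethod
    def applies_to(cls, model):
        return model in cls.models


class Test04Check(ComplianceCheck):
    models = ("resnet50",)  # placeholder model list


class Test06Check(ComplianceCheck):
    models = ("llama2-70b",)  # placeholder model list


def compliance_tests_for(model):
    # Each TEST0X becomes a subclass; selection replaces the
    # per-test membership ifs.
    return [cls for cls in ComplianceCheck.__subclasses__()
            if cls.applies_to(model)]
```

This moves the `models_TEST0X` membership logic into each subclass, so adding a new compliance test means adding a class rather than another `if`.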
```python
if args.scenarios_to_skip:
    scenarios_to_skip = [
        scenario for scenario in args.scenarios_to_skip.split(",")]
```
Can you add a formatter to the project like autopep8 or black?
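For reference, black is typically configured through `pyproject.toml`; this is a generic sketch of such a configuration, not something taken from this repository:

```toml
[tool.black]
line-length = 88
target-version = ["py310"]
```

Running `black .` (after `pip install black`) would then format the project in place.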
```python
for logs in loader.load():
    # Initialize check classes
    performance_checks = PerformanceCheck(
        log, logs.loader_data["perf_path"], config, logs)
```
The overloading of the term `log` here feels clunky and confusing. A few comments:

- It seems like bad practice to create a logger named 'main' and pass it around to each Checker. It would be better for each file to have its own logger (`logging.getLogger(__file__)`) so that if (for instance) ExampleCheck did `log.info("Missing file ___")`, the message in console would show that it originated in ExampleCheck rather than `main`.
- If each file has its own logger, you no longer need to pass around `log` everywhere (makes it more concise).
- If we are passing in `logs`, is there a point to also passing in `logs.loader_data[key]`? Can't that just be extracted by the Check's `__init__` method?
- If this is simplified down to just `xxxx_check = XXXXCheck(config, logs)`, then it can be further simplified down to:

```python
for logs in loader.load():
    for check_cls in [PerformanceCheck, ...]:
        check_cls(config, logs)()
```
```python
measurements_checks()
power_checks()
```

```python
with open(args.csv, "w") as csv:
```
Why are these empty files here?
```python
def load_config(self, version):
    # TODO: Load values from
    self.models = self.base["models"]
```
I mentioned this in the GH Issue, but if the giant model dict is already being stored in a Python file, it should probably be refactored into some hierarchy of dataclasses. Having a class based representation would also make doing the key -> property remapping you're doing here either easier or unnecessary.
Having it as dataclasses rather than a dict also makes the schema of the config more defined and easier to navigate.
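A rough sketch of that refactor, assuming a hypothetical two-level hierarchy (the field names and values here are placeholders, not the PR's actual config schema):

```python
from dataclasses import dataclass, field


@dataclass
class ModelConfig:
    # Hypothetical per-model fields; the real schema lives in the PR.
    name: str
    compliance_tests: list = field(default_factory=list)


@dataclass
class SubmissionConfig:
    version: str
    models: dict = field(default_factory=dict)

    def model(self, name):
        # Typed lookup replaces ad-hoc dict key remapping.
        return self.models[name]


config = SubmissionConfig(
    version="5.1",
    models={"resnet50": ModelConfig("resnet50", ["TEST01", "TEST04"])},
)
```

With dataclasses, the config's shape is explicit in the class definitions, attribute access replaces string-keyed lookups, and tools can autocomplete and type-check the fields.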
Hi @pgmpablo157321, could you please update the Summary section of the README to reflect what the updated command looks like?
Initial tests with the 5.1 submission seem to be passing.
@nv-alicheng @nvzhihanj to take a look and provide guidance on whether to merge for 6.0.
@arjunsuresh Hi Arjun, I'm not sure if you're the right person for this. Could you, or someone else from the automation group, help address the ResNet50 and RetinaNet errors in some of the checks? It looks like this is affecting multiple PRs that aren't related to ResNet or RetinaNet at all.
@pgmpablo157321 Thanks for the PR! It seems like some of Alice's comments have not been addressed yet. I would suggest that we either address them in this PR, or document the TODOs (in the issue) and address them in follow-up PRs.
Pablo confirms this is ready for merging.
WG: It is OK to merge it.

#1670
Testing command (outside the inference repo):

Implemented checks:
- performance
- accuracy
- compliance
- measurements
- power
- system